@t3p3tony @chrishamm @Phaedrux @dc42
Update: I've been stress testing the machine after swapping the Pi again and tweaking some of the wiring runs to help avoid signal noise. For the last 4 days I've started a 12-hour print every morning so that it prints as long as possible without my having to leave it running while I'm asleep.
The first 3 days went beautifully but today I got another of the same failures.
However, this time was a little different...
Previously the SBC Pi was running a DHCP server to provide an IP for a directly Ethernet-connected Pi running Android and a touch screen DWC interface.
I ditched that when I rebuilt the DuetPi Lite image and instead installed a small WiFi/Ethernet router. So now the SBC Pi and DWC Android Pi are wired to that and I can connect to the machine over its own WiFi access point (this is so I can get to it at MRRF).
The advantage is that previously, when it failed, I could never connect to the Pi over SSH, which limited my ability to diagnose. But this time I could reach its Ethernet connection (which, unlike the WiFi, doesn't seem to go down when the failure occurs) via the WiFi router I added.
So that led me to digging into various system logs and eventually dmesg, where I found a stream of errors, such as these:
[31055.833034] ieee80211 phy0: brcmf_cfg80211_get_station: GET STA INFO failed, -110
[31062.393030] ieee80211 phy0: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
[31064.953033] ieee80211 phy0: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
[31064.953060] ieee80211 phy0: brcmf_cfg80211_get_station: GET STA INFO failed, -110
[31071.513043] ieee80211 phy0: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
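For anyone wanting to check their own logs: on Linux, errno 110 is ETIMEDOUT, so these lines are the Broadcom WiFi driver timing out while talking to the WiFi chip. Below is a rough Python sketch for tallying such lines from saved dmesg output. The regex and the function name `summarize_brcmf_errors` are my own assumptions based only on the line format above, not anything from the driver or from Duet.

```python
import re
from collections import Counter

# Assumed line format, taken from the dmesg excerpt above:
# [31055.833034] ieee80211 phy0: brcmf_cfg80211_get_station: GET STA INFO failed, -110
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+ieee80211 phy\d+: "
    r"(?P<func>brcmf_\w+): .*?(?P<code>-\d+)\s*$"
)

SAMPLE = """\
[31055.833034] ieee80211 phy0: brcmf_cfg80211_get_station: GET STA INFO failed, -110
[31062.393030] ieee80211 phy0: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
[31064.953033] ieee80211 phy0: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
[31064.953060] ieee80211 phy0: brcmf_cfg80211_get_station: GET STA INFO failed, -110
[31071.513043] ieee80211 phy0: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
"""


def summarize_brcmf_errors(lines):
    """Count brcmf errors per (function, error code) and track first/last timestamps."""
    counts = Counter()
    first = last = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a brcmf error line in the assumed format
        counts[(m["func"], int(m["code"]))] += 1
        ts = float(m["ts"])
        if first is None:
            first = ts
        last = ts
    return counts, first, last


if __name__ == "__main__":
    counts, first, last = summarize_brcmf_errors(SAMPLE.splitlines())
    for (func, code), n in sorted(counts.items()):
        print(f"{func} (error {code}): {n}")
    print(f"first at {first}s, last at {last}s after boot")
```

To use it on a real log, save the output of `dmesg` to a file and pass the file's lines to the function instead of `SAMPLE`.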
And it was those brcmf errors that led me to this Duet Forum post:
https://forum.duet3d.com/topic/23889/pi4-network-disconnect
Now, the crazy thing is: they were seeing exactly what I was.
- SPI comms reset
- WiFi dropping
- Print stopping midway through and DWC showing it was done
- Using a Pi 4 with a Duet 3 (a Mini, but still)
Fortunately, they seem to have a solution! In that post they recommended not using 5GHz networks and disabling the WiFi power management features. From further research it seems that 5GHz WiFi is somewhat of an ongoing issue with the Pi 4, and it will sometimes just kill WiFi when it shouldn't... Now, why that seems to affect SPI, I have no idea. But they seem to be related...
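For reference, the power management part can be sketched in Python as below. This is only an illustration under assumptions: the interface name `wlan0` is a guess for a typical Pi, the helper names are made up, the `iw` tool has to be installed, actually running the command needs root, and the setting is not persistent across reboots. By default the function only returns the command so it can be inspected first.

```python
import subprocess


def power_save_off_cmd(iface="wlan0"):
    """Build the `iw` command that turns WiFi power saving off for one interface."""
    # "wlan0" is an assumed default; check `iw dev` for the real interface name.
    return ["iw", "dev", iface, "set", "power_save", "off"]


def disable_wifi_power_save(iface="wlan0", dry_run=True):
    """Return the command; run it too when dry_run is False (needs root)."""
    cmd = power_save_off_cmd(iface)
    if not dry_run:
        # check=True raises CalledProcessError if `iw` exits with an error.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run: just show what would be executed.
    print(" ".join(disable_wifi_power_save()))
```

Afterwards `iw dev wlan0 get power_save` should report the current state.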
While I could disable power saving on WiFi, the 2.4GHz network part was a problem for silly reasons with my home WiFi access point.
So, for now, I've switched to using only Ethernet and have connected a separate USB Ethernet adapter for the connection between the SBC Pi and the DWC Android Pi.
Whether or not this will also fix the problem for me... time will tell. I'll be back to long test prints tomorrow morning and will let you know how it goes in a few days.
But so far this seems promising, given they were seeing nearly identical symptoms. My fallback at this point is tearing a Pi 3B+ out of something else in my house and using that. I have another Duet 3 machine with a 3B+ and it's been rock solid.