Pi4 Network Disconnect.
-
Anyone else have their Pi4 just drop off the network and never come back? I've got two Pi 4s hooked up to Duet 3 Mini 5+ boards. The Pis are online all the time and turn VIN on/off on the boards. Nothing seems to be wrong with the Duet; I'm sure this is just a Pi issue. I can still walk over and run the printer from the panel with no problem, but I've lost all network connectivity.
I turned off power management on the WiFi ports but that's not helping. Any words of wisdom out there? I'm a veteran UNIX admin so I'm positive I can get them to stay connected one way or another. But I just thought I'd ask in case there is something simple, or if I should just add this to the list of "That Linux Crap" grumbles...
Next steps:
Change down to the 2.4GHz band and see if that helps.
Hook them up wired (may do this anyway since I can firewall them off more easily from the rest of the house, and I have some other ideas for integrating them into a larger management webpage).
-
What exactly is disconnecting? Just unable to reach DWC or is the Pi actually off the network? Do you have an HDMI display on the Pi to see what it's actually doing?
-
@phaedrux Pi totally off the net. No ping response. But as I say, I can walk over and start a print from gcode already uploaded (print again), run macros, whatever, so everything else seems to still work; I just can't connect over the network. I have switched them both over to my 2.4GHz SSID to see if that helps. I've read that the 5GHz mode on the Pi 4 can have issues.
-
Getting an HDMI display and keyboard mouse for a local session would give you a better idea of why it's dropping off the network.
-
@phaedrux Yeah, I have enough computer crap that I can hook something up when/if it happens again.
-
@phaedrux I was in error; the problem does seem to be DCS related. Woke up today to a stopped print. DWC and the Pi were inaccessible, and the PanelDue reported "finished", but the print was only 60% complete and the nozzle was stopped (and cold) right in the middle of the print.
Looking through the daemon.log on the Pi I find:
Jun 26 04:02:17 Dabus-E5Pd DuetControlServer[793]: [info] System time has been changed
Jun 26 04:02:17 Dabus-E5Pd DuetControlServer[793]: [warn] Controller has been reset
Jun 26 04:02:17 Dabus-E5Pd DuetControlServer[793]: [warn] Trigger: Out-of-order reply: ''
Jun 26 04:02:17 Dabus-E5Pd DuetControlServer[793]: [info] Aborted job file
Jun 26 04:02:26 Dabus-E5Pd DuetControlServer[793]: [info] System time has been changed
Jun 26 04:02:35 Dabus-E5Pd DuetControlServer[793]: [info] System time has been changed
Jun 26 04:02:44 Dabus-E5Pd DuetControlServer[793]: [info] System time has been changed
Jun 26 04:02:53 Dabus-E5Pd DuetControlServer[793]: [info] System time has been changed
Jun 26 04:03:02 Dabus-E5Pd DuetControlServer[793]: [info] System time has been changed
Jun 26 04:03:12 Dabus-E5Pd DuetControlServer[793]: [info] System time has been changed
Jun 26 04:03:12 Dabus-E5Pd DuetControlServer[793]: [warn] Controller has been reset
Jun 26 04:03:12 Dabus-E5Pd DuetControlServer[793]: [warn] Trigger: Out-of-order reply: ''
. . .
Jun 26 04:18:25 Dabus-E5Pd DuetWebServer[646]: DuetWebServer.Controllers.WebSocketController[0] [WebSocketController] Connection from ::ffff:192.168.10.6:62026 terminated with an exception
System.Net.WebSockets.WebSocketException (2): The remote party closed the WebSocket connection without completing the close handshake.
 ---> System.Net.Sockets.SocketException (110): Connection timed out
   at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketAwaitableEventArgs.<GetResult>g__ThrowSocketException|7_0(SocketError e)
   at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketAwaitableEventArgs.GetResult()
   at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.ProcessReceives()
   at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.DoReceive()
   at System.IO.Pipelines.PipeCompletion.ThrowLatchedException()
   at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
   at System.IO.Pipelines.Pipe.GetReadAsyncResult()
   at System.IO.Pipelines.Pipe.DefaultPipeReader.GetResult(Int16 token)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1UpgradeMessageBody.ReadAsyncInternalAwaited(ValueTask`1 readTask, CancellationToken cancellationToken)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpRequestStream.ReadAsyncInternal(Memory`1 destination, CancellationToken cancellationToken)
   at System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32 minimumRequiredBytes, CancellationToken cancellationToken, Boolean throwOnPrematureClosure)
   at System.Net.WebSockets.ManagedWebSocket.ReceiveAsyncPrivate[TWebSocketReceiveResultGetter,TWebSocketReceiveResult](Memory`1 payloadBuffer, CancellationToken cancellationToken, TWebSocketReceiveResultGetter resultGetter)
   at System.Net.WebSockets.ManagedWebSocket.ReceiveAsyncPrivate[TWebSocketReceiveResultGetter,TWebSocketReceiveResult](Memory`1 payloadBuffer, CancellationToken cancellationToken, TWebSocketReceiveResultGetter resultGetter)
   at DuetWebServer.Controllers.WebSocketController.ReadFromClient(WebSocket webSocket, AsyncAutoResetEvent dataAcknowledged, CancellationToken cancellationToken) in /home/christian/Duet3D/DuetSoftwareFramework/src/DuetWebServer/Controllers/WebSocketController.cs:line 207
Jun 26 04:18:25 Dabus-E5Pd DuetWebServer[646]: DuetWebServer.Controllers.WebSocketController[0] WebSocket disconnected from ::ffff:192.168.10.6:62026
Jun 26 04:18:25 Dabus-E5Pd DuetWebServer[646]: Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker[2] Executed action DuetWebServer.Controllers.WebSocketController.Get (DuetWebServer) in 29936151.5672ms
Jun 26 04:18:25 Dabus-E5Pd DuetWebServer[646]: Microsoft.AspNetCore.Routing.EndpointMiddleware[1] Executed endpoint 'DuetWebServer.Controllers.WebSocketController.Get (DuetWebServer)'
Jun 26 04:18:25 Dabus-E5Pd DuetWebServer[646]: Microsoft.AspNetCore.Hosting.Diagnostics[2] Request finished HTTP/1.1 GET http://192.168.10.80/machine - - - 101 - - 29936154.0028ms
Jun 26 04:18:33 Dabus-E5Pd DuetControlServer[793]: [info] System time has been changed
Jun 26 04:18:42 Dabus-E5Pd DuetControlServer[793]: [info] System time has been changed
Jun 26 04:18:42 Dabus-E5Pd DuetControlServer[793]: [warn] Controller has been reset
Jun 26 04:18:42 Dabus-E5Pd DuetControlServer[793]: [warn] Trigger: Out-of-order reply: ''
and that all just repeats until I kicked it when I got up. This is a stock DuetPi lite image, and it's happening on 2 Pis. The only thing I have added is mjpg-streamer to run a webcam.
-
Can you provide the results of M122 please so we may see the firmware and DCS versions?
-
@phaedrux sure
6/26/2021, 11:10:25 AM m122
=== Diagnostics ===
RepRapFirmware for Duet 3 Mini 5+ version 3.3 (2021-06-15 21:46:11) running on Duet 3 Mini5plus WiFi (SBC mode)
Board ID: 9UD2K-6096U-D65J0-40KM8-2H03Z-Z5VSJ
Used output buffers: 1 of 40 (26 max)
=== RTOS ===
Static ram: 102724
Dynamic ram: 94280 of which 0 recycled
Never used RAM 46700, free system stack 120 words
Tasks: SBC(ready,9.0%,338) HEAT(notifyWait,0.1%,344) Move(notifyWait,0.8%,276) CanReceiv(notifyWait,0.0%,941) CanSender(notifyWait,0.0%,357) CanClock(delaying,0.0%,332) TMC(notifyWait,2.6%,80) MAIN(running,85.8%,524) IDLE(ready,0.0%,29) AIN(delaying,1.8%,264), total 100.0%
Owned mutexes: HTTP(MAIN)
=== Platform ===
Last reset 03:01:17 ago, cause: power up
Last software reset at 2021-06-26 08:08, reason: User, none spinning, available RAM 46700, slot 0
Software reset code 0x0012 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00000000 BFAR 0xe000ed38 SP 0x00000000 Task SBC Freestk 0 n/a
Error status: 0x00
Aux0 errors 0,0,0
MCU revision 3, ADC conversions started 10877941, completed 10877940, timed out 0, errs 0
Step timer max interval 1477
MCU temperature: min 28.9, current 36.1, max 36.3
Supply voltage: min 0.4, current 24.4, max 24.6, under voltage events: 0, over voltage events: 0, power good: yes
Heap OK, handles allocated/used 0/0, heap memory allocated/used/recyclable 0/0/0, gc cycles 0
Driver 0: position 13603, ok, SG min/max 0/436, read errors 0, write errors 0, ifcnt 13, reads 3613, writes 13, timeouts 0, DMA errors 0
Driver 1: position 16299, ok, SG min/max 0/510, read errors 0, write errors 0, ifcnt 13, reads 3613, writes 13, timeouts 0, DMA errors 0
Driver 2: position 17280, standstill, SG min/max 0/296, read errors 0, write errors 0, ifcnt 13, reads 3612, writes 13, timeouts 0, DMA errors 0
Driver 3: position 0, ok, SG min/max 0/510, read errors 0, write errors 0, ifcnt 11, reads 3614, writes 11, timeouts 0, DMA errors 0
Driver 4: position 0, standstill, SG min/max 0/248, read errors 0, write errors 0, ifcnt 13, reads 3613, writes 13, timeouts 0, DMA errors 0
Driver 5: position 0, standstill, SG min/max 0/0, read errors 0, write errors 0, ifcnt 9, reads 3617, writes 9, timeouts 0, DMA errors 0
Driver 6: position 0, standstill, SG min/max 0/0, read errors 0, write errors 0, ifcnt 9, reads 3616, writes 9, timeouts 0, DMA errors 0
Date/time: 2021-06-26 11:10:25
Cache data hit count 4294967295
Slowest loop: 162.53ms; fastest: 0.07ms
=== Storage ===
Free file entries: 10
SD card 0 not detected, interface speed: 0.0MBytes/sec
SD card longest read time 0.0ms, write time 0.0ms, max retries 0
=== Move ===
DMs created 83, maxWait 672202ms, bed compensation in use: mesh, comp offset 0.000
=== MainDDARing ===
Scheduled moves 60499, completed moves 60472, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state 3
=== AuxDDARing ===
Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== Heat ===
Bed heaters = 0 -1, chamberHeaters = -1 -1
Heater 0 is on, I-accum = 0.3
Heater 1 is on, I-accum = 0.4
=== GCodes ===
Segments left: 1
Movement lock held by null
HTTP* is doing "M122" in state(s) 0
Telnet is idle in state(s) 0
File* is doing "G1 X173.376999 Y199.276001 E0.144710" in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger* is idle in state(s) 0
Queue* is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty.
=== Filament sensors ===
Extruder 0 sensor: ok
=== CAN ===
Messages queued 97808, received 0, lost 0, longest wait 0ms for reply type 0, peak Tx sync delay 0, free buffers 17 (min 17), ts 54388/0/0
Tx timeouts 0,4,54387,0,0,43414 last cancelled message type 4514 dest 127
=== SBC interface ===
State: 4, failed transfers: 1, checksum errors: 0
Last transfer: 5ms ago
RX/TX seq numbers: 12355/12355
SPI underruns 0, overruns 0
Disconnects: 0, timeouts: 0, IAP RAM available 0x10638
Buffer RX/TX: 4000/1320-4096
=== Duet Control Server ===
Duet Control Server v3.3.0
File:
Buffered code: G1 X173.377 Y199.276 E0.14471
Buffered code: G1 X173.971 Y199.276 E0.02073
Buffered code: G1 X170.446 Y202.800 E0.17402
Buffered code: G1 X170.446 Y203.394 E0.02073
Buffered code: G1 X174.564 Y199.276 E0.20334
Buffered code: G1 X175.158 Y199.276 E0.02073
Buffered code: G1 X170.446 Y203.988 E0.23266
Buffered code: G1 X170.446 Y204.401 E0.01442
Buffered code: G1 X170.627 Y204.401 E0.00631
Buffered code: G1 X175.752 Y199.276 E0.25305
Buffered code: G1 X176.346 Y199.276 E0.02073
Buffered code: G1 X171.221 Y204.401 E0.25305
Buffered code: G1 X171.814 Y204.401 E0.02073
Buffered code: G1 X176.940 Y199.276 E0.25305
Buffered code: G1 X177.533 Y199.276 E0.02073
Buffered code: G1 X172.408 Y204.401 E0.25305
Buffered code: G1 X173.002 Y204.401 E0.02073
Buffered code: G1 X178.127 Y199.276 E0.25305
Buffered code: G1 X178.721 Y199.276 E0.02073
Buffered code: G1 X173.596 Y204.401 E0.25305
Buffered code: G1 X174.190 Y204.401 E0.02073
Buffered code: G1 X178.902 Y199.689 E0.23267
Buffered code: G1 X178.902 Y200.283 E0.02073
Buffered code: G1 X174.783 Y204.401 E0.20335
Buffered code: G1 X175.377 Y204.401 E0.02073
Buffered code: G1 X178.902 Y200.876 E0.17403
Buffered code: G1 X178.902 Y201.470 E0.02073
Buffered code: G1 X175.971 Y204.401 E0.14471
Buffered code: G1 X176.565 Y204.401 E0.02073
Buffered code: G1 X178.902 Y202.064 E0.11539
Buffered code: G1 X178.902 Y202.658 E0.02073
Buffered code: G1 X177.159 Y204.401 E0.08608
==> 1536 bytes
Code buffer space: 2584
Configured SPI speed: 8000000Hz
Full transfers per second: 42.83, max wait times: 46.7ms/0.0ms
Codes per second: 41.22
Maximum length of RX/TX data transfers: 3660/1684
File /opt/dsf/sd/gcodes/ExoSlide BMG-M_0.2mm_PETG_ENDER5PLUS_9h23m.gcode is selected, processing
-
@nurgelrot Please try to replace the microSD card (if you get a new one ideally A1/A2 rated) and check if that improves things. We've had some reports similar to yours which were caused by slow I/O (bad SD cards). You might find I/O errors in the SBC's system log too.
-
@chrishamm Well, A2 SD cards didn't help. Unfortunately the one that failed this time is mounted in a way that I can't hook a screen up. The other one now has an HDMI monitor attached, so if it fails maybe I can get more data... The good news is the print on the unreachable DWC session is still going, so hopefully it will finish okay and I won't have wasted a print. Maybe I'll be able to get some info out of the PanelDue before I reset the Pi.
-
@nurgelrot Do you still get
System time has been changed
reports in the log? If yes, that indicates excessive load on the Pi.
-
@chrishamm Well, it seems that it's the WiFi NIC in this case. I'll go look at the other Pi's last failure and see if I can see something similar. But that time the print aborted; normally I just can't connect and the print finishes just fine.
Anyway in last night's case the print completed but I could not reach the pi over the net and my syslog is full of this:
19:11:26 Dabus-E3d kernel: [18170.601329] brcmfmac: brcmf_sdio_txfail: sdio error, abort command and terminate frame
There seem to be a number of causes and "fixes" for this: could be power, could be firmware, could be load. I know I'm getting good power. I'm running rpi-update right now to make sure I have all the latest firmware for the Pis.
Nothing is running on this Pi but the Duet software [Edit: and the massive bloat that is Linux these days], it's running the Duet non-GUI image, and the CPU is always over 50% idle (usually over 90%), so load is unlikely to be the problem unless it's some specific bottleneck someplace.
We'll see how this firmware update goes... I have an active window open to the Pi following the logs in real time while kicking off another print...
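For anyone following along, here is a minimal sketch for tallying the two log messages quoted in this thread (the brcmf_sdio_txfail kernel error and the DCS "System time has been changed" line). The function name count_matches is made up for illustration, and the log paths in the usage note are an assumption about where a stock Raspberry Pi OS image keeps them.

```shell
#!/bin/sh
# count_matches PATTERN
# Reads log text on stdin and prints how many lines contain PATTERN.
# PATTERN is matched as a fixed string (grep -F), so brackets and dots
# in a log message do not need regex escaping.
# (count_matches is a hypothetical helper name, not part of any Duet tool.)
count_matches() {
    # grep -c prints the count but exits non-zero when the count is 0;
    # "|| true" keeps that from looking like a failure to the caller.
    grep -cF -- "$1" || true
}
```

Assuming the usual log locations, something like
count_matches 'brcmf_sdio_txfail' < /var/log/syslog
or
count_matches 'System time has been changed' < /var/log/daemon.log
would give a rough number to compare before and after a change such as a firmware update.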
-
Patching the firmware using
rpi-update
seems to have corrected the issue on one of the now 3 Raspberry Pi 4s I have. It rendered the other two unbootable... Don't know what to do. I purchased all of these at the same time from the same vendor; maybe a bad batch? They just stop transmitting. Their NICs are online but they can't reach anything but themselves. As soon as I plug in a different WiFi adapter or a cable (and down the bad NIC) they work fine again.
Removing and reloading the Broadcom WiFi kernel modules is the only thing that repairs the issue for the internal WiFi... Guess I'm going to go wired or use my Pi 3s, which have been working for years. If I go 100% wired I'll switch to Odroid C4s. I hate Pis almost as much as I hate Linux at this point.
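A rough sketch of the module reload described above, assuming the module in question is brcmfmac (the driver named in the syslog error earlier in the thread; the post does not say exactly which modules were reloaded). The names reload_wifi_module and DRY_RUN are hypothetical, chosen for illustration. It needs root when run for real.

```shell
#!/bin/sh
# reload_wifi_module [MODULE]
# Unloads and reloads a WiFi kernel module with modprobe.
# MODULE defaults to brcmfmac, which is an assumption based on the
# brcmfmac error in the syslog; adjust it if your driver differs.
# Set DRY_RUN=1 to print the commands instead of running them.
reload_wifi_module() {
    module="${1:-brcmfmac}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "modprobe -r $module"
        echo "modprobe $module"
        return 0
    fi
    # Requires root. The wireless interface disappears briefly while the
    # module is unloaded, so do not run this over the WiFi link itself
    # unless something will bring the connection back up afterwards.
    modprobe -r "$module" && modprobe "$module"
}
```

Running it once with DRY_RUN=1 shows what would be executed before committing to it on a printer that is mid-print.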
-
@nurgelrot said in Pi4 Network Disconnect.:
Patching the firmware using rpi-update seems to have corrected the issue on one of the now 3 raspberry Pi4's I have. It rendered the other two unbootable... Don't know what to do. I purchased all these at the same time from the same vendor maybe a bad batch?
https://www.raspberrypi.org/documentation/raspbian/applications/rpi-update.md
-
I think I have found the secret sauce. For whatever reason these Pi 4Bs don't like power management when running on 5GHz; this is a known issue. But in my case they seem to hate 5GHz altogether. However, I've been up and running for a day with no errors at all (a first) by pinning them onto a 2.4GHz SSID and turning off power management.
I stuck:
iw dev wlan0 set power_save off
into /etc/rc.local
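One way to check after a reboot that the setting actually took, as a sketch: iw can also report the current state with "get power_save", which is expected to print a line like "Power save: off". The helper name power_save_state is made up here, and wlan0 is assumed to be the interface, as in the command above.

```shell
#!/bin/sh
# power_save_state
# Reads the output of "iw dev <iface> get power_save" on stdin and prints
# "on" or "off"; prints "unknown" if no recognisable line is found.
# (power_save_state is a hypothetical helper name for illustration.)
power_save_state() {
    # Keep only what follows "Power save:" on the first matching line.
    state=$(sed -n 's/^[[:space:]]*Power save:[[:space:]]*//p' | head -n 1)
    case "$state" in
        on|off) echo "$state" ;;
        *)      echo "unknown" ;;
    esac
}
```

Usage would be along the lines of
iw dev wlan0 get power_save | power_save_state
which should print off once the rc.local line has run.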
They appear happy but I'll keep watching them.
For the record my access point is a TP-Link Archer A9 v6.0 acting only in AP mode.
The router/DHCP is OpenBSD 6.9 amd64.
Marking Solved for now. If I can figure out how
-
@nurgelrot said in Pi4 Network Disconnect.:
Marking Solved for now. If I can figure out how
Sorry to dig up old posts, but can you confirm that the problem was gone after you did this?
I'm having the exact same problem.
-
@adammhaile Well, they were working until I replaced them with Odroid C4s and then pulled the Odroids off all but 1 printer. SBC mode has been more trouble than it's worth.