[3.6.0-beta.3] DWC connection issues Duet2 / Duex5
-
@droftarts
hi, I do not have a panel due connected.It's just a Duet2 Wifi and a Duex5 connected via ribbon cable and a Raspberry serving the access point, where the duet2 is connected to.
-
@droftarts
maybe it is related, as the M122 is not timed out, its rather the rr_model request. Maybe the M122 is causing a delay due to the duex5 subquery in the rr_model response. -
@dc42
can you give me a hint what I can do to track this error down?I'm able to reproduce it within seconds.
Thanks
Tim -
@timschneider
Is it possible you have a variation of my problem which has been solver?https://forum.duet3d.com/topic/37329/3-6-0-beta-3-duet-2-issues-lost-communication/3
-
@timschneider I've updated my Mini 5+ to WiFi firmware v2.2.1, which is the beta.3 release, and tried the same test, but I don't get a disconnect. I hit M122 repeatedly 22 times. It seems like I only got 18 responses, but may be I'd hit return before the M122 had returned fully. I also tested with the WiFi firmware v2.2.0 (in the 3.5.4 release), and that was fine.
Can you revert the WiFi firmware to v2.2.0 and see if still causes the same problem?
@DaveA May possibly be related, but yours was specific to the Duet 2 Ethernet, this is WiFi. But then I don't write firmware!
Ian
-
@droftarts
thank you for your tests!I was able to hook up via USB and grab some debug logs.
It is somehow related to the connection mangement and kind of ran out of buffer, even if it appears that the buffers are wasted.
HTTP req, command words { GET /rr_model HTTP/1.0 }, parameters { key= flags=fd99 }<\n> New conn on socket 2 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 287 bytes<\n> HTTP req, command words { GET /rr_model HTTP/1.0 }, parameters { key= flags=fd99 }<\n> New conn on socket 3 for local port 80<\n> New conn on socket 3 for local port 80<\n> New conn on socket 3 for local port 80<\n> New conn on socket 3 for local port 80<\n> New conn on socket 3 for local port 80<\n> New conn on socket 3 for local port 80<\n> ... New conn on socket 3 for local port 80<\n> New conn on socket 3 for local port 80<\n> New conn on socket 3 for local port 80<\n> New conn on socket 3 for local port 80<\n> New conn on socket 3 for local port 80<\n> New conn on socket 3 for local port 80<\n> New conn on socket 3 for local port 80<\n> New conn on socket 3 for local port 80<\n> New conn on socket 3 for local port 80<\n> New conn on socket 3 for local port 80<\n> Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n> New conn on socket 3 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 287 bytes<\n>
HTTP req, command words { GET /rr_model HTTP/1.0 }, parameters { key= flags=fd99 }<\n> New conn on socket 0 for local port 80<\n> New conn on socket 0 for local port 80<\n> New conn on socket 0 for local port 80<\n> New conn on socket 0 for local port 80<\n> New conn on socket 0 for local port 80<\n> New conn on socket 0 for local port 80<\n> New conn on socket 0 for local port 80<\n> New conn on socket 0 for local port 80<\n> New conn on socket 0 for local port 80<\n> New conn on socket 0 for local port 80<\n> New conn on socket 0 for local port 80<\n> New conn on socket 0 for local port 80<\n> New conn on socket 0 for local port 80<\n> Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n> New conn on socket 0 for local port 80<\n> New conn on socket 0 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 302 bytes<\n> New conn on socket 1 for local port 80<\n>
New conn on socket 1 for local port 80<\n> New conn on socket 1 for local port 80<\n> New conn on socket 1 for local port 80<\n> New conn on socket 1 for local port 80<\n> New conn on socket 1 for local port 80<\n> New conn on socket 1 for local port 80<\n> Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n> New conn on socket 1 for local port 80<\n> New conn on socket 1 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 288 bytes<\n> HTTP req, command words { GET /rr_model HTTP/1.0 }, parameters { key=job flags=d5 }<\n> Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n> Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n> New conn on socket 0 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 419 bytes<\n> Sending reply, file = no<\n> HTTP req, command words { GET /rr_reply HTTP/1.0 }, parameters { }<\n> Sending G-Code reply to HTTP client 1 of 2 (length 3324)<\n> New conn on socket 0 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 431 bytes<\n> HTTP req, command words { GET /rr_model HTTP/1.0 }, parameters { flags=d99fn }<\n> New conn on socket 2 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n>
New conn on socket 1 for local port 80<\n> New conn on socket 1 for local port 80<\n> Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n> New conn on socket 1 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 288 bytes<\n> HTTP req, command words { GET /rr_model HTTP/1.0 }, parameters { key=job flags=d5 }<\n> Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n> Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n> Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n> New conn on socket 0 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 445 bytes<\n> HTTP req, command words { GET /rr_gcode HTTP/1.0 }, parameters { gcode=M122 }<\n> Sending JSON reply, length 12<\n> New conn on socket 0 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 288 bytes<\n> New conn on socket 1 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 288 bytes<\n> New conn on socket 2 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 431 bytes<\n> New conn on socket 3 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 288 bytes<\n> Peer disconnected on socket 2<\n> New conn on socket 2 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 302 bytes<\n> Timed out gcodeReply, free buffers=1<\n> Timed out gcodeReply, free buffers=2<\n> Timed out gcodeReply, free buffers=3<\n> Timed out gcodeReply, free buffers=5<\n> Timed out gcodeReply, free buffers=7<\n> Timed out gcodeReply, free buffers=9<\n> Timed out gcodeReply, free buffers=11<\n> Timed out gcodeReply, free buffers=12<\n> Timed out gcodeReply, free buffers=13<\n> Released gcodeReply, free buffers=26<\n> Released gcodeReply, free buffers=26<\n> Peer disconnected on socket 1<\n> Peer disconnected on socket 0<\n> New conn on socket 1 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 309 bytes<\n> HTTP req, command words { GET /rr_model HTTP/1.0 }, parameters { key=sensors.filamentMonitors flags=d4 }<\n> Sending reply, file = no<\n> Webserver: rejecting message with: 401 Not authorized<\n> New conn on socket 0 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 302 bytes<\n> Sending reply, file = no<\n> HTTP req, command words { GET /rr_model HTTP/1.0 }, parameters { key=move.compensation flags=d4 }<\n> Webserver: rejecting message with: 401 Not authorized<\n> New conn on socket 1 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n>
Full logs attached
output_2025-01-27_14-41-50.txt output_2025-01-27_14-26-22.txt output_2025-01-27_14-16-17.txt output_2025-01-27_14-14-06.txt output_2025-01-27_14-12-58.txt -
@droftarts
it is on both WiFi Versions - just checked v2.2.0HTTP req, command words { GET /rr_gcode HTTP/1.0 }, parameters { gcode=m122 }<\n> Sending JSON reply, length 12<\n> Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n> Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n> New conn on socket 0 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 287 bytes<\n> New conn on socket 1 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 308 bytes<\n> New conn on socket 2 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 442 bytes<\n> Peer disconnected on socket 2<\n> New conn on socket 2 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 458 bytes<\n> Timed out gcodeReply, free buffers=1<\n> Timed out gcodeReply, free buffers=2<\n> Timed out gcodeReply, free buffers=3<\n> Timed out gcodeReply, free buffers=5<\n> Timed out gcodeReply, free buffers=7<\n> Timed out gcodeReply, free buffers=9<\n> Timed out gcodeReply, free buffers=11<\n> Timed out gcodeReply, free buffers=12<\n> Timed out gcodeReply, free buffers=13<\n> Released gcodeReply, free buffers=26<\n> Released gcodeReply, free buffers=26<\n> Peer disconnected on socket 2<\n> New conn on socket 2 for local port 80<\n> HTTP connection accepted<\n>
full log
output_2025-01-27_14-56-34.txt -
@timschneider Great, thanks, I'll poke @dc42 with this info.
Ian
-
@timschneider What HTTP clients do you have connected and do you have a PanelDue attached, too? It looks like the G-code reply (e.g. for
M122
) queues up because at least one client does not request it in regular intervals. -
@chrishamm ah good catch.
in the logs above I've one DWC and sometimes more, upto 6 DWCs from the same machine via browser and one python client querying the object model.
HTTP req, command words { GET /rr_model HTTP/1.0 }, parameters { key=move.compensation flags=d4 }<\n>
is this client supposed to query rr_reply even if it is not asking for? all clients (should) use propper session keys.
the upto 6 DWCs just to poke the out of buffer and check if it makes a difference.
To be clear, either 1 DWC and python client or 6 DWC and 1 python clients.EDIT: no panel due / standalone mode
EDIT 2:
rr_reply Retrieve the last G-code reply. The G-code reply is buffered per connected HTTP client and it is discarded when every HTTP client has fetched it or when the firmware is short on memory and the client has not requested it within reasonable time (1 second).
-
@timschneider Yes, for the time being I suggest you request
rr_reply
as well. AFAIR G-code replies are saved up to 8 seconds. We need to discuss how connections from "bots" could be handled better, perhaps an extra key forrr_connect
could be added to indicate that the new client does not wish to poll G-code replies. -
@chrishamm
ok, checked it and indeed reducing the clients to one DWC session helped to mitigate the problem - or to say I was not able to trigger it via DWC with repeating M122 gcodes. But, i took the chance to rush down the rabbit hole and was able to trigger it withM122 M122 M122
as one command.HTTP connection accepted<\n> Found responder<\n> Received 466 bytes<\n> HTTP req, command words { GET /rr_gcode HTTP/1.0 }, parameters { gcode=M122 M122 M122 M122 }<\n> Sending JSON reply, length 12<\n> New conn on socket 0 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 431 bytes<\n> Peer disconnected on socket 0<\n> New conn on socket 0 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 431 bytes<\n> Peer disconnected on socket 0<\n> New conn on socket 0 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 431 bytes<\n> Released gcodeReply, free buffers=26<\n> Peer disconnected on socket 0<\n> New conn on socket 0 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n> Received 458 bytes<\n> HTTP req, command words { GET /rr_connect HTTP/1.0 }, parameters { password=meltingplot time=2025-1-27T15:33:25 sessionKey=yes }<\n> Sending JSON reply, length 94<\n> New conn on socket 0 for local port 80<\n> HTTP connection accepted<\n> Found responder<\n>
{ GET /rr_gcode HTTP/1.0 }, parameters { gcode=m122 }Bad recv status sizeError: failed to retrieve WiFi status message: another SPI transfer is pendingBad recv status sizeBad recv status sizeError: failed to retrieve WiFi status message: another SPI transfer is pendingBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeReleased gcodeReply, free buffers=26Error: failed to retrieve WiFi status message: SPI timeoutBad recv status size
don't know if this is somehow related, just catched it as a by-catch.
OK, lesson learned - teach the bot to query rr_reply every second, just in case. But still there is some problem with memory management and error handling.
-
@chrishamm @droftarts
it was to early - it will crash even with only 2 DWC connections opened. I also tried to poll rr_reply ever second but even that is not enough. If you have bad luck and two DWC sessions open, it will crash on the first or second try.EDIT:
same for
M409 K"" F"d2"
-
@timschneider Thanks for the report. I've logged this at https://github.com/Duet3D/RepRapFirmware/issues/1073
-
@droftarts
@chrishamm thank you for open up a bug report for it, but i'm almost certain that this is just one bug out of at least two.somehow the rr_reply can add up and even a quit short and simple output can crash the webserver / frontend.
the following output shows the output in the console when at least two clients are connected, in this case two DWC sessionsafter adding up, it will kill all open sessions
if only one DWC session is open the output looks like the following and it is impossible (via DWC) to add up the output
We use a reverse proxy to connect to duet and I observed the above mentioned behavior in the past, but alwas thought it is related to the reverse proxy and was not paying attention to it. The above shown screenshots are NOT connected to the reverse proxy - I was directly connected to the duet.
EDIT: I've checked if i'm able to add up the output in DSF setup or crash the frontend - and it was not possible.
-
@timschneider DSF has a lot more memory available so it's very unlikely to get it to crash. If long G-code replies are not released quickly enough in standalone mode, that can lead to output buffer starvation which makes the whole webserver unresponsive (for a few seconds at least). That's why I proposed to call
rr_reply
relatively frequently from "bot" clients. The other issue which had to do with failed SPI transfers is being investigated. -
@chrishamm This explanation does not seem to explain the behaviour shown above.
In the above example there are only two DWC sessions open and the overall outbut is just about 1100 characters.
I did another test with the following macros:
M98 P"0:/macros/test-bufferoverflow.g"
echo "123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890" echo "123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890" echo "123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
M98 P"0:/macros/test-bufferoverflow2.g" -> two times the content of test-bufferoverflow.g (6 times echo with 210 chars)
M98 P"0:/macros/test-bufferoverflow3.g" -> two times the content of test-bufferoverflow2.g (12 times echo with 210 chars)
M98 P"0:/macros/test-bufferoverflow4.g" -> two times the content of test-bufferoverflow3.g (24 times echo with 210 chars)Executing macro M98 P"0:/macros/test-bufferoverflow4.g" will work
Executing two times macro M98 P"0:/macros/test-bufferoverflow4.g" will fail
Executing two times macro M98 P"0:/macros/test-bufferoverflow3.g" will fail
Executing two times macro M98 P"0:/macros/test-bufferoverflow2.g" will sometimes fail
Executing three times macro M98 P"0:/macros/test-bufferoverflow2.g" will fail
so it looks like it is realted to the memory, but it is no "out of memory" due to the size of the output.
with only one DWC session i'm able to get it crashing with two times M98 P"0:/macros/test-bufferoverflow4.g" - but not with the other macros.
The problem from my side is, that when two clients are ocnnected and issue a G-Code with response this sometimes crashes the webserver and both DWC connections are reconnecting. From theorie it should work as long as there are outbut buffers, but somehow it appears they get wasted somewhere.
Output of M122
30.1.2025, 18:36:37 M122 === Diagnostics === RepRapFirmware for Duet 2 WiFi/Ethernet version 3.6.0-beta.3 (2025-01-16 19:09:12) running on Duet WiFi 1.02 or later + DueX5v0.11 Board ID: 08DJM-9178L-L4MSJ-6J1FL-3S86J-TB2LN Used output buffers: 2 of 26 (26 max) === RTOS === Static ram: 23520 Dynamic ram: 68716 of which 220 recycled Never used RAM 27540, free system stack 182 words Tasks: NETWORK(1,ready,16.6%,195) HEAT(3,nWait 5,0.1%,283) Move(4,nWait 5,0.0%,361) DUEX(5,nWait 5,0.0%,23) MAIN(1,running,82.8%,712) IDLE(0,ready,0.5%,29), total 100.0% Owned mutexes: === Platform === Last reset 30:48:46 ago, cause: power up Last software reset at 2025-01-27 15:25, reason: User, Gcodes spinning, available RAM 28108, slot 2 Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00400000 BFAR 0xe000ed38 SP 0x00000000 Task MAIN Freestk 0 n/a Error status: 0x0c MCU temperature: min 28.7, current 29.2, max 30.3 Supply voltage: min 23.8, current 24.0, max 24.3, under voltage events: 0, over voltage events: 0, power good: yes Heap OK, handles allocated/used 99/7, heap memory allocated/used/recyclable 2048/1484/1348, gc cycles 5713 Events: 0 queued, 0 completed Date/time: 2025-01-30 18:36:30 Slowest loop: 17.84ms; fastest: 0.18ms I2C nak errors 0, send timeouts 0, receive timeouts 0, finishTimeouts 0, resets 0 === Storage === Free file entries: 8 SD card 0 detected, interface speed: 20.0MBytes/sec SD card longest read time 8.5ms, write time 0.0ms, max retries 0 === Move === Segments created 0, maxWait 0ms, bed comp in use: none, height map offset 0.000, hiccups added 0/0 (0.00ms), max steps late 0, ebfmin 0.00, ebfmax 0.00 Pos req/act/dcf: 0.00/0/0.00 0.00/0/0.00 0.00/0/0.00 no step interrupt scheduled Driver 0: standstill, SG min n/a Driver 1: standstill, SG min n/a Driver 2: standstill, SG min n/a Driver 3: standstill, SG min n/a Driver 4: standstill, SG min n/a Driver 5: standstill, SG min n/a Driver 6: standstill, SG min n/a Driver 7: standstill, SG min n/a Driver 8: standstill, SG min n/a Driver 9: standstill, SG min n/a Driver 10: Driver 11: === DDARing 0 === Scheduled moves 0, completed 0, LaErrors 0, Underruns [0, 0, 0] Segments left 0 Code queue is empty === Heat === Bed heaters 0 -1 -1 -1, chamber heaters -1 -1 -1 -1, ordering errs 0 === GCodes === Movement locks held by null HTTP is idle in state(s) 0 Telnet is idle in state(s) 0 File is idle in state(s) 0 USB is idle in state(s) 0 Aux is idle in state(s) 0 Trigger is idle in state(s) 0 Queue is idle in state(s) 0 LCD is idle in state(s) 0 Daemon is doing "G4 P100" in state(s) 0 0, running macro Autopause is idle in state(s) 0 === Filament sensors === check 0 clear 399166717 Extruder 0: pos 2160.00, errs: frame 0 parity 0 ovrun 0 pol 0 ovdue 0 === DueX === Read count 0, 0.00 reads/min === Network === Slowest loop: 21.90ms; fastest: 0.07ms Responder states: HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) HTTP sessions: 1 of 8 === WiFi === Interface state: active Module is connected to access point Failed messages: pending 0, notrdy 0, noresp 0 Firmware version 2.2.1 MAC address bc:dd:c2:31:9a:40 Module reset reason: Turned on by main processor, Vcc 3.46, flash size 4194304, free heap 42496 WiFi IP address 10.42.0.2 Signal strength -21dBm, channel 11, mode 802.11n, reconnections 0 Clock register 00002002 Socket states: 0 0 0 0 0 0 0 0
-
@timschneider if you enable all Move debugging in recent versions of RRF then huge amounts of debug output are generated, which means all bets are off. M111 should only be used to enable debugging selectively, using the P and D parameters so select the modules and subsystems to debug.
Duet 2 is somewhat memory constrained so it has less output buffer memory available than Duet 3 boards. That said, it does appear from the output you have posted that there may be a particular issue with the interface to the WiFi module when there is a lot of data to output. This is on my list to investigate.