Duet3D Logo Duet3D
    • Tags
    • Documentation
    • Order
    • Register
    • Login

    [3.6.0-beta.3] DWC connection issues Duet2 / Duex5

    Scheduled Pinned Locked Moved
    Beta Firmware
    6
    33
    847
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • timschneiderundefined
      timschneider @droftarts
      last edited by

      @droftarts
      thank you for your tests!

      I was able to hook up via USB and grab some debug logs.

      It is somehow related to the connection mangement and kind of ran out of buffer, even if it appears that the buffers are wasted.

      HTTP req, command words { GET /rr_model HTTP/1.0 }, parameters { key= flags=fd99 }<\n>
      New conn on socket 2 for local port 80<\n>
      HTTP connection accepted<\n>
      Found responder<\n>
      Received 287 bytes<\n>
      HTTP req, command words { GET /rr_model HTTP/1.0 }, parameters { key= flags=fd99 }<\n>
      New conn on socket 3 for local port 80<\n>
      New conn on socket 3 for local port 80<\n>
      New conn on socket 3 for local port 80<\n>
      New conn on socket 3 for local port 80<\n>
      New conn on socket 3 for local port 80<\n>
      New conn on socket 3 for local port 80<\n>
      ...
      New conn on socket 3 for local port 80<\n>
      New conn on socket 3 for local port 80<\n>
      New conn on socket 3 for local port 80<\n>
      New conn on socket 3 for local port 80<\n>
      New conn on socket 3 for local port 80<\n>
      New conn on socket 3 for local port 80<\n>
      New conn on socket 3 for local port 80<\n>
      New conn on socket 3 for local port 80<\n>
      New conn on socket 3 for local port 80<\n>
      New conn on socket 3 for local port 80<\n>
      Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n>
      New conn on socket 3 for local port 80<\n>
      HTTP connection accepted<\n>
      Found responder<\n>
      Received 287 bytes<\n>
      
      HTTP req, command words { GET /rr_model HTTP/1.0 }, parameters { key= flags=fd99 }<\n>
      New conn on socket 0 for local port 80<\n>
      New conn on socket 0 for local port 80<\n>
      New conn on socket 0 for local port 80<\n>
      New conn on socket 0 for local port 80<\n>
      New conn on socket 0 for local port 80<\n>
      New conn on socket 0 for local port 80<\n>
      New conn on socket 0 for local port 80<\n>
      New conn on socket 0 for local port 80<\n>
      New conn on socket 0 for local port 80<\n>
      New conn on socket 0 for local port 80<\n>
      New conn on socket 0 for local port 80<\n>
      New conn on socket 0 for local port 80<\n>
      New conn on socket 0 for local port 80<\n>
      Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n>
      New conn on socket 0 for local port 80<\n>
      New conn on socket 0 for local port 80<\n>
      HTTP connection accepted<\n>
      Found responder<\n>
      Received 302 bytes<\n>
      New conn on socket 1 for local port 80<\n>
      
      New conn on socket 1 for local port 80<\n>
      New conn on socket 1 for local port 80<\n>
      New conn on socket 1 for local port 80<\n>
      New conn on socket 1 for local port 80<\n>
      New conn on socket 1 for local port 80<\n>
      New conn on socket 1 for local port 80<\n>
      Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n>
      New conn on socket 1 for local port 80<\n>
      New conn on socket 1 for local port 80<\n>
      HTTP connection accepted<\n>
      Found responder<\n>
      Received 288 bytes<\n>
      HTTP req, command words { GET /rr_model HTTP/1.0 }, parameters { key=job flags=d5 }<\n>
      Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n>
      Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n>
      New conn on socket 0 for local port 80<\n>
      HTTP connection accepted<\n>
      Found responder<\n>
      Received 419 bytes<\n>
      Sending reply, file = no<\n>
      HTTP req, command words { GET /rr_reply HTTP/1.0 }, parameters { }<\n>
      Sending G-Code reply to HTTP client 1 of 2 (length 3324)<\n>
      New conn on socket 0 for local port 80<\n>
      HTTP connection accepted<\n>
      Found responder<\n>
      Received 431 bytes<\n>
      HTTP req, command words { GET /rr_model HTTP/1.0 }, parameters { flags=d99fn }<\n>
      New conn on socket 2 for local port 80<\n>
      HTTP connection accepted<\n>
      Found responder<\n>
      
      New conn on socket 1 for local port 80<\n>
      New conn on socket 1 for local port 80<\n>
      Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n>
      New conn on socket 1 for local port 80<\n>
      HTTP connection accepted<\n>
      Found responder<\n>
      Received 288 bytes<\n>
      HTTP req, command words { GET /rr_model HTTP/1.0 }, parameters { key=job flags=d5 }<\n>
      Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n>
      Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n>
      Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n>
      New conn on socket 0 for local port 80<\n>
      HTTP connection accepted<\n>
      Found responder<\n>
      Received 445 bytes<\n>
      HTTP req, command words { GET /rr_gcode HTTP/1.0 }, parameters { gcode=M122 }<\n>
      Sending JSON reply, length 12<\n>
      New conn on socket 0 for local port 80<\n>
      HTTP connection accepted<\n>
      Found responder<\n>
      Received 288 bytes<\n>
      New conn on socket 1 for local port 80<\n>
      HTTP connection accepted<\n>
      Found responder<\n>
      Received 288 bytes<\n>
      New conn on socket 2 for local port 80<\n>
      HTTP connection accepted<\n>
      Found responder<\n>
      Received 431 bytes<\n>
      New conn on socket 3 for local port 80<\n>
      HTTP connection accepted<\n>
      Found responder<\n>
      Received 288 bytes<\n>
      Peer disconnected on socket 2<\n>
      New conn on socket 2 for local port 80<\n>
      HTTP connection accepted<\n>
      Found responder<\n>
      Received 302 bytes<\n>
      Timed out gcodeReply, free buffers=1<\n>
      Timed out gcodeReply, free buffers=2<\n>
      Timed out gcodeReply, free buffers=3<\n>
      Timed out gcodeReply, free buffers=5<\n>
      Timed out gcodeReply, free buffers=7<\n>
      Timed out gcodeReply, free buffers=9<\n>
      Timed out gcodeReply, free buffers=11<\n>
      Timed out gcodeReply, free buffers=12<\n>
      Timed out gcodeReply, free buffers=13<\n>
      Released gcodeReply, free buffers=26<\n>
      Released gcodeReply, free buffers=26<\n>
      Peer disconnected on socket 1<\n>
      Peer disconnected on socket 0<\n>
      New conn on socket 1 for local port 80<\n>
      HTTP connection accepted<\n>
      Found responder<\n>
      Received 309 bytes<\n>
      HTTP req, command words { GET /rr_model HTTP/1.0 }, parameters { key=sensors.filamentMonitors flags=d4 }<\n>
      Sending reply, file = no<\n>
      Webserver: rejecting message with: 401 Not authorized<\n>
      New conn on socket 0 for local port 80<\n>
      HTTP connection accepted<\n>
      Found responder<\n>
      Received 302 bytes<\n>
      Sending reply, file = no<\n>
      HTTP req, command words { GET /rr_model HTTP/1.0 }, parameters { key=move.compensation flags=d4 }<\n>
      Webserver: rejecting message with: 401 Not authorized<\n>
      New conn on socket 1 for local port 80<\n>
      HTTP connection accepted<\n>
      Found responder<\n>
      

      Full logs attached
      output_2025-01-27_14-41-50.txt output_2025-01-27_14-26-22.txt output_2025-01-27_14-16-17.txt output_2025-01-27_14-14-06.txt output_2025-01-27_14-12-58.txt

      timschneiderundefined 1 Reply Last reply Reply Quote 0
      • timschneiderundefined
        timschneider @timschneider
        last edited by

        @droftarts
        it is on both WiFi Versions - just checked v2.2.0

        HTTP req, command words { GET /rr_gcode HTTP/1.0 }, parameters { gcode=m122 }<\n>
        Sending JSON reply, length 12<\n>
        Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n>
        Ran out of output buffers at ../src/Networking/HttpResponder.cpp(1109)<\n>
        New conn on socket 0 for local port 80<\n>
        HTTP connection accepted<\n>
        Found responder<\n>
        Received 287 bytes<\n>
        New conn on socket 1 for local port 80<\n>
        HTTP connection accepted<\n>
        Found responder<\n>
        Received 308 bytes<\n>
        New conn on socket 2 for local port 80<\n>
        HTTP connection accepted<\n>
        Found responder<\n>
        Received 442 bytes<\n>
        Peer disconnected on socket 2<\n>
        New conn on socket 2 for local port 80<\n>
        HTTP connection accepted<\n>
        Found responder<\n>
        Received 458 bytes<\n>
        Timed out gcodeReply, free buffers=1<\n>
        Timed out gcodeReply, free buffers=2<\n>
        Timed out gcodeReply, free buffers=3<\n>
        Timed out gcodeReply, free buffers=5<\n>
        Timed out gcodeReply, free buffers=7<\n>
        Timed out gcodeReply, free buffers=9<\n>
        Timed out gcodeReply, free buffers=11<\n>
        Timed out gcodeReply, free buffers=12<\n>
        Timed out gcodeReply, free buffers=13<\n>
        Released gcodeReply, free buffers=26<\n>
        Released gcodeReply, free buffers=26<\n>
        Peer disconnected on socket 2<\n>
        New conn on socket 2 for local port 80<\n>
        HTTP connection accepted<\n>
        

        full log
        output_2025-01-27_14-56-34.txt

        droftartsundefined 1 Reply Last reply Reply Quote 0
        • droftartsundefined
          droftarts administrators @timschneider
          last edited by

          @timschneider Great, thanks, I'll poke @dc42 with this info.

          Ian

          Bed-slinger - Mini5+ WiFi/1LC | RRP Fisher v1 - D2 WiFi | Polargraph - D2 WiFi | TronXY X5S - 6HC/Roto | CNC router - 6HC | Tractus3D T1250 - D2 Eth

          1 Reply Last reply Reply Quote 0
          • chrishammundefined
            chrishamm administrators @timschneider
            last edited by

            @timschneider What HTTP clients do you have connected and do you have a PanelDue attached, too? It looks like the G-code reply (e.g. for M122) queues up because at least one client does not request it in regular intervals.

            Duet software engineer

            timschneiderundefined 1 Reply Last reply Reply Quote 0
            • timschneiderundefined
              timschneider @chrishamm
              last edited by timschneider

              @chrishamm ah good catch.

              in the logs above I've one DWC and sometimes more, upto 6 DWCs from the same machine via browser and one python client querying the object model.

              HTTP req, command words { GET /rr_model HTTP/1.0 }, parameters { key=move.compensation flags=d4 }<\n>
              

              is this client supposed to query rr_reply even if it is not asking for? all clients (should) use propper session keys.

              the upto 6 DWCs just to poke the out of buffer and check if it makes a difference.
              To be clear, either 1 DWC and python client or 6 DWC and 1 python clients.

              EDIT: no panel due / standalone mode

              EDIT 2:

              rr_reply
              Retrieve the last G-code reply. The G-code reply is buffered per connected HTTP client and it is discarded when every HTTP client has fetched it or when the firmware is short on memory and the client has not requested it within reasonable time (1 second).
              
              chrishammundefined 1 Reply Last reply Reply Quote 0
              • chrishammundefined
                chrishamm administrators @timschneider
                last edited by

                @timschneider Yes, for the time being I suggest you request rr_reply as well. AFAIR G-code replies are saved up to 8 seconds. We need to discuss how connections from "bots" could be handled better, perhaps an extra key for rr_connect could be added to indicate that the new client does not wish to poll G-code replies.

                Duet software engineer

                timschneiderundefined 1 Reply Last reply Reply Quote 0
                • timschneiderundefined
                  timschneider @chrishamm
                  last edited by

                  @chrishamm
                  ok, checked it and indeed reducing the clients to one DWC session helped to mitigate the problem - or to say I was not able to trigger it via DWC with repeating M122 gcodes. But, i took the chance to rush down the rabbit hole and was able to trigger it with M122 M122 M122 as one command.

                  HTTP connection accepted<\n>
                  Found responder<\n>
                  Received 466 bytes<\n>
                  HTTP req, command words { GET /rr_gcode HTTP/1.0 }, parameters { gcode=M122 M122 M122 M122 }<\n>
                  Sending JSON reply, length 12<\n>
                  New conn on socket 0 for local port 80<\n>
                  HTTP connection accepted<\n>
                  Found responder<\n>
                  Received 431 bytes<\n>
                  Peer disconnected on socket 0<\n>
                  New conn on socket 0 for local port 80<\n>
                  HTTP connection accepted<\n>
                  Found responder<\n>
                  Received 431 bytes<\n>
                  Peer disconnected on socket 0<\n>
                  New conn on socket 0 for local port 80<\n>
                  HTTP connection accepted<\n>
                  Found responder<\n>
                  Received 431 bytes<\n>
                  Released gcodeReply, free buffers=26<\n>
                  Peer disconnected on socket 0<\n>
                  New conn on socket 0 for local port 80<\n>
                  HTTP connection accepted<\n>
                  Found responder<\n>
                  Received 458 bytes<\n>
                  HTTP req, command words { GET /rr_connect HTTP/1.0 }, parameters { password=meltingplot time=2025-1-27T15:33:25 sessionKey=yes }<\n>
                  Sending JSON reply, length 94<\n>
                  New conn on socket 0 for local port 80<\n>
                  HTTP connection accepted<\n>
                  Found responder<\n>
                  
                  
                  { GET /rr_gcode HTTP/1.0 }, parameters { gcode=m122 }Bad recv status sizeError: failed to retrieve WiFi status message: another SPI transfer is pendingBad recv status sizeBad recv status sizeError: failed to retrieve WiFi status message: another SPI transfer is pendingBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeError: failed to retrieve WiFi status message: SPI timeoutBad recv status sizeReleased gcodeReply, free buffers=26Error: failed to retrieve WiFi status message: SPI timeoutBad recv status size
                  

                  don't know if this is somehow related, just catched it as a by-catch.

                  OK, lesson learned - teach the bot to query rr_reply every second, just in case. But still there is some problem with memory management and error handling.

                  timschneiderundefined 1 Reply Last reply Reply Quote 2
                  • timschneiderundefined
                    timschneider @timschneider
                    last edited by timschneider

                    @chrishamm @droftarts
                    it was to early - it will crash even with only 2 DWC connections opened. I also tried to poll rr_reply ever second but even that is not enough. If you have bad luck and two DWC sessions open, it will crash on the first or second try.

                    EDIT:
                    same for
                    M409 K"" F"d2"

                    chrishammundefined 1 Reply Last reply Reply Quote 0
                    • chrishammundefined
                      chrishamm administrators @timschneider
                      last edited by

                      @timschneider Thanks for the report. I've logged this at https://github.com/Duet3D/RepRapFirmware/issues/1073

                      chrishamm created this issue in Duet3D/RepRapFirmware

                      closed [Bug]: Long G-code replies can crash WiFi boards in standalone mode #1073

                      Duet software engineer

                      timschneiderundefined 1 Reply Last reply Reply Quote 1
                      • timschneiderundefined
                        timschneider @chrishamm
                        last edited by timschneider

                        @droftarts
                        @chrishamm thank you for open up a bug report for it, but i'm almost certain that this is just one bug out of at least two.

                        somehow the rr_reply can add up and even a quit short and simple output can crash the webserver / frontend.
                        the following output shows the output in the console when at least two clients are connected, in this case two DWC sessions

                        15721bd2-60c3-48f4-a537-53c0975f5fb0-grafik.png

                        after adding up, it will kill all open sessions
                        e488bf63-74f0-4b26-a17c-33ffe633c98a-grafik.png

                        if only one DWC session is open the output looks like the following and it is impossible (via DWC) to add up the output
                        c293a3d7-594f-4c60-83a2-6a0df5c3bfbc-grafik.png

                        We use a reverse proxy to connect to duet and I observed the above mentioned behavior in the past, but alwas thought it is related to the reverse proxy and was not paying attention to it. The above shown screenshots are NOT connected to the reverse proxy - I was directly connected to the duet.

                        EDIT: I've checked if i'm able to add up the output in DSF setup or crash the frontend - and it was not possible.

                        chrishammundefined dc42undefined 2 Replies Last reply Reply Quote 0
                        • chrishammundefined
                          chrishamm administrators @timschneider
                          last edited by

                          @timschneider DSF has a lot more memory available so it's very unlikely to get it to crash. If long G-code replies are not released quickly enough in standalone mode, that can lead to output buffer starvation which makes the whole webserver unresponsive (for a few seconds at least). That's why I proposed to call rr_reply relatively frequently from "bot" clients. The other issue which had to do with failed SPI transfers is being investigated.

                          Duet software engineer

                          timschneiderundefined 1 Reply Last reply Reply Quote 0
                          • timschneiderundefined
                            timschneider @chrishamm
                            last edited by timschneider

                            @chrishamm This explanation does not seem to explain the behaviour shown above.

                            In the above example there are only two DWC sessions open and the overall output is just about 1100 characters.

                            I did another test with the following macros:

                            M98 P"0:/macros/test-bufferoverflow.g"

                            echo "123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
                            echo "123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
                            echo "123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
                            

                            M98 P"0:/macros/test-bufferoverflow2.g" -> two times the content of test-bufferoverflow.g (6 times echo with 210 chars)
                            M98 P"0:/macros/test-bufferoverflow3.g" -> two times the content of test-bufferoverflow2.g (12 times echo with 210 chars)
                            M98 P"0:/macros/test-bufferoverflow4.g" -> two times the content of test-bufferoverflow3.g (24 times echo with 210 chars)

                            Executing macro M98 P"0:/macros/test-bufferoverflow4.g" will work
                            537915c1-d331-41f9-b478-5c6dbf4c4d26-grafik.png

                            Executing two times macro M98 P"0:/macros/test-bufferoverflow4.g" will fail
                            30bb2aa2-f63b-430c-bc4c-d08c53500383-grafik.png

                            Executing two times macro M98 P"0:/macros/test-bufferoverflow3.g" will fail
                            40aed0e4-3513-40c2-9da8-aca34277ce0b-grafik.png

                            Executing two times macro M98 P"0:/macros/test-bufferoverflow2.g" will sometimes fail
                            176098bd-3145-41e1-a078-3870551de1ec-grafik.png

                            Executing three times macro M98 P"0:/macros/test-bufferoverflow2.g" will fail
                            32014b1b-a69f-4bf2-9711-7aed9d92a653-grafik.png

                            so it looks like it is realted to the memory, but it is no "out of memory" due to the size of the output.

                            with only one DWC session i'm able to get it crashing with two times M98 P"0:/macros/test-bufferoverflow4.g" - but not with the other macros.

                            The problem from my side is, that when two clients are connected and issue a G-Code with response, this sometimes crashes the webserver and both DWC connections are reconnecting. From theorie it should work as long as there are output buffers, but somehow it appears they get wasted somewhere.

                            Output of M122

                            30.1.2025, 18:36:37 	M122
                            === Diagnostics ===
                            RepRapFirmware for Duet 2 WiFi/Ethernet version 3.6.0-beta.3 (2025-01-16 19:09:12) running on Duet WiFi 1.02 or later + DueX5v0.11
                            Board ID: 08DJM-9178L-L4MSJ-6J1FL-3S86J-TB2LN
                            Used output buffers: 2 of 26 (26 max)
                            === RTOS ===
                            Static ram: 23520
                            Dynamic ram: 68716 of which 220 recycled
                            Never used RAM 27540, free system stack 182 words
                            Tasks: NETWORK(1,ready,16.6%,195) HEAT(3,nWait 5,0.1%,283) Move(4,nWait 5,0.0%,361) DUEX(5,nWait 5,0.0%,23) MAIN(1,running,82.8%,712) IDLE(0,ready,0.5%,29), total 100.0%
                            Owned mutexes:
                            === Platform ===
                            Last reset 30:48:46 ago, cause: power up
                            Last software reset at 2025-01-27 15:25, reason: User, Gcodes spinning, available RAM 28108, slot 2
                            Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00400000 BFAR 0xe000ed38 SP 0x00000000 Task MAIN Freestk 0 n/a
                            Error status: 0x0c
                            MCU temperature: min 28.7, current 29.2, max 30.3
                            Supply voltage: min 23.8, current 24.0, max 24.3, under voltage events: 0, over voltage events: 0, power good: yes
                            Heap OK, handles allocated/used 99/7, heap memory allocated/used/recyclable 2048/1484/1348, gc cycles 5713
                            Events: 0 queued, 0 completed
                            Date/time: 2025-01-30 18:36:30
                            Slowest loop: 17.84ms; fastest: 0.18ms
                            I2C nak errors 0, send timeouts 0, receive timeouts 0, finishTimeouts 0, resets 0
                            === Storage ===
                            Free file entries: 8
                            SD card 0 detected, interface speed: 20.0MBytes/sec
                            SD card longest read time 8.5ms, write time 0.0ms, max retries 0
                            === Move ===
                            Segments created 0, maxWait 0ms, bed comp in use: none, height map offset 0.000, hiccups added 0/0 (0.00ms), max steps late 0, ebfmin 0.00, ebfmax 0.00
                            Pos req/act/dcf: 0.00/0/0.00 0.00/0/0.00 0.00/0/0.00
                            no step interrupt scheduled
                            Driver 0: standstill, SG min n/a
                            Driver 1: standstill, SG min n/a
                            Driver 2: standstill, SG min n/a
                            Driver 3: standstill, SG min n/a
                            Driver 4: standstill, SG min n/a
                            Driver 5: standstill, SG min n/a
                            Driver 6: standstill, SG min n/a
                            Driver 7: standstill, SG min n/a
                            Driver 8: standstill, SG min n/a
                            Driver 9: standstill, SG min n/a
                            Driver 10: 
                            Driver 11: 
                            === DDARing 0 ===
                            Scheduled moves 0, completed 0, LaErrors 0, Underruns [0, 0, 0]
                            Segments left 0
                            Code queue is empty
                            === Heat ===
                            Bed heaters 0 -1 -1 -1, chamber heaters -1 -1 -1 -1, ordering errs 0
                            === GCodes ===
                            Movement locks held by null
                            HTTP is idle in state(s) 0
                            Telnet is idle in state(s) 0
                            File is idle in state(s) 0
                            USB is idle in state(s) 0
                            Aux is idle in state(s) 0
                            Trigger is idle in state(s) 0
                            Queue is idle in state(s) 0
                            LCD is idle in state(s) 0
                            Daemon is doing "G4 P100" in state(s) 0 0, running macro
                            Autopause is idle in state(s) 0
                            === Filament sensors ===
                            check 0 clear 399166717
                            Extruder 0: pos 2160.00, errs: frame 0 parity 0 ovrun 0 pol 0 ovdue 0
                            === DueX ===
                            Read count 0, 0.00 reads/min
                            === Network ===
                            Slowest loop: 21.90ms; fastest: 0.07ms
                            Responder states: HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0)
                            HTTP sessions: 1 of 8
                            === WiFi ===
                            Interface state: active
                            Module is connected to access point 
                            Failed messages: pending 0, notrdy 0, noresp 0
                            Firmware version 2.2.1
                            MAC address bc:dd:c2:31:9a:40
                            Module reset reason: Turned on by main processor, Vcc 3.46, flash size 4194304, free heap 42496
                            WiFi IP address 10.42.0.2
                            Signal strength -21dBm, channel 11, mode 802.11n, reconnections 0
                            Clock register 00002002
                            Socket states: 0 0 0 0 0 0 0 0
                            
                            1 Reply Last reply Reply Quote 0
                            • dc42undefined
                              dc42 administrators @timschneider
                              last edited by dc42

                              @timschneider if you enable all Move debugging in recent versions of RRF then huge amounts of debug output are generated, which means all bets are off. M111 should only be used to enable debugging selectively, using the P and D parameters so select the modules and subsystems to debug.

                              Duet 2 is somewhat memory constrained so it has less output buffer memory available than Duet 3 boards. That said, it does appear from the output you have posted that there may be a particular issue with the interface to the WiFi module when there is a lot of data to output. This is on my list to investigate.

                              Duet WiFi hardware designer and firmware engineer
                              Please do not ask me for Duet support via PM or email, use the forum
                              http://www.escher3d.com, https://miscsolutions.wordpress.com

                              timschneiderundefined 1 Reply Last reply Reply Quote 1
                              • timschneiderundefined
                                timschneider @dc42
                                last edited by timschneider

                                @dc42 @chrishamm
                                thank you both for your time and patience.
                                I moved forward with the knowledge of the low memory issue and reprogrammed the python client to be a fully fledged digital twin of the duet printer as per object model. In order to archive this I followed the description of M409 and implemented the a and next handling aswell as the seqs handling. So what I do now is, poll the rr_model endpoint with key=seqs to keep track of the sequence numbers and keep the load on the duet as low as possible. If a change happens, it will act accordingly, e.g. fetch a reply or partially fetch the updated object model recursive with max depth of level + 1, again to keep the load as low as possible.

                                but even that didn't solved the problem - it appears in some cases to make it even worse (i guess two output buffers per json response)

                                In the following there is one DWC Client and one python client connected.

                                Anyhow I wanted to understand the problem and at least find a workaround, but I was not able to.

                                What I found was that on an error - DWC is doing a reconnect straight, which makes it even worse, as this will increase the number of connected clients and will leave gcode replies unreceived and the buffer allocated, I guess?

                                3e8b16bb-46b9-4129-b06d-7b3b0200399a-grafik.png

                                Using the same macro as above. Directly after sending the command, the following rr_model request in DWC will fail - python client will also receive a HTTP 503 busy and will backoff 5 sec. before even trying to do anything (leaving the session open / timeout=8s - and leaving the reply probably not fetched for this session / due to the 503 busy response). DWC will open a new session and the internal connected clients count is increased, still keeping the buffer allocated. DWC tries to fetch a rr_model and will fail again with HTTP 503 - again reconnect / client count +1 / buffer allocated and so on ...

                                a5881d65-8451-47a2-b801-e79fe44ea5b3-grafik.png

                                The last rr_reply in the timeline is empty - and the reconnect and following rr_model request are successfull.

                                Maybe it is possible to change the behaviour of the DWC reconnect mechanism and check if the HTTP response is 503 - wait and do not reconnect and on the duet side if multiple json requests fail - drop pending gcode replies? Or as an alternative always serve the rr_model seqs requests to give a client a chance to consume rr_reply to free up output buffers. Just randomly polling the rr_reply endpoint will steal the response from DWC, so thats also not an option.

                                timschneiderundefined chrishammundefined 2 Replies Last reply Reply Quote 1
                                • timschneiderundefined
                                  timschneider @timschneider
                                  last edited by timschneider

                                  @timschneider
                                  wow that did the trick, when the python client receives an 503 it will just query the rr_reply immediately until the response is empty to free up buffers on the duet - DWC will not reconnect and everything works 🙂 even with M98 P"0:/macros/test-bufferoverflow3.g", but DWC will sometimes not receive the output of that macro as the python client consumed both replies.

                                  It will even survive M98 P"0:/macros/test-bufferoverflow4.g"

                                  Using this method having 6 concurrent DWC connections open and one python client is no problem anymore - it will even work with the M98 P"0:/macros/test-bufferoverflow4.g"

                                  Edit: there is still some problem in data and memory management, sometimes i do receive partial data Response payload is not completed: <ContentLengthError: 400, message='Not enough data for satisfy content length header.'> due to connection closing mid transmitting from duet to the reverse proxy.

                                  1 Reply Last reply Reply Quote 0
                                  • chrishammundefined
                                    chrishamm administrators @timschneider
                                    last edited by

                                    @timschneider What do you have "AJAX retries" in the DWC settings set to?

                                    Duet software engineer

                                    timschneiderundefined 2 Replies Last reply Reply Quote 0
                                    • timschneiderundefined
                                      timschneider @chrishamm
                                      last edited by

                                      @chrishamm
                                      d8570a95-0de9-412c-b915-f1a40a72e98c-grafik.png

                                      1 Reply Last reply Reply Quote 0
                                      • timschneiderundefined
                                        timschneider @chrishamm
                                        last edited by

                                        @chrishamm
                                        Damn! This really looks promising!

                                        https://github.com/Duet3D/RepRapFirmware/commit/a3498d31aa746d0486e5a109c091f15a8b8e41c0

                                        This could explain all the issues we and our customers have been facing – what could be described as DWC instability.
                                        We use object labels in every print, most of our customers have a large file list, most print jobs take longer than a day, often 3-5 days. I am really excited and looking forward to trying this out!

                                        0 chrishamm committed to Duet3D/RepRapFirmware
                                        Fix for too long G-code replies (#1073)
                                        
                                        Added extra parameter to OutputBuffer::Allocate to specify if the buffer
                                        to be allocated may be taken from the output buffer reserve. This is
                                        always true except if appending to an existing output buffer chain or
                                        when the request may be repeated without downsides.
                                        Also reduced the number of reserved output buffers to 2 on Duet 2 series
                                        chrishammundefined 1 Reply Last reply Reply Quote 0
                                        • chrishammundefined
                                          chrishamm administrators @timschneider
                                          last edited by

                                          @timschneider Please give it a try then. Do you need a firmware binary for testing?

                                          Duet software engineer

                                          timschneiderundefined 2 Replies Last reply Reply Quote 0
                                          • timschneiderundefined
                                            timschneider @chrishamm
                                            last edited by

                                            @chrishamm if you have one at hand, that would be nice. If not, I can compile my own withing the next days.

                                            1 Reply Last reply Reply Quote 1
                                            • First post
                                              Last post
                                            Unless otherwise noted, all forum content is licensed under CC-BY-SA