Duet3D Logo Duet3D
    • Tags
    • Documentation
    • Order
    • Register
    • Login

    Print stopped at 83% for seemingly no reason

    Scheduled Pinned Locked Moved
    General Discussion
    2
    12
    1.2k
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • shogranundefined
      shogran
      last edited by shogran

      Hi there,

      Started an overnight print that should have taken ~10 hours and noticed the next day it was still sitting at 83% via the web interface. Thought nothing of it and checked back an hour later to see it was still at the same point, I assumed the interface had frozen so I went and checked in on the printer and saw that the printhead was stopped in place on the incomplete model.

      The bed and nozzle were still hot and according to the interface at the correct temperatures of 75 and 240 degrees respectively and I could pause the print which would move the head back to 0, then resume. However the resumed print would just continue to sit in the same place. There were no errors in the console and no matter how many times I paused and resumed the print it did not continue. In the end I ran M122 and then cancelled the print. I'm running version 2.02RC3 with server version 1.21 and web interface 1.22.4-b1. The GCODE file was sliced in Simplify3D 4 and uploaded via FileZilla, I've downloaded the file and it looks fine to me. It looks like it occured at layer 127 (Z height 17.66mm). The machine is a CoreXY.

      Due to the size of the files I've put the GCODE and some screenshots/photos here along with my config.g file and the M122 text which I'll try to get below as a code snippet: https://www.dropbox.com/sh/384kq6ocd85e95e/AADj-WkBKcL7vvWY1bmKvIn8a?dl=0

      If I can provide anything else that will make diagnosing this easier then let me know!

      EDIT: If I wrap the code below in the ``` code tags I get told I can't submit as "Post content was flagged as spam by Akismet.com".

      10:09:35
      M122
      === Diagnostics ===
      RepRapFirmware for Duet 2 WiFi/Ethernet version 2.02RC3(RTOS) running on Duet WiFi 1.02 or later
      Board ID: 08DDM-9FAM2-LW4S8-6JTDG-3SD6T-13YBW
      Used output buffers: 1 of 20 (18 max)
      === RTOS ===
      Static ram: 28532
      Dynamic ram: 98600 of which 0 recycled
      Exception stack ram used: 624
      Never used ram: 3316
      Tasks: NETWORK(ready,328) HEAT(blocked,1232) MAIN(running,3484)
      Owned mutexes:
      === Platform ===
      Last reset 14:38:10 ago, cause: power up
      Last software reset at 2018-12-04 19:40, reason: User, spinning module GCodes, available RAM 3452 bytes (slot 1)
      Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0041f000 BFAR 0xe000ed38 SP 0xffffffff Task 0x4e49414d
      Error status: 0
      Free file entries: 9
      SD card 0 detected, interface speed: 20.0MBytes/sec
      SD card longest block write time: 0.0ms, max retries 0
      MCU temperature: min 38.6, current 38.7, max 38.9
      Supply voltage: min 23.2, current 24.2, max 24.5, under voltage events: 0, over voltage events: 0
      Driver 0: standstill, SG min/max not available
      Driver 1: standstill, SG min/max not available
      Driver 2: standstill, SG min/max not available
      Driver 3: standstill, SG min/max not available
      Driver 4: standstill, SG min/max not available
      Date/time: 2018-12-07 10:09:32
      Cache data hit count 4294967295
      Slowest loop: 1.25ms; fastest: 0.07ms
      === Move ===
      Hiccups: 0, StepErrors: 0, LaErrors: 0, FreeDm: 240, MinFreeDm: 240, MaxWait: 0ms, Underruns: 0, 0
      Scheduled moves: 402064, completed moves: 401876
      Bed compensation in use: mesh
      Bed probe heights: 0.000 0.000 0.000 0.000 0.000
      === Heat ===
      Bed heaters = 0 -1 -1 -1, chamberHeaters = -1 -1
      Heater 0 is on, I-accum = 0.4
      Heater 1 is on, I-accum = 0.6
      === GCodes ===
      Segments left: 0
      Stack records: 2 allocated, 0 in use
      Movement lock held by null
      http is idle in state(s) 0
      telnet is idle in state(s) 0
      file is doing "M106 S204" in state(s) 0
      serial is idle in state(s) 0
      aux is idle in state(s) 0
      daemon is idle in state(s) 0
      queue is idle in state(s) 0
      autopause is idle in state(s) 0
      Code queue is not empty:
      Queued 'M106 S204' for move 401967
      Queued 'M106 S102' for move 401968
      Queued 'M106 S204' for move 401970
      Queued 'M106 S102' for move 401975
      Queued 'M106 S204' for move 401978
      Queued 'M106 S102' for move 401979
      Queued 'M106 S204' for move 401981
      Queued 'M106 S102' for move 401986
      Queued 'M106 S204' for move 401989
      Queued 'M106 S102' for move 401990
      Queued 'M106 S204' for move 402016
      Queued 'M106 S102' for move 402017
      Queued 'M106 S204' for move 402019
      Queued 'M106 S102' for move 402024
      Queued 'M106 S204' for move 402027
      Queued 'M106 S102' for move 402028
      16 of 16 codes have been queued.
      === Network ===
      Slowest loop: 178.61ms; fastest: 0.08ms
      Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
      HTTP sessions: 1 of 8

      • WiFi -
        Network state is running
        WiFi module is connected to access point
        Failed messages: pending 0, notready 0, noresp 0
        WiFi firmware version 1.21
        WiFi MAC address 5c:cf:7f:ee:68:10
        WiFi Vcc 3.39, reset reason Turned on by main processor
        WiFi flash size 4194304, free heap 14168
        WiFi IP address 192.168.1.120
        WiFi signal strength -76dBm, reconnections 0, sleep mode modem
        Socket states: 0 0 0 0 0 0 0 0
        === Expansion ===
        10:09:33
        Message Log cleared!
      1 Reply Last reply Reply Quote 0
      • dc42undefined
        dc42 administrators
        last edited by dc42

        I'm sorry, I think you have found a firmware bug. This line from your report:

        Scheduled moves: 402064, completed moves: 401876

        has a difference of 188, which is much greater than the size of the move queue. So it must be failing to count some completed moves. That has caused the code queue to become full with M106 commands; so the firmware is waiting for the completed move count to catch up.

        I think your only hope of saving the print is to pause it, check that a resurrect.g file has been created, use Emergency Stop to reset the Duet, and use the resurrect feature (M916) to restart the print. I think firmware 2.02RC3 inserts a G92 command in resurrect.g prior to calling resurrect-prologue.g, and if that is the case then you probably only need a M116 command in resurrect-prologue.g.

        EDIT: I just checked the code. To avoid resetting the Duet (which will power the motors down, potentially losing position) you can pause the print, back up the resurrect.g file, then cancel the print. Then use a text editor to remove the G92 command from resurrect.g. Then use M916 to resume the print (as before you will need M116 in the resurrect-prologue.g file).

        Duet WiFi hardware designer and firmware engineer
        Please do not ask me for Duet support via PM or email, use the forum
        http://www.escher3d.com, https://miscsolutions.wordpress.com

        shogranundefined 1 Reply Last reply Reply Quote 0
        • dc42undefined
          dc42 administrators
          last edited by

          PS - I'm trying to track down how the move counts became so different. Did you pause the print a large number of times?

          Duet WiFi hardware designer and firmware engineer
          Please do not ask me for Duet support via PM or email, use the forum
          http://www.escher3d.com, https://miscsolutions.wordpress.com

          1 Reply Last reply Reply Quote 0
          • shogranundefined
            shogran @dc42
            last edited by

            @dc42 said in Print stopped at 83% for seemingly no reason:

            I'm sorry, I think you have found a firmware bug. This line from your report:

            Scheduled moves: 402064, completed moves: 401876

            has a difference of 188, which is much greater than the size of the move queue. So it must be failing to count some completed moves. That has caused the code queue to become full with M106 commands; so the firmware is waiting for the completed move count to catch up.

            I think your only hope of saving the print is to pause it, check that a resurrect.g file has been created, use Emergency Stop to reset the Duet, and use the resurrect feature (M916) to restart the print. I think firmware 2.02RC3 inserts a G92 command in resurrect.g prior to calling resurrect-prologue.g, and if that is the case then you probably only need a M116 command in resurrect-prologue.g.

            EDIT: I just checked the code. To avoid resetting the Duet (which will power the motors down, potentially losing position) you can pause the print, back up the resurrect.g file, then cancel the print. Then use a text editor to remove the G92 command from resurrect.g. Then use M916 to resume the print (as before you will need M116 in the resurrect-prologue.g file).

            @dc42 said in Print stopped at 83% for seemingly no reason:

            PS - I'm trying to track down how the move counts became so different. Did you pause the print a large number of times?

            Thanks for the feedback dc, much appreciated.

            I did pause and resume the file a few times when I noticed it wasn't printing but not while the print was working. I do have stall detection enabled which I know triggered a few times; could that have caused it/acted the same way as a pause?

            dc42undefined 1 Reply Last reply Reply Quote 0
            • dc42undefined
              dc42 administrators @shogran
              last edited by

              @shogran said in Print stopped at 83% for seemingly no reason:

              I do have stall detection enabled which I know triggered a few times; could that have caused it/acted the same way as a pause?

              Thanks, that may be where the problem is. What action do you have configured when a stall is detected?

              Duet WiFi hardware designer and firmware engineer
              Please do not ask me for Duet support via PM or email, use the forum
              http://www.escher3d.com, https://miscsolutions.wordpress.com

              shogranundefined 1 Reply Last reply Reply Quote 0
              • shogranundefined
                shogran @dc42
                last edited by

                @dc42 said in Print stopped at 83% for seemingly no reason:

                @shogran said in Print stopped at 83% for seemingly no reason:

                I do have stall detection enabled which I know triggered a few times; could that have caused it/acted the same way as a pause?

                Thanks, that may be where the problem is. What action do you have configured when a stall is detected?

                This is my stall detection line:

                M915 X Y S10 R3 F0 ; Stall detection, set X and Y to sensitivity 10, pause print, rehome, unfiltered
                

                and this is the rehome.g file it's using:

                G91                       ; relative mode
                G1 Z5 F200 S2; lower bed
                G1 S1 X-305 Y274.5 F3000   ; course home X or Y
                G1 S1 X-305              ; course home X
                G1 S1 Y274.5               ; course home Y
                G1 X4 Y-4 F600             ; move away from the endstops
                G1 S1 X-305                ; fine home X
                G90
                G1 Y274.5             ; fine home Y
                G91
                G1 Z-5 F200 S2; lower bed
                
                1 Reply Last reply Reply Quote 0
                • dc42undefined
                  dc42 administrators
                  last edited by

                  Thanks.

                  It looks like it occured at layer 127 (Z height 17.66mm)

                  The GCode file says that layer 127 has Z height 12.700mm. So do you mean layer 127, or 12.700mm height? It's possible that DWC is displaying the wrong layer count.

                  I've reviewed the part of the firmware that handles an emergency pause (which is done when there is power loss or a stall is detected), and I found two issues:

                  1. When an emergency pause occurs, an off-by-one error meant that the number of scheduled moves ends up one higher than it should be;
                  2. After an emergency pause, the queue of deferred commands was not being resynced to allow for moves that were discarded from the queue.

                  What I think happened was that you had a large number of repeated stalls around the same point in the print, during or shortly before bridging moves. Each one caused the number of scheduled moves to increase by one, and some deferred M106 commands to be left in the queue instead of being purged. This explains the following part of the M122 report:

                  Code queue is not empty:
                  Queued 'M106 S204' for move 401967
                  Queued 'M106 S102' for move 401968
                  Queued 'M106 S204' for move 401970
                  Queued 'M106 S102' for move 401975
                  Queued 'M106 S204' for move 401978
                  Queued 'M106 S102' for move 401979
                  Queued 'M106 S204' for move 401981
                  Queued 'M106 S102' for move 401986
                  Queued 'M106 S204' for move 401989
                  Queued 'M106 S102' for move 401990
                  Queued 'M106 S204' for move 402016
                  Queued 'M106 S102' for move 402017
                  Queued 'M106 S204' for move 402019
                  Queued 'M106 S102' for move 402024
                  Queued 'M106 S204' for move 402027
                  Queued 'M106 S102' for move 402028

                  The move numbers are so close together that I think the multiple pairs M106 moves must repetitions of the same sequence of bridging moves.

                  Do you think it possible that the stall detection was triggered 150 or more times, with at least the last 8 being all around the same place? If you have logging enabled, then the stalls may be recorded in the log file.

                  Duet WiFi hardware designer and firmware engineer
                  Please do not ask me for Duet support via PM or email, use the forum
                  http://www.escher3d.com, https://miscsolutions.wordpress.com

                  shogranundefined 1 Reply Last reply Reply Quote 0
                  • shogranundefined
                    shogran @dc42
                    last edited by

                    @dc42 said in Print stopped at 83% for seemingly no reason:

                    Thanks.

                    It looks like it occured at layer 127 (Z height 17.66mm)

                    The GCode file says that layer 127 has Z height 12.700mm. So do you mean layer 127, or 12.700mm height? It's possible that DWC is displaying the wrong layer count.

                    Ah, that would make sense actually as it was printing with a 0.1mm layer height. DWC may have been displaying the wrong info in this instance.

                    I've reviewed the part of the firmware that handles an emergency pause (which is done when there is power loss or a stall is detected), and I found two issues:

                    1. When an emergency pause occurs, an off-by-one error meant that the number of scheduled moves ends up one higher than it should be;
                    2. After an emergency pause, the queue of deferred commands was not being resynced to allow for moves that were discarded from the queue.

                    What I think happened was that you had a large number of repeated stalls around the same point in the print, during or shortly before bridging moves. Each one caused the number of scheduled moves to increase by one, and some deferred M106 commands to be left in the queue instead of being purged. This explains the following part of the M122 report:

                    Code queue is not empty:
                    Queued 'M106 S204' for move 401967
                    Queued 'M106 S102' for move 401968
                    Queued 'M106 S204' for move 401970
                    Queued 'M106 S102' for move 401975
                    Queued 'M106 S204' for move 401978
                    Queued 'M106 S102' for move 401979
                    Queued 'M106 S204' for move 401981
                    Queued 'M106 S102' for move 401986
                    Queued 'M106 S204' for move 401989
                    Queued 'M106 S102' for move 401990
                    Queued 'M106 S204' for move 402016
                    Queued 'M106 S102' for move 402017
                    Queued 'M106 S204' for move 402019
                    Queued 'M106 S102' for move 402024
                    Queued 'M106 S204' for move 402027
                    Queued 'M106 S102' for move 402028

                    The move numbers are so close together that I think the multiple pairs M106 moves must repetitions of the same sequence of bridging moves.

                    Do you think it possible that the stall detection was triggered 150 or more times, with at least the last 8 being all around the same place? If you have logging enabled, then the stalls may be recorded in the log file.

                    Unfortunately I didn't have logging enabled, though I have now added it to my config.g for future use. I think it's possible that it was triggered that many times as I've only recently enabled it and am still tuning the sensitivity. I've definitely been in the room a few times when it has triggered a few times in the space of a few seconds on the same point before moving past it.

                    1 Reply Last reply Reply Quote 0
                    • dc42undefined
                      dc42 administrators
                      last edited by

                      Note, the sensitivity to stalls increases with motor temperature. So a stall sensitivity in the M915 command that was OK when the motors were cold may be too sensitive when the motors are hot due to printing for some time.

                      Duet WiFi hardware designer and firmware engineer
                      Please do not ask me for Duet support via PM or email, use the forum
                      http://www.escher3d.com, https://miscsolutions.wordpress.com

                      shogranundefined 2 Replies Last reply Reply Quote 0
                      • shogranundefined
                        shogran @dc42
                        last edited by

                        @dc42 Thanks, I've restarted the print with a reduced sensitivity on the stall detection and logging enabled. I'll update here with the outcome.

                        1 Reply Last reply Reply Quote 0
                        • shogranundefined
                          shogran @dc42
                          last edited by

                          @dc42 Printed without any issues after reducing the sensitivity and not having the stall detection trigger at all that I could see. Looks like that may have been the cause.

                          dc42undefined 1 Reply Last reply Reply Quote 0
                          • dc42undefined
                            dc42 administrators @shogran
                            last edited by

                            @shogran said in Print stopped at 83% for seemingly no reason:

                            @dc42 Printed without any issues after reducing the sensitivity and not having the stall detection trigger at all that I could see. Looks like that may have been the cause.

                            i'm glad it worked. I've put a fix for the off-by-one issue in the forthcoming 2.02RC6 release. Please accept my apologies for the failed print.

                            Duet WiFi hardware designer and firmware engineer
                            Please do not ask me for Duet support via PM or email, use the forum
                            http://www.escher3d.com, https://miscsolutions.wordpress.com

                            1 Reply Last reply Reply Quote 1
                            • First post
                              Last post
                            Unless otherwise noted, all forum content is licensed under CC-BY-SA