RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging



  • D3 + Pi4.
    I haven't found a 100% repeatable way of replicating this yet, but for the last 5 prints I've done the printer just hangs for ~5-10 seconds mid-print. Running the same file twice has it stop at different places. Only every seems to do it once per print, and always happens within the first minute or so of the print.
    During this time data stops scrolling in DCS when in debug mode but doesn't show any error. DWC also stops updating - can be noticed by temperature values not updating. I've never had this before on this system and the only change is applying latest update bundle.

    update: it's now happened twice randomly within one print



  • Can you try running DCS from the console and not in debug mode? I'm wondering if it has to do with journalling when DCS is run as a service.

    edit: I'm also guessing it has to do with gcode execution rate. When it was happening for me, I was printing something that was all curves and printing at 100mm/sec so the gcode rate was very fast.

    An M122 would also show any buffer issues.



  • @gtj0 said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:

    Can you try running DCS from the console and not in debug mode? I'm wondering if it has to do with journalling when DCS is run as a service.

    To confirm, stop the DCS service and simply run: sudo /opt/dsf/bin/DuetControlServer ?
    If this is the case, are we to assume that 2.0.0 is outputting more data than previous versions? This is the first time I've experienced this kind of hang with or without DCS running in debug mode.



  • @ChrisP said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:

    @gtj0 said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:

    Can you try running DCS from the console and not in debug mode? I'm wondering if it has to do with journalling when DCS is run as a service.

    To confirm, stop the DCS service and simply run: sudo /opt/dsf/bin/DuetControlServer ?

    Correct

    If this is the case, are we to assume that 2.0.0 is outputting more data than previous versions? This is the first time I've experienced this kind of hang with or without DCS running in debug mode.

    Not necessarily. I'm just grabbing at straws. I'm going to try my "curvy" print again as well.



  • Cool. Done that... will try the same print again and see what happens 👍



  • @ChrisP Assuming it pauses, grab an M122 as soon after the pause as practical. It may help see what's going on. Of course, any console messages would be helpful as well.



  • @gtj0 said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:

    @ChrisP Assuming it pauses, grab an M122 as soon after the pause as practical. It may help see what's going on. Of course, any console messages would be helpful as well.

    Took me three attempts to get this because the pauses don't last for long, but I don't think this even shows anything useful. This one happened almost right at the end of the print. Having printed quite a few Marvins to test this now, I can see that on this print it usually seems to happen twice - once near the beginning and once near the end, but it's not at at the same spot. Weird.

    === Diagnostics ===
    RepRapFirmware for Duet 3 MB6HC version 3.01-RC8 running on Duet 3 MB6HC v0.6 or 1.0
    Board ID: 08DJM-956L2-G43S8-6J1FJ-3SD6M-9S3GFUsed output buffers: 1 of 40 (13 max)
    === RTOS ===
    Static ram: 154580
    Dynamic ram: 162220 of which 40 recycled
    Exception stack ram used: 512
    Never used ram: 75864
    Tasks: NETWORK(ready,2076) HEAT(blocked,1184) CanReceiv(suspended,3824) CanSender(suspended,1436) CanClock(blocked,1432) TMC(blocked,72) MAIN(running,4404) IDLE(ready,80)
    Owned mutexes:
    === Platform ===
    Last reset 00:34:37 ago, cause: reset button
    Last software reset at 2020-04-18 11:32, reason: User, spinning module LinuxInterface, available RAM 76368 bytes (slot 3)
    Software reset code 0x0010 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0444a000 BFAR 0x00000000 SP 0xffffffff Task 0x4e49414d
    Error status: 0
    Free file entries: 10
    SD card 0 not detected, interface speed: 37.5MBytes/sec
    SD card longest block write time: 0.0ms, max retries 0
    MCU temperature: min 51.8, current 52.2, max 52.4
    Supply voltage: min 11.3, current 11.4, max 12.1, under voltage events: 0, over voltage events: 0, power good: yes
    12V rail voltage: min 10.4, current 10.5, max 11.3, under voltage events: 0
    Driver 0: standstill, reads 51177, writes 0 timeouts 0, SG min/max 0/345
    Driver 1: standstill, reads 51178, writes 0 timeouts 0, SG min/max 0/497
    Driver 2: standstill, reads 51177, writes 0 timeouts 0, SG min/max 0/157
    Driver 3: standstill, reads 51178, writes 0 timeouts 0, SG min/max not available
    Driver 4: standstill, reads 51177, writes 0 timeouts 0, SG min/max 0/411
    Driver 5: standstill, reads 51178, writes 0 timeouts 0, SG min/max 0/153
    Date/time: 2020-04-18 16:46:00
    Slowest loop: 5.38ms; fastest: 0.22ms
    === Move ===
    Hiccups: 0(0), FreeDm: 375, MinFreeDm: 295, MaxWait: 0ms
    Bed compensation in use: none, comp offset 0.000
    === MainDDARing ===
    Scheduled moves: 59023, completed moves: 59023, StepErrors: 0, LaErrors: 0, Underruns: 0, 0  CDDA state: -1
    === AuxDDARing ===
    Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0  CDDA state: -1
    === Heat ===
    Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
    Heater 0 is on, I-accum = 0.6
    Heater 1 is on, I-accum = 0.6
    === GCodes ===
    Segments left: 0
    Movement lock held by null
    HTTP is idle in state(s) 0
    Telnet is idle in state(s) 0
    File* is idle in state(s) 0
    USB is ready with "M122" in state(s) 0
    Aux is idle in state(s) 0
    Trigger* is idle in state(s) 0
    Queue is idle in state(s) 0
    LCD is idle in state(s) 0
    SBC is idle in state(s) 0
    Daemon* is idle in state(s) 0 0, running macro
    Autopause is idle in state(s) 0
    Code queue is empty.
    === Network ===
    Slowest loop: 0.70ms; fastest: 0.01ms
    Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
    HTTP sessions: 0 of 8
    - Ethernet -
    State: disabled
    Error counts: 0 0 0 0 0
    Socket states: 0 0 0 0 0 0 0 0
    === CAN ===
    Messages sent 2432, longest wait 0ms for type 0
    === Linux interface ===
    State: 0, failed transfers: 0
    Last transfer: 1205ms ago
    RX/TX seq numbers: 64173/64174
    SPI underruns 0, overruns 0
    Number of disconnects: 0
    Buffer RX/TX: 0/0-0
    ok
    Done printing file
    Finished printing file 0:/gcodes/Test Parts/20200415 - Marvin_KeyChain.gcode, print time was 0h 33m
    


  • @ChrisP Just as I was reading your post I got a hang and got an M122. Maybe these will tell @chrishamm something...

    === Diagnostics ===
    RepRapFirmware for Duet 3 MB6HC version 3.01-RC8 running on Duet 3 MB6HC v0.6 or 1.0
    Board ID: 08DGM-9T66A-G63SJ-6J9F4-3SD6S-1U03BUsed output buffers: 1 of 40 (12 max)
    === RTOS ===
    Static ram: 154580
    Dynamic ram: 162976 of which 60 recycled
    Exception stack ram used: 536
    Never used ram: 75064
    Tasks: NETWORK(ready,2076) HEAT(blocked,1184) CanReceiv(suspended,3824) CanSender(suspended,1432) CanClock(blocked,1432) TMC(blocked,80) MAIN(running,2612) IDLE(ready,80)
    Owned mutexes:
    === Platform ===
    Last reset 21:12:11 ago, cause: software
    Last software reset time unknown, reason: User, spinning module LinuxInterface, available RAM 77796 bytes (slot 2)
    Software reset code 0x0010 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x04432000 BFAR 0x00000000 SP 0xffffffff Task 0x4e49414d
    Error status: 0
    Free file entries: 10
    SD card 0 not detected, interface speed: 37.5MBytes/sec
    SD card longest block write time: 0.0ms, max retries 0
    MCU temperature: min 38.4, current 43.0, max 43.2
    Supply voltage: min 25.2, current 25.3, max 25.5, under voltage events: 0, over voltage events: 0, power good: yes
    12V rail voltage: min 12.1, current 12.2, max 12.2, under voltage events: 0
    Driver 0: ok, reads 22565, writes 18 timeouts 0, SG min/max 0/1023
    Driver 1: ok, reads 22566, writes 18 timeouts 0, SG min/max 0/1023
    Driver 2: ok, reads 22561, writes 22 timeouts 0, SG min/max 0/1023
    Driver 3: standstill, reads 22536, writes 48 timeouts 0, SG min/max 0/209
    Driver 4: standstill, reads 22535, writes 48 timeouts 0, SG min/max 0/212
    Driver 5: standstill, reads 22536, writes 48 timeouts 0, SG min/max 0/207
    Date/time: 2020-04-18 10:14:43
    Slowest loop: 13.42ms; fastest: 0.21ms
    === Move ===
    Hiccups: 0(0), FreeDm: 348, MinFreeDm: 282, MaxWait: 64329175ms
    Bed compensation in use: none, comp offset 0.000
    === MainDDARing ===
    Scheduled moves: 106327, completed moves: 106321, StepErrors: 0, LaErrors: 0, Underruns: 104, 0  CDDA state: 3
    === AuxDDARing ===
    Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0  CDDA state: -1
    === Heat ===
    Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
    Heater 1 is on, I-accum = 0.3
    === GCodes ===
    Segments left: 0
    Movement lock held by null
    HTTP* is ready with "M122" in state(s) 0
    Telnet is idle in state(s) 0
    File* is idle in state(s) 0
    USB is idle in state(s) 0
    Aux is idle in state(s) 0
    Trigger is idle in state(s) 0
    Queue* is idle in state(s) 0
    LCD is idle in state(s) 0
    SBC is idle in state(s) 0
    Daemon* is idle in state(s) 0
    Autopause is idle in state(s) 0
    Code queue is empty.
    === Network ===
    Slowest loop: 0.99ms; fastest: 0.01ms
    Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
    HTTP sessions: 0 of 8
    - Ethernet -
    State: disabled
    Error counts: 0 0 0 0 0
    Socket states: 0 0 0 0 0 0 0 0
    === CAN ===
    Messages sent 20258, longest wait 0ms for type 0
    === Linux interface ===
    State: 0, failed transfers: 0
    Last transfer: 26ms ago
    RX/TX seq numbers: 29515/13058
    SPI underruns 0, overruns 0
    Number of disconnects: 1
    Buffer RX/TX: 24/216-0
    === Duet Control Server ===
    Duet Control Server v2.0.0.0
    File:
    Buffered code: G1 X-15.583 Y-169.950 E3.1797
    Buffered code: G1 X-15.655 Y-169.928 E3.1825
    Buffered code: G1 X-16.032 Y-169.812 E3.1969
    Buffered code: G1 X-16.936 Y-169.534 E3.2323
    Buffered code: G1 X-17.008 Y-169.512 E3.2353
    Buffered code: G1 X-17.077 Y-169.485 E3.2384
    Buffered code: G1 X-17.585 Y-169.289 E3.2611
    Buffered code: G1 X-18.963 Y-168.756 E3.3215
    Buffered code: G1 X-19.030 Y-168.730 E3.3242
    Buffered code: G1 X-19.100 Y-168.702 E3.3269
    Buffered code: G1 X-19.694 Y-168.464 E3.3500
    Buffered code: G1 X-20.956 Y-167.957 E3.4015
    Buffered code: G1 X-21.026 Y-167.929 E3.4044
    Buffered code: G1 X-21.094 Y-167.896 E3.4074
    Buffered code: G1 X-22.887 Y-167.034 E3.4858
    Buffered code: G1 X-22.955 Y-167.001 E3.4886
    ==> 756 bytes
    Code buffer space: 3304
    Configured SPI speed: 8000000 Hz
    Full transfers per second: 36.72
    File /opt/dsf/sd/gcodes//MODMASK-MED-TPU_RevA.gcode is selected, processing
    

    @chrishamm Just FYI... When I tried the M122 from CodeConsole, it actually hung until the print started again.



  • Well I guess at least it isn't just me then 😕
    Same situation? Just a random blip with no obvious error?



  • @ChrisP Exactly. Wouldn't be a big deal except the nozzle is still in contact with the print and making ugly blobs which pretty much ruin the print.



  • Indeed. The first time it happened I had a panic because I thought it had completely died, then suddenly it spring back into life. I'm also finding that the period it stops for is fairly random too, though it never seems to be more than ~5s or so.



  • Now this is interesting... Just got this message in conjunction with a hang:

    [info] System time has been changed
    

  • administrators

    Do the hangs perhaps occur when the NTP client updates the system time? Do the hangs stop if you disable the NTP process?



  • If the hangs are something to do with the NTP updates I'm curious to know why it only affects the latest release.


  • administrators

    I've found a potential deadlock which I'm about to fix in DSF 2.1.0. I am wondering if it is related to the delays you've been seeing.

    @gtj0 In your debug log there was a note saying "last transfer > 1s ago" which is quite unusual. What is the normal CPU load of your board while printing? If it is quite high, DCS tends to hang for a moment so perhaps we should increase the default priority of DCS in the systemd service file and check if that improves things.

    PS: I tried to reproduce the hangs with the default priority, 100% CPU load, and the upcoming DSF 2.1.0 and I don't seem to get any more lags.



  • This bug has resulted in this release being unusable for me now as it causing large blobs to form when it pauses which results in the head catching and knocking parts off the bed.
    For now I've reverted to the RC7 setup until the next release is out 👍



  • @chrishamm said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:

    I've found a potential deadlock which I'm about to fix in DSF 2.1.0. I am wondering if it is related to the delays you've been seeing.

    @gtj0 In your debug log there was a note saying "last transfer > 1s ago" which is quite unusual. What is the normal CPU load of your board while printing? If it is quite high, DCS tends to hang for a moment so perhaps we should increase the default priority of DCS in the systemd service file and check if that improves things.

    PS: I tried to reproduce the hangs with the default priority, 100% CPU load, and the upcoming DSF 2.1.0 and I don't seem to get any more lags.

    I think increasing the priority is a good idea. Philosophically, I'd consider the DCS a "real-time" process. I do usually have Chrome running on my Jetson Nano displaying a video stream but that only results in about a 25-30% CPU load and I have tried it without the video stream running but still got the pauses.

    I'll keep an eye out for DSF commits and test them as they become available.



  • @dc42 said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:

    Do the hangs perhaps occur when the NTP client updates the system time?

    I thought maybe but...

    Do the hangs stop if you disable the NTP process?

    No. That would have been too easy. 🙂

    @chrishamm Is there something in the DCS synchronizing the time between the SBC and the Duet?
    I'm wondering why the DCS prints that message in the first place.


  • administrators

    @gtj0 That message pops up when the periodic updater task in DCS notices a time shift (> 5s) after a predefined delay. When it notices that, it thinks the system time has been changed and sends an update to the Duet.



  • @gtj0 said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:

    I think increasing the priority is a good idea. Philosophically, I'd consider the DCS a "real-time" process. I do usually have Chrome running on my Jetson Nano displaying a video stream but that only results in about a 25-30% CPU load and I have tried it without the video stream running but still got the pauses.

    While I agree with the idea of increasing the priority as I would also see DCS as a real time process, I wouldn't encourage it as a 'fix' for this hanging issue. Having moved back to 1.3.2 over an hour ago I've not experienced a single pause, whereas previously I was getting 10+ second pauses every 10/15 minutes.
    I'm keen to try DSF 2.1.0 to see if the issue @chrishamm found was the cause.



  • @ChrisP said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:

    @gtj0 said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:

    I think increasing the priority is a good idea. Philosophically, I'd consider the DCS a "real-time" process. I do usually have Chrome running on my Jetson Nano displaying a video stream but that only results in about a 25-30% CPU load and I have tried it without the video stream running but still got the pauses.

    While I agree with the idea of increasing the priority as I would also see DCS as a real time process, I wouldn't encourage it as a 'fix' for this hanging issue. Having moved back to 1.3.2 over an hour ago I've not experienced a single pause, whereas previously I was getting 10+ second pauses every 10/15 minutes.
    I'm keen to try DSF 2.1.0 to see if the issue @chrishamm found was the cause.

    Oh yeah. I wasn't suggesting it as a "fix". Just a future preventative measure.



  • @chrishamm said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:

    @gtj0 That message pops up when the periodic updater task in DCS notices a time shift (> 5s) after a predefined delay. When it notices that, it thinks the system time has been changed and sends an update to the Duet.

    Ok so a symptom not a cause then.



  • Thought I was crazy but am having the exact same problems. Worked great with RC7, seen it pause for 5-10 seconds three times with RC8.



  • @chrishamm Not fixed with DSF commit d46965d
    and RRF commit 56c8e1d583d7f17e5c4266c7ce8cc227d40e3bdd (SBC improvements)

    Still hanging for many seconds. No messages in DCS log.


Log in to reply