RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging
-
@ChrisP said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:
@gtj0 said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:
Can you try running DCS from the console and not in debug mode? I'm wondering if it has to do with journalling when DCS is run as a service.
To confirm, stop the DCS service and simply run:
sudo /opt/dsf/bin/DuetControlServer
?Correct
If this is the case, are we to assume that 2.0.0 is outputting more data than previous versions? This is the first time I've experienced this kind of hang with or without DCS running in debug mode.
Not necessarily. I'm just grabbing at straws. I'm going to try my "curvy" print again as well.
-
Cool. Done that... will try the same print again and see what happens
-
@ChrisP Assuming it pauses, grab an M122 as soon after the pause as practical. It may help see what's going on. Of course, any console messages would be helpful as well.
-
@gtj0 said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:
@ChrisP Assuming it pauses, grab an M122 as soon after the pause as practical. It may help see what's going on. Of course, any console messages would be helpful as well.
Took me three attempts to get this because the pauses don't last for long, but I don't think this even shows anything useful. This one happened almost right at the end of the print. Having printed quite a few Marvins to test this now, I can see that on this print it usually seems to happen twice - once near the beginning and once near the end, but it's not at at the same spot. Weird.
=== Diagnostics === RepRapFirmware for Duet 3 MB6HC version 3.01-RC8 running on Duet 3 MB6HC v0.6 or 1.0 Board ID: 08DJM-956L2-G43S8-6J1FJ-3SD6M-9S3GFUsed output buffers: 1 of 40 (13 max) === RTOS === Static ram: 154580 Dynamic ram: 162220 of which 40 recycled Exception stack ram used: 512 Never used ram: 75864 Tasks: NETWORK(ready,2076) HEAT(blocked,1184) CanReceiv(suspended,3824) CanSender(suspended,1436) CanClock(blocked,1432) TMC(blocked,72) MAIN(running,4404) IDLE(ready,80) Owned mutexes: === Platform === Last reset 00:34:37 ago, cause: reset button Last software reset at 2020-04-18 11:32, reason: User, spinning module LinuxInterface, available RAM 76368 bytes (slot 3) Software reset code 0x0010 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0444a000 BFAR 0x00000000 SP 0xffffffff Task 0x4e49414d Error status: 0 Free file entries: 10 SD card 0 not detected, interface speed: 37.5MBytes/sec SD card longest block write time: 0.0ms, max retries 0 MCU temperature: min 51.8, current 52.2, max 52.4 Supply voltage: min 11.3, current 11.4, max 12.1, under voltage events: 0, over voltage events: 0, power good: yes 12V rail voltage: min 10.4, current 10.5, max 11.3, under voltage events: 0 Driver 0: standstill, reads 51177, writes 0 timeouts 0, SG min/max 0/345 Driver 1: standstill, reads 51178, writes 0 timeouts 0, SG min/max 0/497 Driver 2: standstill, reads 51177, writes 0 timeouts 0, SG min/max 0/157 Driver 3: standstill, reads 51178, writes 0 timeouts 0, SG min/max not available Driver 4: standstill, reads 51177, writes 0 timeouts 0, SG min/max 0/411 Driver 5: standstill, reads 51178, writes 0 timeouts 0, SG min/max 0/153 Date/time: 2020-04-18 16:46:00 Slowest loop: 5.38ms; fastest: 0.22ms === Move === Hiccups: 0(0), FreeDm: 375, MinFreeDm: 295, MaxWait: 0ms Bed compensation in use: none, comp offset 0.000 === MainDDARing === Scheduled moves: 59023, completed moves: 59023, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1 === AuxDDARing === Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1 === Heat === Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1 Heater 0 is on, I-accum = 0.6 Heater 1 is on, I-accum = 0.6 === GCodes === Segments left: 0 Movement lock held by null HTTP is idle in state(s) 0 Telnet is idle in state(s) 0 File* is idle in state(s) 0 USB is ready with "M122" in state(s) 0 Aux is idle in state(s) 0 Trigger* is idle in state(s) 0 Queue is idle in state(s) 0 LCD is idle in state(s) 0 SBC is idle in state(s) 0 Daemon* is idle in state(s) 0 0, running macro Autopause is idle in state(s) 0 Code queue is empty. === Network === Slowest loop: 0.70ms; fastest: 0.01ms Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0) HTTP sessions: 0 of 8 - Ethernet - State: disabled Error counts: 0 0 0 0 0 Socket states: 0 0 0 0 0 0 0 0 === CAN === Messages sent 2432, longest wait 0ms for type 0 === Linux interface === State: 0, failed transfers: 0 Last transfer: 1205ms ago RX/TX seq numbers: 64173/64174 SPI underruns 0, overruns 0 Number of disconnects: 0 Buffer RX/TX: 0/0-0 ok Done printing file Finished printing file 0:/gcodes/Test Parts/20200415 - Marvin_KeyChain.gcode, print time was 0h 33m
-
@ChrisP Just as I was reading your post I got a hang and got an M122. Maybe these will tell @chrishamm something...
=== Diagnostics === RepRapFirmware for Duet 3 MB6HC version 3.01-RC8 running on Duet 3 MB6HC v0.6 or 1.0 Board ID: 08DGM-9T66A-G63SJ-6J9F4-3SD6S-1U03BUsed output buffers: 1 of 40 (12 max) === RTOS === Static ram: 154580 Dynamic ram: 162976 of which 60 recycled Exception stack ram used: 536 Never used ram: 75064 Tasks: NETWORK(ready,2076) HEAT(blocked,1184) CanReceiv(suspended,3824) CanSender(suspended,1432) CanClock(blocked,1432) TMC(blocked,80) MAIN(running,2612) IDLE(ready,80) Owned mutexes: === Platform === Last reset 21:12:11 ago, cause: software Last software reset time unknown, reason: User, spinning module LinuxInterface, available RAM 77796 bytes (slot 2) Software reset code 0x0010 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x04432000 BFAR 0x00000000 SP 0xffffffff Task 0x4e49414d Error status: 0 Free file entries: 10 SD card 0 not detected, interface speed: 37.5MBytes/sec SD card longest block write time: 0.0ms, max retries 0 MCU temperature: min 38.4, current 43.0, max 43.2 Supply voltage: min 25.2, current 25.3, max 25.5, under voltage events: 0, over voltage events: 0, power good: yes 12V rail voltage: min 12.1, current 12.2, max 12.2, under voltage events: 0 Driver 0: ok, reads 22565, writes 18 timeouts 0, SG min/max 0/1023 Driver 1: ok, reads 22566, writes 18 timeouts 0, SG min/max 0/1023 Driver 2: ok, reads 22561, writes 22 timeouts 0, SG min/max 0/1023 Driver 3: standstill, reads 22536, writes 48 timeouts 0, SG min/max 0/209 Driver 4: standstill, reads 22535, writes 48 timeouts 0, SG min/max 0/212 Driver 5: standstill, reads 22536, writes 48 timeouts 0, SG min/max 0/207 Date/time: 2020-04-18 10:14:43 Slowest loop: 13.42ms; fastest: 0.21ms === Move === Hiccups: 0(0), FreeDm: 348, MinFreeDm: 282, MaxWait: 64329175ms Bed compensation in use: none, comp offset 0.000 === MainDDARing === Scheduled moves: 106327, completed moves: 106321, StepErrors: 0, LaErrors: 0, Underruns: 104, 0 CDDA state: 3 === AuxDDARing === Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1 === Heat === Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1 Heater 1 is on, I-accum = 0.3 === GCodes === Segments left: 0 Movement lock held by null HTTP* is ready with "M122" in state(s) 0 Telnet is idle in state(s) 0 File* is idle in state(s) 0 USB is idle in state(s) 0 Aux is idle in state(s) 0 Trigger is idle in state(s) 0 Queue* is idle in state(s) 0 LCD is idle in state(s) 0 SBC is idle in state(s) 0 Daemon* is idle in state(s) 0 Autopause is idle in state(s) 0 Code queue is empty. === Network === Slowest loop: 0.99ms; fastest: 0.01ms Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0) HTTP sessions: 0 of 8 - Ethernet - State: disabled Error counts: 0 0 0 0 0 Socket states: 0 0 0 0 0 0 0 0 === CAN === Messages sent 20258, longest wait 0ms for type 0 === Linux interface === State: 0, failed transfers: 0 Last transfer: 26ms ago RX/TX seq numbers: 29515/13058 SPI underruns 0, overruns 0 Number of disconnects: 1 Buffer RX/TX: 24/216-0 === Duet Control Server === Duet Control Server v2.0.0.0 File: Buffered code: G1 X-15.583 Y-169.950 E3.1797 Buffered code: G1 X-15.655 Y-169.928 E3.1825 Buffered code: G1 X-16.032 Y-169.812 E3.1969 Buffered code: G1 X-16.936 Y-169.534 E3.2323 Buffered code: G1 X-17.008 Y-169.512 E3.2353 Buffered code: G1 X-17.077 Y-169.485 E3.2384 Buffered code: G1 X-17.585 Y-169.289 E3.2611 Buffered code: G1 X-18.963 Y-168.756 E3.3215 Buffered code: G1 X-19.030 Y-168.730 E3.3242 Buffered code: G1 X-19.100 Y-168.702 E3.3269 Buffered code: G1 X-19.694 Y-168.464 E3.3500 Buffered code: G1 X-20.956 Y-167.957 E3.4015 Buffered code: G1 X-21.026 Y-167.929 E3.4044 Buffered code: G1 X-21.094 Y-167.896 E3.4074 Buffered code: G1 X-22.887 Y-167.034 E3.4858 Buffered code: G1 X-22.955 Y-167.001 E3.4886 ==> 756 bytes Code buffer space: 3304 Configured SPI speed: 8000000 Hz Full transfers per second: 36.72 File /opt/dsf/sd/gcodes//MODMASK-MED-TPU_RevA.gcode is selected, processing
@chrishamm Just FYI... When I tried the M122 from CodeConsole, it actually hung until the print started again.
-
Well I guess at least it isn't just me then
Same situation? Just a random blip with no obvious error? -
@ChrisP Exactly. Wouldn't be a big deal except the nozzle is still in contact with the print and making ugly blobs which pretty much ruin the print.
-
Indeed. The first time it happened I had a panic because I thought it had completely died, then suddenly it spring back into life. I'm also finding that the period it stops for is fairly random too, though it never seems to be more than ~5s or so.
-
Now this is interesting... Just got this message in conjunction with a hang:
[info] System time has been changed
-
Do the hangs perhaps occur when the NTP client updates the system time? Do the hangs stop if you disable the NTP process?
-
If the hangs are something to do with the NTP updates I'm curious to know why it only affects the latest release.
-
I've found a potential deadlock which I'm about to fix in DSF 2.1.0. I am wondering if it is related to the delays you've been seeing.
@gtj0 In your debug log there was a note saying "last transfer > 1s ago" which is quite unusual. What is the normal CPU load of your board while printing? If it is quite high, DCS tends to hang for a moment so perhaps we should increase the default priority of DCS in the systemd service file and check if that improves things.
PS: I tried to reproduce the hangs with the default priority, 100% CPU load, and the upcoming DSF 2.1.0 and I don't seem to get any more lags.
-
This bug has resulted in this release being unusable for me now as it causing large blobs to form when it pauses which results in the head catching and knocking parts off the bed.
For now I've reverted to the RC7 setup until the next release is out -
@chrishamm said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:
I've found a potential deadlock which I'm about to fix in DSF 2.1.0. I am wondering if it is related to the delays you've been seeing.
@gtj0 In your debug log there was a note saying "last transfer > 1s ago" which is quite unusual. What is the normal CPU load of your board while printing? If it is quite high, DCS tends to hang for a moment so perhaps we should increase the default priority of DCS in the systemd service file and check if that improves things.
PS: I tried to reproduce the hangs with the default priority, 100% CPU load, and the upcoming DSF 2.1.0 and I don't seem to get any more lags.
I think increasing the priority is a good idea. Philosophically, I'd consider the DCS a "real-time" process. I do usually have Chrome running on my Jetson Nano displaying a video stream but that only results in about a 25-30% CPU load and I have tried it without the video stream running but still got the pauses.
I'll keep an eye out for DSF commits and test them as they become available.
-
@dc42 said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:
Do the hangs perhaps occur when the NTP client updates the system time?
I thought maybe but...
Do the hangs stop if you disable the NTP process?
No. That would have been too easy.
@chrishamm Is there something in the DCS synchronizing the time between the SBC and the Duet?
I'm wondering why the DCS prints that message in the first place. -
@gtj0 That message pops up when the periodic updater task in DCS notices a time shift (> 5s) after a predefined delay. When it notices that, it thinks the system time has been changed and sends an update to the Duet.
-
@gtj0 said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:
I think increasing the priority is a good idea. Philosophically, I'd consider the DCS a "real-time" process. I do usually have Chrome running on my Jetson Nano displaying a video stream but that only results in about a 25-30% CPU load and I have tried it without the video stream running but still got the pauses.
While I agree with the idea of increasing the priority as I would also see DCS as a real time process, I wouldn't encourage it as a 'fix' for this hanging issue. Having moved back to 1.3.2 over an hour ago I've not experienced a single pause, whereas previously I was getting 10+ second pauses every 10/15 minutes.
I'm keen to try DSF 2.1.0 to see if the issue @chrishamm found was the cause. -
@ChrisP said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:
@gtj0 said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:
I think increasing the priority is a good idea. Philosophically, I'd consider the DCS a "real-time" process. I do usually have Chrome running on my Jetson Nano displaying a video stream but that only results in about a 25-30% CPU load and I have tried it without the video stream running but still got the pauses.
While I agree with the idea of increasing the priority as I would also see DCS as a real time process, I wouldn't encourage it as a 'fix' for this hanging issue. Having moved back to 1.3.2 over an hour ago I've not experienced a single pause, whereas previously I was getting 10+ second pauses every 10/15 minutes.
I'm keen to try DSF 2.1.0 to see if the issue @chrishamm found was the cause.Oh yeah. I wasn't suggesting it as a "fix". Just a future preventative measure.
-
@chrishamm said in RRF 3.01-RC8/DWC 2.1.3/DSF 2.0.0: Mid-print hanging:
@gtj0 That message pops up when the periodic updater task in DCS notices a time shift (> 5s) after a predefined delay. When it notices that, it thinks the system time has been changed and sends an update to the Duet.
Ok so a symptom not a cause then.
-
Thought I was crazy but am having the exact same problems. Worked great with RC7, seen it pause for 5-10 seconds three times with RC8.