Reset during print Part 2 - Duet 3.1.1 / 3.2beta3 - Duet3+SBC
-
@chrishamm original and zero temps version:
[PLA] CFFFP_Top - ReRender.gcode
[PLA] CFFFP_Top - ReRender - zero temps.gcode
Print using debug just stopped; attached is everything in the scrollback buffer. The failing part is below.
[debug] Waiting for finish of G1 F4800 X221.859 Y142.854 E0.51614
[debug] File: Sent G1 F4800 X215.386 Y162.84 E2.88069, remaining space 184, needed 56
[debug] File: Sent G0 F7200 X215.386 Y162.681, remaining space 136, needed 48
[debug] File: Sent G0 X216.503 Y162.681, remaining space 96, needed 40
[debug] File: Sent G0 X216.503 Y162.84, remaining space 56, needed 40
[debug] File: Sent G1 F4800 X221.859 Y142.854 E0.51614, remaining space 0, needed 56
[warn] Bad header checksum (expected 0x0000, got 0x1890)
[warn] Bad header checksum (expected 0x0000, got 0x1890)
[warn] Bad header checksum (expected 0x0000, got 0x1890)
[warn] Restarting transfer because the number of maximum retries has been exceeded
[debug] Cancelled G1 F4800 X221.859 Y142.854 E0.51614
[debug] Cancelled G0 X216.503 Y162.84
[debug] Cancelled G0 X216.503 Y162.681
[warn] Controller has been reset
[debug] Cancelled G0 F7200 X215.386 Y162.681
[debug] Cancelled G1 F4800 X215.386 Y162.84 E2.88069
[debug] Cancelled G0 X112.148 Y162.84
[debug] Cancelled G0 X112.148 Y162.681
[debug] Cancelled G0 F7200 X221.858 Y133.072
[debug] Cancelled G1 F4800 X141.964 Y162.84 E2.10885
[debug] Cancelled G1 F4800 X110.759 Y162.841 E2.86914
[debug] Cancelled G0 F7200 X215.504 Y127.32
[debug] Cancelled G1 X141.926 Y162.984
[debug] Cancelled G1 F4800 X214.831 Y127.353 E0.91645
[debug] Cancelled G0 F7200 X205.323 Y162.84
[debug] Cancelled G0 F7200 X141.926 Y162.681
[debug] Cancelled G1 F4800 X205.204 Y162.84 E2.88069
[debug] Cancelled G1 F4800 X123.417 Y81.182 E1.68581
[warn] File: Out-of-order reply: ''
[warn] File: Out-of-order reply: ''
[debug] Cancelled G0 X58.3 Y98.673
[debug] Cancelled G0 X144.11 Y162.84
[debug] Cancelled G0 F7200 X58.3 Y100.269
[debug] Cancelled G1 X58.101 Y100.269
[debug] Cancelled G1 F4800 X62.453 Y81.182 E2.88069
[debug] Cancelled G1 F4800 X58.139 Y100.125 E0.48921
[debug] Cancelled G0 X144.11 Y162.681
[debug] Cancelled G0 F7200 X63.215 Y81.182
[debug] Cancelled G0 F7200 X133.729 Y81.182
[debug] Cancelled G0 X58.139 Y98.673
[debug] Cancelled G1 F4800 X134.028 Y81.181 E2.10885
[debug] Cancelled G0 F7200 X110.615 Y162.681
[debug] Cancelled G1 X110.615 Y162.879
[debug] Cancelled G0 F7200 X123.547 Y81.182
[debug] Cancelled G1 F4800 X221.859 Y133.675 E0.22419
[info] Aborted job file
[info] Cancelled printing file 0:/gcodes/[PLA] CFFFP_Top - ReRender - zero temps.gcode, print time was 1h 18m
[debug] Requesting update of key job, seq 47 -> 48
[debug] Updated key job
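For anyone sifting through a long scrollback like the one above, the warnings are easier to spot once the entries are grouped by level. The following is only a rough Python sketch, not part of DSF: the `[debug]`/`[info]`/`[warn]` prefixes are taken from the log above, while the `[error]` level and the helper names `parse_entries` and `summarize` are assumptions made for illustration.

```python
import re
from collections import Counter

# One "[level] message" entry. The lookahead stops each message at the next
# level tag, so this works on a run-on paste as well as one entry per line.
# "error" is an assumed extra level; it does not appear in the log above.
LEVELS = r"debug|info|warn|error"
ENTRY = re.compile(
    r"\[(" + LEVELS + r")\]\s*(.*?)(?=\s*\[(?:" + LEVELS + r")\]|\Z)",
    re.S,
)

def parse_entries(text):
    """Return a list of (level, message) tuples with whitespace normalised."""
    return [(m.group(1), " ".join(m.group(2).split())) for m in ENTRY.finditer(text)]

def summarize(text):
    """Count entries per level and count each distinct warning message."""
    entries = parse_entries(text)
    levels = Counter(level for level, _ in entries)
    warnings = Counter(msg for level, msg in entries if level == "warn")
    return levels, warnings

# A short excerpt copied from the failure section of the log.
sample = (
    "[debug] File: Sent G1 F4800 X221.859 Y142.854 E0.51614, remaining space 0, needed 56 "
    "[warn] Bad header checksum (expected 0x0000, got 0x1890) "
    "[warn] Bad header checksum (expected 0x0000, got 0x1890) "
    "[warn] Restarting transfer because the number of maximum retries has been exceeded "
    "[warn] Controller has been reset"
)

levels, warnings = summarize(sample)
for message, count in warnings.most_common():
    print(count, message)
```

Run against the full scrollback, this makes the sequence stand out: repeated bad header checksums, then the transfer restart, then the controller reset.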
-
@Via I flashed 3.1.1 to the board with Bossa and made a new SD card config. It didn't take too long
-
I've put the Duet 3 into standalone mode now and retried the print using 3.1.1, and the print failed again. M122 shows a memory protection fault at the time of failure.
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.1.1 running on Duet 3 MB6HC v0.6 or 1.0 (standalone mode)
Board ID: 08DJM-956L2-G43S4-6J9FA-3S86T-1B5LD
Used output buffers: 1 of 40 (11 max)
=== RTOS ===
Static ram: 154604
Dynamic ram: 162852 of which 44 recycled
Exception stack ram used: 272
Never used ram: 75444
Tasks: NETWORK(ready,364) ETHERNET(blocked,436) HEAT(blocked,1200) CanReceiv(suspended,3820) CanSender(suspended,1488) CanClock(blocked,1452) TMC(blocked,204) MAIN(running,4472) IDLE(ready,76)
Owned mutexes:
=== Platform ===
Last reset 00:00:47 ago, cause: software
Last software reset at 2020-11-18 11:53, reason: Memory protection fault, spinning module GCodes, available RAM 75108 bytes (slot 0)
Software reset code 0x4163 HFSR 0x00000000 CFSR 0x00000001 ICSR 0x04427804 BFAR 0x00000000 SP 0x204175c4 Task MAIN
Stack: 0000000a 0000000a 40070000 20417668 204176b9 00000015 00000000 0000000d 0046207b 00000052 20417630
Error status: 0
MCU temperature: min 37.2, current 37.4, max 38.0
Supply voltage: min 13.0, current 13.0, max 13.1, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 12.0, current 12.1, max 12.1, under voltage events: 0
Driver 0: standstill, reads 38262, writes 14 timeouts 0, SG min/max 0/0
Driver 1: standstill, reads 38263, writes 14 timeouts 0, SG min/max 0/0
Driver 2: standstill, reads 38263, writes 14 timeouts 0, SG min/max 0/0
Driver 3: standstill, reads 38264, writes 14 timeouts 0, SG min/max 0/0
Driver 4: standstill, reads 38264, writes 14 timeouts 0, SG min/max 0/0
Driver 5: standstill, reads 38268, writes 11 timeouts 0, SG min/max 0/0
Date/time: 2020-11-18 11:54:07
Slowest loop: 6.14ms; fastest: 0.14ms
=== Storage ===
Free file entries: 10
SD card 0 detected, interface speed: 25.0MBytes/sec
SD card longest read time 0.8ms, write time 0.0ms, max retries 0
=== Move ===
Hiccups: 0(0), FreeDm: 375, MinFreeDm: 375, MaxWait: 0ms
Bed compensation in use: none, comp offset 0.000
=== MainDDARing ===
Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1
=== AuxDDARing ===
Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1
=== Heat ===
Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
=== GCodes ===
Segments left: 0
Movement lock held by null
HTTP is idle in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
Slowest loop: 9.70ms; fastest: 0.03ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0), 0 sessions Telnet(0), 0 sessions
HTTP sessions: 1 of 8
- Ethernet -
State: active
Error counts: 0 0 0 0 0
Socket states: 5 2 2 2 2 0 0 0
=== CAN ===
Messages sent 190, longest wait 0ms for type 0
=== Linux interface ===
State: 0, failed transfers: 0
Last transfer: 47750ms ago
RX/TX seq numbers: 0/1
SPI underruns 0, overruns 0
Number of disconnects: 0
Buffer RX/TX: 0/0-0
-
I've got more details about this problem. It looks like lxpanel (a desktop program running in the background) has a memory leak which, at some point, causes problems with DSF. I hope this will be fixed by Raspbian soon.
@Via Please try out 3.2.0-b3.2 in standalone mode; the memory protection fault should be fixed in that build.
-
@chrishamm Thanks for the update.
Would using the non-GUI version of DuetPi and forgoing the connected screen work around this issue in the short term, while keeping SBC control?
I have updated standalone to 3.2.0-b3.2 and am running the test again now.
-
@Via Yes, I think so. My printer has the GUI-less variant installed (DuetPi lite), so that probably explains why I couldn't observe this problem before.
-
@chrishamm No luck again on standalone with 3.2.0-beta3.2, this time "AssertionFailed"
I will have another try later with duetpi lite.
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.2-beta3.2 running on Duet 3 MB6HC v0.6 or 1.0 (standalone mode)
Board ID: 08DJM-956L2-G43S4-6J9FA-3S86T-1B5LD
Used output buffers: 1 of 40 (11 max)
=== RTOS ===
Static ram: 122236
Dynamic ram: 168580 of which 376 recycled
Never used RAM 101000, free system stack 180 words
Tasks: NETWORK(ready,165) ETHERNET(blocked,109) HEAT(blocked,293) CanReceiv(blocked,948) CanSender(blocked,371) CanClock(blocked,358) TMC(blocked,54) MAIN(running,1111) IDLE(ready,19)
Owned mutexes:
=== Platform ===
Last reset 00:00:28 ago, cause: software
Last software reset at 2020-11-19 11:11, reason: AssertionFailed, GCodes spinning, available RAM 101000, slot 0
Software reset code 0x4123 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00427000 BFAR 0x00000000 SP 0x2040fd54 Task MAIN
Stack: 00000599 004886b0 00468ce5 00000000 ffffffff 20427700 2040e2a0 20427ab0 ffffffff 00000000 56a3ba63 a5a5a5a5 00468d9b 2040fda4 00000000 20423aa8 004664d5 20427aa0 00440cc3 00000000 20427aa4 20408001 2040fdb4 00000101 00469700 0046972e 61000000
Error status: 0x00
MCU temperature: min 37.1, current 37.8, max 38.0
Supply voltage: min 13.0, current 13.0, max 13.1, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 12.0, current 12.1, max 12.1, under voltage events: 0
Driver 0: position 0, standstill, reads 9148, writes 14 timeouts 0, SG min/max 0/0
Driver 1: position 0, standstill, reads 9149, writes 14 timeouts 0, SG min/max 0/0
Driver 2: position 0, standstill, reads 9151, writes 14 timeouts 0, SG min/max 0/0
Driver 3: position 0, standstill, reads 9152, writes 14 timeouts 0, SG min/max 0/0
Driver 4: position 0, standstill, reads 9153, writes 14 timeouts 0, SG min/max 0/0
Driver 5: position 0, standstill, reads 9157, writes 11 timeouts 0, SG min/max 0/0
Date/time: 2020-11-19 11:12:08
Slowest loop: 6.68ms; fastest: 0.21ms
=== Storage ===
Free file entries: 10
SD card 0 detected, interface speed: 25.0MBytes/sec
SD card longest read time 1.2ms, write time 0.0ms, max retries 0
=== Move ===
Hiccups: 0(0), FreeDm: 375, MinFreeDm: 375, MaxWait: 0ms
Bed compensation in use: none, comp offset 0.000
=== MainDDARing ===
Scheduled moves 0, completed moves 0, StepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== AuxDDARing ===
Scheduled moves 0, completed moves 0, StepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== Heat ===
Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
=== GCodes ===
Segments left: 0
Movement lock held by null
HTTP is idle in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
Slowest loop: 11.39ms; fastest: 0.03ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0), 0 sessions Telnet(0), 0 sessions
HTTP sessions: 1 of 8
- Ethernet -
State: active
Error counts: 0 0 0 0 0
Socket states: 5 2 2 2 2 0 0 0
=== CAN ===
Messages sent 113, send timeouts 113, longest wait 0ms for type 0, free CAN buffers 47
-
@Via , I will look into this later today.
-
@dc42 Many thanks.
Is it worth trying with DuetPi Lite? Was going to try earlier but work got in the way
-
@Via said in Reset during print Part 2 - Duet 3.1.1 / 3.2beta3 - Duet3+SBC:
@dc42 Many thanks.
Is it worth trying with DuetPi Lite? Was going to try earlier but work got in the way
I doubt it. The assertion failure looks like it was caused either by memory corruption, or possibly by a power brownout. I guess a hardware problem is also a possibility.
Can you set up a macro that runs the file in simulation mode within a loop, to see if you can provoke the failure that way?
-
@dc42 I've run the file as a normal simulation, which completes fine.
I'm trying to write a macro to loop it, but my G-code knowledge is terrible and I keep freezing the board. Can you point me in the right direction?
I was thinking of this (I know never-ending loops are bad in themselves, but I was just going to let it run until I reset it), but it looks like it is just buffering all the M24s in one go. How would I go about waiting for it to finish one before starting the next?
M37 S1
M23 "[PLA] CFFFP_Top - ReRender - zero temps.gcode"
while true
  M24
-
@dc42 I stopped overthinking it and just added the M37/M23/M24 to the end of the G-code file, which is looping properly now.
It has done ~130 loops with no issues in simulation.
-
Simulation loop ran for 4 1/2 hours (~380 loops) with no issues.
I ripped a 24V PSU out of another printer today and unplugged the heaters and fans, so that only the steppers and sensors were plugged in, then ran another test print. Again, this did not complete.
This time M122 is back to Memory Protection Fault:
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.2-beta3.2 running on Duet 3 MB6HC v0.6 or 1.0 (standalone mode)
Board ID: 08DJM-956L2-G43S4-6J9FA-3S86T-1B5LD
Used output buffers: 2 of 40 (11 max)
=== RTOS ===
Static ram: 122236
Dynamic ram: 168580 of which 376 recycled
Never used RAM 101000, free system stack 200 words
Tasks: NETWORK(ready,161) ETHERNET(blocked,109) HEAT(blocked,297) CanReceiv(blocked,948) CanSender(blocked,371) CanClock(blocked,356) TMC(blocked,54) MAIN(running,1111) IDLE(ready,19)
Owned mutexes:
=== Platform ===
Last reset 00:00:20 ago, cause: software
Last software reset at 2020-11-21 19:46, reason: MemoryProtectionFault mmarValid daccViol, Platform spinning, available RAM 101000, slot 1
Software reset code 0x4160 HFSR 0x00000000 CFSR 0x00000082 ICSR 0x00427804 BFAR 0x00038100 SP 0x2040aa78 Task TMC
Stack: 00000001 2040a928 00000000 00410210 00000000 00469d0f 00415682 21000000 427a0001 3c800001 00000000 3d000001 00000000 00000000 37533333 43d697a0 00000000 41dde946 43fa0000 44a80000 2040ff38 00469d0f 00000000 40078000 20400020 00415681 0000001e
Error status: 0x00
MCU temperature: min 39.2, current 39.9, max 40.0
Supply voltage: min 23.9, current 24.0, max 24.0, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 12.0, current 12.1, max 12.1, under voltage events: 0
Driver 0: position 0, standstill, reads 35967, writes 14 timeouts 0, SG min/max 0/0
Driver 1: position 0, standstill, reads 35968, writes 14 timeouts 0, SG min/max 0/0
Driver 2: position 0, standstill, reads 35969, writes 14 timeouts 0, SG min/max 0/0
Driver 3: position 0, standstill, reads 35970, writes 14 timeouts 0, SG min/max 0/0
Driver 4: position 0, standstill, reads 35971, writes 14 timeouts 0, SG min/max 0/0
Driver 5: position 0, standstill, reads 35975, writes 11 timeouts 0, SG min/max 0/0
Date/time: 2020-11-21 19:46:54
Slowest loop: 6.26ms; fastest: 0.15ms
=== Storage ===
Free file entries: 10
SD card 0 detected, interface speed: 25.0MBytes/sec
SD card longest read time 0.8ms, write time 0.0ms, max retries 0
=== Move ===
Hiccups: 0(0), FreeDm: 375, MinFreeDm: 375, MaxWait: 0ms
Bed compensation in use: none, comp offset 0.000
=== MainDDARing ===
Scheduled moves 0, completed moves 0, StepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== AuxDDARing ===
Scheduled moves 0, completed moves 0, StepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== Heat ===
Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
=== GCodes ===
Segments left: 0
Movement lock held by null
HTTP is idle in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
Slowest loop: 5.46ms; fastest: 0.03ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0), 0 sessions Telnet(0), 0 sessions
HTTP sessions: 1 of 8
- Ethernet -
State: active
Error counts: 0 0 0 0 0
Socket states: 5 2 2 2 2 0 0 0
=== CAN ===
Messages sent 82, send timeouts 82, longest wait 0ms for type 0, free CAN buffers 46
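As a side note for reading these reports: the "mmarValid daccViol" text in the reset reason lines up with the CFSR value printed next to it. The sketch below decodes the low byte of CFSR (the MemManage Fault Status bits) using the standard ARM Cortex-M bit layout. It assumes the board's MCU follows that layout and is not taken from the RepRapFirmware source; `decode_mmfsr` is a name made up for this example.

```python
# MemManage Fault Status Register bits (low byte of CFSR), per the standard
# ARM Cortex-M layout. Assumption: the MCU on this board follows that layout.
MMFSR_BITS = {
    0: "IACCVIOL",   # instruction access violation
    1: "DACCVIOL",   # data access violation
    3: "MUNSTKERR",  # fault while unstacking on exception return
    4: "MSTKERR",    # fault while stacking on exception entry
    5: "MLSPERR",    # fault during lazy floating-point state preservation
    7: "MMARVALID",  # the fault address register holds a valid address
}

def decode_mmfsr(cfsr):
    """Return the names of the MemManage fault flags set in a CFSR value."""
    mmfsr = cfsr & 0xFF  # MemManage status is the lowest byte of CFSR
    return [name for bit, name in sorted(MMFSR_BITS.items()) if mmfsr & (1 << bit)]

# CFSR values copied from the M122 reports in this thread.
for label, cfsr in [
    ("3.1.1 report", 0x00000001),
    ("3.2-beta3.2 assertion report", 0x00000000),
    ("3.2-beta3.2 fault report", 0x00000082),
]:
    print(label, hex(cfsr), decode_mmfsr(cfsr))
```

On that reading, 0x00000082 is a data access violation with a valid fault address, 0x00000001 from the 3.1.1 report is an instruction access violation, and the AssertionFailed report has no MemManage flags set at all.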
-
@Via thanks, I will look into that data later.
-
@Via, please post the config.g file that you were using with 3.2beta3.2 when that assertion failure occurred.
Do you have any filament monitors configured? [Edit: from your M122 report I think that you do not.]
-
-
@dc42 any update on this?
-
I wasn't able to pin this down, except to say that it looks like random memory failure or corruption. Please upgrade to 3.2beta4. If it happens again, please post the print file and config.g if you haven't already (I'm using a tablet, so I can't tell), and also daemon.g if you are using it.
-
@dc42 Failed again on beta4 (M122 at bottom). The files are posted already, but I will group them together here.
[PLA] CFFFP_Top - ReRender - zero temps (1).gcode
full sys directory: https://www.viaraix.net/duet3-sys.zip
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.2-beta4 running on Duet 3 MB6HC v0.6 or 1.0 (standalone mode)
Board ID: 08DJM-956L2-G43S4-6J9FA-3S86T-1B5LD
Used output buffers: 1 of 40 (11 max)
=== RTOS ===
Static ram: 123212
Dynamic ram: 168584 of which 704 recycled
Never used RAM 99692, free system stack 188 words
Tasks: NETWORK(ready,169) ETHERNET(blocked,110) HEAT(blocked,298) CanReceiv(blocked,947) CanSender(blocked,371) CanClock(blocked,358) TMC(blocked,54) MAIN(running,1113) IDLE(ready,19)
Owned mutexes:
=== Platform ===
Last reset 00:00:19 ago, cause: software
Last software reset at 2020-11-27 11:41, reason: MemoryProtectionFault mmarValid daccViol, GCodes spinning, available RAM 99692, slot 2
Software reset code 0x4163 HFSR 0x00000000 CFSR 0x00000082 ICSR 0x0044a804 BFAR 0x00000000 SP 0x204100c8 Task MAIN
Stack: 00000002 ffffffff 00434a29 20410308 000002bc 00469483 0046a3c8 610f0000 2042565c 004691bf 2042565c 00000000 000002bb 0000000a 2042565c 204256c8 00000000 00000001 00000000 00468e53 204256c8 00000000 204256c8 004691bf 00000000 00000000 20425540
Error status: 0x00
MCU temperature: min 39.0, current 39.4, max 39.9
Supply voltage: min 23.9, current 24.0, max 24.0, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 12.0, current 12.1, max 12.1, under voltage events: 0
Driver 0: position 0, standstill, reads 32774, writes 14 timeouts 0, SG min/max 0/0
Driver 1: position 0, standstill, reads 32775, writes 14 timeouts 0, SG min/max 0/0
Driver 2: position 0, standstill, reads 32776, writes 14 timeouts 0, SG min/max 0/0
Driver 3: position 0, standstill, reads 32777, writes 14 timeouts 0, SG min/max 0/0
Driver 4: position 0, standstill, reads 32779, writes 14 timeouts 0, SG min/max 0/0
Driver 5: position 0, standstill, reads 32782, writes 11 timeouts 0, SG min/max 0/0
Date/time: 2020-11-27 11:41:21
Slowest loop: 6.06ms; fastest: 0.15ms
=== Storage ===
Free file entries: 10
SD card 0 detected, interface speed: 25.0MBytes/sec
SD card longest read time 0.7ms, write time 0.0ms, max retries 0
=== Move ===
Hiccups: 0(0), FreeDm: 375, MinFreeDm: 375, MaxWait: 0ms
Bed compensation in use: none, comp offset 0.000
=== MainDDARing ===
Scheduled moves 0, completed moves 0, StepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== AuxDDARing ===
Scheduled moves 0, completed moves 0, StepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== Heat ===
Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
=== GCodes ===
Segments left: 0
Movement lock held by null
HTTP is idle in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
Slowest loop: 17.56ms; fastest: 0.03ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0), 0 sessions Telnet(0), 0 sessions
HTTP sessions: 1 of 8
- Ethernet -
State: active
Error counts: 0 0 0 0 0
Socket states: 5 2 2 2 2 0 0 0
=== CAN ===
Messages queued 79, send timeouts 177, received 0, lost 0, longest wait 0ms for reply type 0, free buffers 47
-
Thanks for your new report. I am fairly certain that the memory protection fault was caused by a hardware error, because there is no way that the instruction at that address should fault, nor any way that the faulting address should be zero. So I suggest we swap your board. Who did you purchase it from?