Duet Maestro random reboot (Hard Fault) during large prints
-
Thanks. Looking at both M122 reports, both crashed at a nonsensical point, suggesting either hardware failure or stack corruption. But both were in the network subsystem at the time. On inspecting the code, I realised that if the W5500 network chip reported more received data than is theoretically possible, that would cause DMA to overflow a buffer, which might lead to stack corruption. There is a chance that this was responsible for both crashes. I have added a check in the 2.05 release.
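A minimal sketch of the kind of sanity check described above, in C. This is not the actual RepRapFirmware code: `SafeReceiveLength`, `SOCKET_RX_BUFFER_SIZE` and the 2048-byte size are hypothetical, assuming the chip reports the received byte count as a 16-bit value and that anything larger than the socket's receive buffer is impossible. The idea is simply to clamp or reject the reported length before it is used to size a DMA transfer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: size of one socket's receive buffer on the network chip. */
#define SOCKET_RX_BUFFER_SIZE 2048u

/* Returns how many bytes are safe to transfer into a local buffer of
 * bufferSize bytes. A reported length larger than the chip's own receive
 * buffer cannot be genuine, so it is rejected (0 bytes transferred) rather
 * than being allowed to overrun the destination buffer. */
static size_t SafeReceiveLength(uint16_t reportedBytes, size_t bufferSize)
{
    if (reportedBytes > SOCKET_RX_BUFFER_SIZE) {
        return 0; /* impossible value: treat as a chip/bus error */
    }
    /* Never transfer more than the destination buffer can hold. */
    return (reportedBytes < bufferSize) ? (size_t)reportedBytes : bufferSize;
}
```

Rejecting the impossible value outright (rather than just clamping it) avoids reading garbage data from the chip; a real implementation would presumably also log the event or reset the socket.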
-
@dc42 Thank you for your response. My professor told me that our network sometimes has short outages. Could these be the reason for the problem? Today I started a print with the network disconnected, to check whether the print finishes...
-
@Gizzle said in Duet Maestro random reboot (Hard Fault) during large prints:
@dc42 Thank you for your response. My professor told me that our network sometimes has short outages. Could these be the reason for the problem? Today I started a print with the network disconnected, to check whether the print finishes...
Not unless there is a bug in the W5500 chip that causes it to return faulty information when there is a network outage.
I have released firmware 2.05, so please upgrade to that and see if it helps.
-
@dc42 Will do, the next time I'm at the printer. Thank you very much for your hard work!
-
@dc42 I updated to the 2.05 release today. Last week's print without Ethernet finished fine, but today's print on 2.05 over Ethernet aborted again.
Here is the M122 output:
M122
=== Diagnostics ===
RepRapFirmware for Duet 2 Maestro version 2.05 running on Duet Maestro 1.0
Board ID: 08DJM-956DU-LL3T0-6J9FL-3S86Q-9T2LP
Used output buffers: 1 of 24 (20 max)
=== RTOS ===
Static ram: 19804
Dynamic ram: 87256 of which 124 recycled
Exception stack ram used: 204
Never used ram: 23684
Tasks: NETWORK(ready,900) HEAT(blocked,1296) MAIN(running,3896) IDLE(ready,160)
Owned mutexes:
=== Platform ===
Last reset 00:03:31 ago, cause: software
Last software reset at 2019-12-18 10:25, reason: Assertion failed, spinning module Platform, available RAM 23440 bytes (slot 1)
Software reset code 0x4090 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x04427000 BFAR 0xe000ed38 SP 0x20000f84 Task 0x5754454e
Stack: 00000f3c 00463a24 0044a949 00000001 200084a8 00000001 00000000 00000000 0044c2e3 200084a8 2000075c 00000000 0044c641 00000000 00000000 2000849c 20000fd8 200084a8 2000075c 20001000 00000000 a5a5a5a5 a5a5a5a5
Error status: 0
Free file entries: 10
SD card 0 detected, interface speed: 15.0MBytes/sec
SD card longest block write time: 0.0ms, max retries 0
MCU temperature: min 30.8, current 30.9, max 32.3
Supply voltage: min 0.0, current 24.2, max 24.3, under voltage events: 0, over voltage events: 0, power good: yes
Driver 0: standstill, read errors 0, write errors 1, ifcount 27, reads 8457, timeouts 0
Driver 1: standstill, read errors 0, write errors 1, ifcount 27, reads 8457, timeouts 0
Driver 2: standstill, read errors 0, write errors 1, ifcount 27, reads 8457, timeouts 0
Driver 3: standstill, read errors 0, write errors 1, ifcount 25, reads 8457, timeouts 0
Driver 4: standstill, read errors 0, write errors 1, ifcount 20, reads 8457, timeouts 0
Driver 5: ok, read errors 0, write errors 0, ifcount 0, reads 0, timeouts 8463
Driver 6: ok, read errors 0, write errors 0, ifcount 0, reads 0, timeouts 8463
Date/time: 2019-12-18 10:29:10
Slowest loop: 1.07ms; fastest: 0.05ms
I2C nak errors 0, send timeouts 0, receive timeouts 0, finishTimeouts 0, resets 0
=== Move ===
Hiccups: 0, FreeDm: 160, MinFreeDm: 160, MaxWait: 0ms
Bed compensation in use: none, comp offset 0.000
=== DDARing ===
Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0
=== Heat ===
Bed heaters = 0, chamberHeaters = -1 -1
=== GCodes ===
Segments left: 0
Stack records: 1 allocated, 0 in use
Movement lock held by null
http is idle in state(s) 0
telnet is idle in state(s) 0
file is idle in state(s) 0
serial is idle in state(s) 0
aux is idle in state(s) 0
daemon is idle in state(s) 0
queue is idle in state(s) 0
lcd is idle in state(s) 0
autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
Slowest loop: 7.19ms; fastest: 0.02ms
Responder states: HTTP(0) HTTP(1) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
HTTP sessions: 1 of 8
Interface state 5, link 100Mbps full duplex
-
That's another strange failure. As the cause may be a hardware failure, I think the time has come to try another Maestro. Please ask your supplier to replace your Maestro under warranty, and tag the one you return with a link to this thread.
-
@dc42 That's funny, because this is already the second Maestro I am using...
That's why I find it hard to believe that a second board has the same symptoms...
-
@Gizzle said in Duet Maestro random reboot (Hard Fault) during large prints:
@dc42 That's funny, because this is already the second Maestro I am using...
That's why I find it hard to believe that a second board has the same symptoms...
Just to confirm: you have the same fault on 2 different Maestro boards? That makes a difference.
-
@dc42 Yes, this is the case
-
@Gizzle said in Duet Maestro random reboot (Hard Fault) during large prints:
I updated to the 2.05 release today. Last week's print without Ethernet finished fine, but today's print on 2.05 over Ethernet aborted again.
When you say "over Ethernet", do you mean that you uploaded the file to the SD card over Ethernet, and then printed from the SD card?
We've seen a very small number of instances of file corruption when uploading large files over Ethernet. So I am wondering whether file corruption could be the issue - although a corrupt file shouldn't cause the firmware to crash, unless the firmware file itself was corrupted when you uploaded it (RRF3 checks for firmware corruption, but RRF2 doesn't). Here are some suggestions:
- Re-run the file that crashed in simulation mode, without uploading it again, and see whether the simulation completes successfully.
- Use DWC 2.0.4 and enable its option to CRC-check files when uploading them.
- Run one of those prints again, and after it has printed several layers, run M122 and post the results here.
You could also consider switching to RepRapFirmware 3.0RC1 in case it either solves the problem or provides more information, although that would require changes to your config.g file.
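For readers unfamiliar with upload CRC checking: the sender computes a checksum over the file, and the receiver recomputes it over the bytes it actually stored, so corruption in transit shows up as a mismatch. A minimal C sketch of that idea follows, assuming the standard CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320); `Crc32` and `UploadIsIntact` are hypothetical names for illustration, not the DWC or RepRapFirmware implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard CRC-32 (IEEE 802.3), computed bit by bit for clarity. */
uint32_t Crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* If the low bit is set, shift and XOR in the reflected polynomial. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical receiver-side check: does the data written to the SD card
 * match the CRC that the uploader computed before sending? */
int UploadIsIntact(const uint8_t *received, size_t len, uint32_t expectedCrc)
{
    return Crc32(received, len) == expectedCrc;
}
```

A bitwise loop is slow on large G-code files; firmware would more likely use a 256-entry lookup table, but the result is the same CRC value.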
-
@dc42 Yes, I upload the G-code and control the printer while it is connected to the network via Ethernet.
If file corruption were the problem, wouldn't the print also fail when I start a G-code file that was uploaded over Ethernet from the printer's touchscreen, with the printer offline?
I can test your suggestions this Friday and will report back.
-
@Gizzle said in Duet Maestro random reboot (Hard Fault) during large prints:
If file corruption were the problem, wouldn't the print also fail when I start a G-code file that was uploaded over Ethernet from the printer's touchscreen, with the printer offline?
Yes, if you printed the same file from the SD card again, without uploading it again.
-
@dc42 That's what I did, several times, and it worked.