Duet Maestro random reboot (Hard Fault) during large prints



  • Hey there,

    I am currently working on a project at my university with a Tronxy X5S, and my task is to get it up and running again. The current problem is that prints stop partway through larger jobs. Since the thermistor connector for the first hotend was not working as expected on the first Maestro and which I thought could be the problem for the stops, I replaced it with a new one, which fixed the thermistor problem, but not the stops.
    It currently runs the latest firmware.

    I ran M122 and this is the output:
    Static ram: 19772
    Dynamic ram: 86864 of which 124 recycled
    Exception stack ram used: 204
    Never used ram: 24108
    Tasks: NETWORK(ready,764) HEAT(blocked,1296) MAIN(running,3904) IDLE(ready,200)
    Owned mutexes:
    === Platform ===
    Last reset 00:25:16 ago, cause: software
    Last software reset at 2019-11-27 14:28, reason: Hard fault, spinning module Platform, available RAM 23724 bytes (slot 0)
    Software reset code 0x4030 HFSR 0x40000000 CFSR 0x00000001 ICSR 0x04427803 BFAR 0xe000ed38 SP 0x200007e4 Task 0x5754454e
    Stack: 00404541 a5a5a5a4 01000000 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5
    Error status: 0
    Free file entries: 10
    SD card 0 detected, interface speed: 15.0MBytes/sec
    SD card longest block write time: 0.0ms, max retries 0
    MCU temperature: min 25.4, current 25.8, max 28.5
    Supply voltage: min 0.0, current 24.3, max 24.3, under voltage events: 0, over voltage events: 0, power good: yes
    Driver 0: standstill, read errors 0, write errors 1, ifcount 34, reads 60653, timeouts 0
    Driver 1: standstill, read errors 0, write errors 1, ifcount 34, reads 60653, timeouts 0
    Driver 2: standstill, read errors 0, write errors 1, ifcount 38, reads 60653, timeouts 0
    Driver 3: standstill, read errors 0, write errors 1, ifcount 36, reads 60653, timeouts 0
    Driver 4: standstill, read errors 0, write errors 1, ifcount 27, reads 60653, timeouts 0
    Driver 5: ok, read errors 0, write errors 0, ifcount 0, reads 0, timeouts 60659
    Driver 6: ok, read errors 0, write errors 0, ifcount 0, reads 0, timeouts 60659
    Date/time: 2019-11-27 14:53:23
    Slowest loop: 3.53ms; fastest: 0.05ms
    I2C nak errors 0, send timeouts 0, receive timeouts 0, finishTimeouts 0, resets 0
    === Move ===
    Hiccups: 0, FreeDm: 160, MinFreeDm: 160, MaxWait: 0ms
    Bed compensation in use: none, comp offset 0.000
    === DDARing ===
    Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0
    === Heat ===
    Bed heaters = 0, chamberHeaters = -1 -1
    === GCodes ===
    Segments left: 0
    Stack records: 1 allocated, 0 in use
    Movement lock held by null
    http is idle in state(s) 0
    telnet is idle in state(s) 0
    file is idle in state(s) 0
    serial is idle in state(s) 0
    aux is idle in state(s) 0
    daemon is idle in state(s) 0
    queue is idle in state(s) 0
    lcd is idle in state(s) 0
    autopause is idle in state(s) 0
    Code queue is empty.
    === Network ===
    Slowest loop: 4.49ms; fastest: 0.02ms
    Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
    HTTP sessions: 1 of 8
    Interface state 5, link 100Mbps full duplex

    The problem is that I have never used a Duet board before and I don’t know what the log means, so I hope you can help me.
    Thanks in advance!



  • You cut off the first part of the M122, which states what hardware (Maestro) and firmware version you are running. The 'Used output buffers' may also be useful. Please can you post this if you still have the M122 output, or at least what firmware version you are on.

    Last reset 00:25:16 ago, cause: software
    Last software reset at 2019-11-27 14:28, reason: Hard fault, spinning module Platform, available RAM 23724 bytes (slot 0)
    Software reset code 0x4030 HFSR 0x40000000 CFSR 0x00000001 ICSR 0x04427803 BFAR 0xe000ed38 SP 0x200007e4 Task 0x5754454e
    Stack: 00404541 a5a5a5a4 01000000 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5
    Error status: 0

    This looks like the error, but you'll probably need @dc42 to have a look at it. There is a page on error codes here: https://duet3d.dozuki.com/Wiki/Error_codes_and_software_reset_codes
    The software reset code appears to reiterate that it's a 'hardFault', and that the Duet was 'in USB output at the time'.

    Ian
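    For anyone decoding these values by hand: the software reset code packs several fields into one number. The sketch below splits 0x4030 apart, assuming (consistent with the log text "Hard fault, spinning module Platform") that the low nibble holds the module index, bits 4 to 7 hold the main reset reason, and the upper byte holds extra flag bits. `split_reset_code` is a hypothetical helper, not part of RepRapFirmware; the wiki page linked above remains the reference for what each flag bit means.

```python
def split_reset_code(code: int) -> dict:
    """Split an M122 'Software reset code' into its assumed fields (hypothetical helper)."""
    return {
        "module": code & 0x000F,  # assumed: index of the module that was spinning (0 = Platform)
        "reason": code & 0x00F0,  # assumed: main reset reason (0x30 = hard fault)
        "flags": code & 0xFF00,   # assumed: extra flag bits OR'ed in by the firmware
    }


fields = split_reset_code(0x4030)
print({name: hex(value) for name, value in fields.items()})
# {'module': '0x0', 'reason': '0x30', 'flags': '0x4000'}
```

    The remaining 0x4000 flag is the bit Ian reads as "in USB output at the time".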



  • Alright, sorry for missing the top part. Here it is:

    27.11.2019, 14:52:58: Connected to 141.75.30.254
    27.11.2019, 14:53:24: M122: === Diagnostics ===
    RepRapFirmware for Duet 2 Maestro version 2.04 running on Duet Maestro 1.0
    Board ID: 08DJM-956DU-LL3T0-6J9FL-3S86Q-9T2LP
    Used output buffers: 3 of 24 (7 max)
    === RTOS ===
    Static ram: 19772
    Dynamic ram: 86864 of which 124 recycled
    Exception stack ram used: 204
    Never used ram: 24108
    Tasks: NETWORK(ready,764) HEAT(blocked,1296) MAIN(running,3904) IDLE(ready,200)
    Owned mutexes:
    === Platform ===
    Last reset 00:25:16 ago, cause: software
    Last software reset at 2019-11-27 14:28, reason: Hard fault, spinning module Platform, available RAM 23724 bytes (slot 0)
    Software reset code 0x4030 HFSR 0x40000000 CFSR 0x00000001 ICSR 0x04427803 BFAR 0xe000ed38 SP 0x200007e4 Task 0x5754454e
    Stack: 00404541 a5a5a5a4 01000000 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5
    Error status: 0
    Free file entries: 10
    SD card 0 detected, interface speed: 15.0MBytes/sec
    SD card longest block write time: 0.0ms, max retries 0
    MCU temperature: min 25.4, current 25.8, max 28.5
    Supply voltage: min 0.0, current 24.3, max 24.3, under voltage events: 0, over voltage events: 0, power good: yes
    Driver 0: standstill, read errors 0, write errors 1, ifcount 34, reads 60653, timeouts 0
    Driver 1: standstill, read errors 0, write errors 1, ifcount 34, reads 60653, timeouts 0
    Driver 2: standstill, read errors 0, write errors 1, ifcount 38, reads 60653, timeouts 0
    Driver 3: standstill, read errors 0, write errors 1, ifcount 36, reads 60653, timeouts 0
    Driver 4: standstill, read errors 0, write errors 1, ifcount 27, reads 60653, timeouts 0
    Driver 5: ok, read errors 0, write errors 0, ifcount 0, reads 0, timeouts 60659
    Driver 6: ok, read errors 0, write errors 0, ifcount 0, reads 0, timeouts 60659
    Date/time: 2019-11-27 14:53:23
    Slowest loop: 3.53ms; fastest: 0.05ms
    I2C nak errors 0, send timeouts 0, receive timeouts 0, finishTimeouts 0, resets 0
    === Move ===
    Hiccups: 0, FreeDm: 160, MinFreeDm: 160, MaxWait: 0ms
    Bed compensation in use: none, comp offset 0.000
    === DDARing ===
    Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0
    === Heat ===
    Bed heaters = 0, chamberHeaters = -1 -1
    === GCodes ===
    Segments left: 0
    Stack records: 1 allocated, 0 in use
    Movement lock held by null
    http is idle in state(s) 0
    telnet is idle in state(s) 0
    file is idle in state(s) 0
    serial is idle in state(s) 0
    aux is idle in state(s) 0
    daemon is idle in state(s) 0
    queue is idle in state(s) 0
    lcd is idle in state(s) 0
    autopause is idle in state(s) 0
    Code queue is empty.
    === Network ===
    Slowest loop: 4.49ms; fastest: 0.02ms
    Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
    HTTP sessions: 1 of 8
    Interface state 5, link 100Mbps full duplex

    I can confirm that no USB cable was connected during the print; only the SD card and the Ethernet cable were.
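    As a sanity check, the report itself tells you when the reset happened: subtracting the uptime in "Last reset 00:25:16 ago" from the "Date/time" line gives the moment the board restarted, which should agree with the "Last software reset at" timestamp. A short sketch using Python's standard `datetime` module (`reset_time` is an illustrative helper):

```python
from datetime import datetime, timedelta


def reset_time(report_time: str, uptime: str) -> datetime:
    """Subtract an 'hh:mm:ss' uptime from an M122 'Date/time' value."""
    hours, minutes, seconds = (int(part) for part in uptime.split(":"))
    now = datetime.strptime(report_time, "%Y-%m-%d %H:%M:%S")
    return now - timedelta(hours=hours, minutes=minutes, seconds=seconds)


print(reset_time("2019-11-27 14:53:23", "00:25:16"))  # 2019-11-27 14:28:07
```

    For this report the result matches the logged software reset at 14:28, so the M122 was taken roughly 25 minutes after the board reset.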


  • administrators

    Thanks for your report. It shows that the reset cause was a Hard Fault, which usually indicates a firmware bug. You are running the latest stable firmware, so I will look into this tomorrow.
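    The HFSR and CFSR values in the report are standard Cortex-M fault status registers, so they can be decoded with the bit definitions from ARM's ARMv7-M Architecture Reference Manual. A minimal decoder sketch (the tables list the architecturally defined bits; `decode_bits` is just an illustrative helper):

```python
# Bit positions from the ARMv7-M Architecture Reference Manual (Cortex-M3/M4).
HFSR_BITS = {1: "VECTTBL", 30: "FORCED", 31: "DEBUGEVT"}

CFSR_BITS = {
    # MemManage fault status, bits 0-7
    0: "IACCVIOL", 1: "DACCVIOL", 3: "MUNSTKERR", 4: "MSTKERR", 5: "MLSPERR", 7: "MMARVALID",
    # BusFault status, bits 8-15
    8: "IBUSERR", 9: "PRECISERR", 10: "IMPRECISERR", 11: "UNSTKERR",
    12: "STKERR", 13: "LSPERR", 15: "BFARVALID",
    # UsageFault status, bits 16-31
    16: "UNDEFINSTR", 17: "INVSTATE", 18: "INVPC", 19: "NOCP",
    24: "UNALIGNED", 25: "DIVBYZERO",
}


def decode_bits(value: int, table: dict) -> list:
    """Return the names of all set bits that appear in the table, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if value & (1 << bit)]


print(decode_bits(0x40000000, HFSR_BITS))  # ['FORCED']
print(decode_bits(0x00000001, CFSR_BITS))  # ['IACCVIOL']
print(decode_bits(0x00020000, CFSR_BITS))  # ['INVSTATE']
```

    FORCED means a configurable fault was escalated to a hard fault. The first report's CFSR is IACCVIOL (an instruction fetch from a location that does not permit execution); the second report posted later in this thread shows INVSTATE (an attempt to execute in an invalid execution state). Both are what you would typically expect if the processor branched to a bad address, although the registers alone do not say why that happened.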


  • administrators

    This is a very strange fault. I can tell that the processor was about to execute a network transfer, or had just executed one, at the time of the fault; however, the stack pointer register appears to have been corrupted, so the stack trace makes very little sense. Please provide some more information:

    • You said " Since the thermistor connector for the first hotend was not working as expected on the first Maestro and which I thought could be the problem for the stops, I replaced it with a new one, which fixed the thermistor problem, but not the stops." Are you saying that you have used 2 different Duet Maestro boards, and they both had this problem (longer prints stopping)?

    • If you run the same print more than once, does it always stop at exactly the same place, or is there a randomness in how long it prints for before it stops?

    • Please provide your config.g file, and also provide a GCode file that shows the problem. I will run it on the bench and see if I can replicate the problem. Unfortunately, file uploads to this forum don't seem to work properly at present, but you can host the files on Dropbox or Google drive and link to them in your post.

    • Please run another print (without filament if you like) and after it stops, run M122 and post the results again. I want to see if the reset happened at exactly the same place again.
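    Two details in the first report fit this reading. The Task field 0x5754454e becomes the ASCII text "NETW" when its bytes are read in little-endian order (the SAM4 processor is little-endian), which matches the NETWORK task in the Tasks line; this assumes the firmware stores the first four characters of the task name in that field. And almost every stack word is 0xa5a5a5a5, the fill pattern FreeRTOS writes into task stacks when they are created, so those words have most likely never been used, consistent with a stack pointer that no longer points at the real stack. A small sketch (helper names are illustrative):

```python
import struct


def task_tag(value: int) -> str:
    """Interpret a 32-bit value as four ASCII characters stored little-endian."""
    return struct.pack("<I", value).decode("ascii")


def untouched_words(words: list) -> int:
    """Count words still holding the FreeRTOS stack fill pattern (0xA5 bytes)."""
    return sum(1 for word in words if word == 0xA5A5A5A5)


# Stack dump copied from the first M122 report.
STACK_DUMP = (
    "00404541 a5a5a5a4 01000000 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 "
    "a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 "
    "a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5"
)
words = [int(word, 16) for word in STACK_DUMP.split()]

print(task_tag(0x5754454E))  # NETW
print(untouched_words(words), "of", len(words), "words still hold the fill pattern")
```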



  • @dc42 hey there, thank you for the answer.

    Regarding your questions:

    1. It happened on both boards, yes.
    2. I am currently printing the gcode a second time, but I need to wait for my professor's staff to send me the information, since I am not in the lab today.
    3. I have asked the staff for the files and will send them to you as soon as I have them.
    4. See 2


  • @Gizzle Also, how are you producing the gcode, i.e. which slicer, or is it custom-generated?

    Ian



  • @droftarts It was generated with Cura.



  • OK, I think I found the problem. I forgot to mention that the extruder was skipping. Today I replaced the stock extruder with a Bondtech with a pancake stepper and reduced the motor current from 1.6 A (yes, very high, but not set by me) to 0.8 A, the maximum the stepper is rated for. Now the printer seems to work again. I need to do further tests next week, but I am very confident.

    So my last question is: Could a high temperature of the driver caused by the high amperage lead to a system shutoff?
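    For a rough sense of how much the current change matters to driver heating: if the driver's dissipation is dominated by conduction losses in its output transistors (an assumption that ignores switching losses and the rise of on-resistance with temperature), it scales with the square of the current, so halving the current cuts it to about a quarter:

```latex
P \approx I^{2} R_{\mathrm{DS(on)}}
\quad\Longrightarrow\quad
\frac{P_{0.8\,\mathrm{A}}}{P_{1.6\,\mathrm{A}}} = \left(\frac{0.8}{1.6}\right)^{2} = 0.25
```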


  • administrators

    @Gizzle said in Duet Maestro random reboot (Hard Fault) during large prints:

    So my last question is: Could a high temperature of the driver caused by the high amperage lead to a system shutoff?

    It certainly shouldn't! If the drivers get too hot, warning messages should appear on the GCode Console of Duet Web Control. Did you see any? This worked last time I tested it. If the drivers go on heating up, they will eventually shut down, which should give rise to a different error message on the console.



  • Unfortunately not; I wasn't connected to the printer during the print. I'll run a new print and report back...



  • @dc42 The problem occurred again today. Unfortunately, the people in the lab sent me an empty log file, but I have the config and a gcode file that stopped.

    https://www.dropbox.com/s/7ib6btuo0yumo46/CFFFP_chibigrim-twiesner-3.gcode?dl=0

    https://www.dropbox.com/s/m29xw2cnosvt7os/download.zip?dl=0


  • administrators

    I'm sorry, but if I am to look into this, I need the M122 report.



  • @dc42 Somehow my phone didn't show the log properly, so here it is:
    12.12.2019, 07:12:04: Connected to 141.75.30.240
    12.12.2019, 07:12:38: M122: === Diagnostics ===
    RepRapFirmware for Duet 2 Maestro version 2.04 running on Duet Maestro 1.0
    Board ID: 08DJM-956DU-LL3T0-6J9FL-3S86Q-9T2LP
    Used output buffers: 3 of 24 (7 max)
    === RTOS ===
    Static ram: 19772
    Dynamic ram: 86864 of which 124 recycled
    Exception stack ram used: 264
    Never used ram: 24048
    Tasks: NETWORK(ready,764) HEAT(blocked,1296) MAIN(running,3904) IDLE(ready,200)
    Owned mutexes:
    === Platform ===
    Last reset 16:48:54 ago, cause: software
    Last software reset at 2019-12-11 14:23, reason: Hard fault, spinning module Platform, available RAM 23724 bytes (slot 0)
    Software reset code 0x4030 HFSR 0x40000000 CFSR 0x00020000 ICSR 0x04427803 BFAR 0xe000ed38 SP 0x20000ff4 Task 0x5754454e
    Stack: 004057a9 0045c534 20000000 2000847c e000ed01 20001c0c a5a5a5a5 a5a5a5a5 00000001 20008380 bcfbb5af 00000000 004093e5 a5a5a5a5 10000000 e000ed04 20001c0c a5a5a5a5 00409447 a5a5a5a5 00449765 a5a5a5a5 00ffffff
    Error status: 0
    Free file entries: 10
    SD card 0 detected, interface speed: 15.0MBytes/sec
    SD card longest block write time: 0.0ms, max retries 0
    MCU temperature: min 25.3, current 25.6, max 30.6
    Supply voltage: min 0.0, current 24.3, max 24.3, under voltage events: 0, over voltage events: 0, power good: yes
    Driver 0: standstill, read errors 0, write errors 1, ifcount 55, reads 62059, timeouts 0
    Driver 1: standstill, read errors 0, write errors 1, ifcount 55, reads 62059, timeouts 0
    Driver 2: standstill, read errors 0, write errors 1, ifcount 59, reads 62059, timeouts 0
    Driver 3: standstill, read errors 0, write errors 1, ifcount 45, reads 62059, timeouts 0
    Driver 4: standstill, read errors 0, write errors 1, ifcount 27, reads 62059, timeouts 0
    Driver 5: ok, read errors 0, write errors 0, ifcount 0, reads 0, timeouts 62065
    Driver 6: ok, read errors 0, write errors 0, ifcount 0, reads 0, timeouts 62065
    Date/time: 2019-12-12 07:12:37
    Slowest loop: 3.49ms; fastest: 0.05ms
    I2C nak errors 0, send timeouts 0, receive timeouts 0, finishTimeouts 0, resets 0
    === Move ===
    Hiccups: 0, FreeDm: 160, MinFreeDm: 160, MaxWait: 0ms
    Bed compensation in use: none, comp offset 0.000
    === DDARing ===
    Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0
    === Heat ===
    Bed heaters = 0, chamberHeaters = -1 -1
    === GCodes ===
    Segments left: 0
    Stack records: 1 allocated, 0 in use
    Movement lock held by null
    http is idle in state(s) 0
    telnet is idle in state(s) 0
    file is idle in state(s) 0
    serial is idle in state(s) 0
    aux is idle in state(s) 0
    daemon is idle in state(s) 0
    queue is idle in state(s) 0
    lcd is idle in state(s) 0
    autopause is idle in state(s) 0
    Code queue is empty.
    === Network ===
    Slowest loop: 4.87ms; fastest: 0.02ms
    Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
    HTTP sessions: 1 of 8
    Interface state 5, link 100Mbps full duplex


  • administrators

    Thanks. Looking at both M122 reports, both crashed at a nonsensical point, suggesting either hardware failure or stack corruption. But both were in the network subsystem at the time. On inspecting the code, I realised that if the W5500 network chip returned a larger value for the amount of data it had than is theoretically possible, that would cause DMA to overflow a buffer, which might lead to stack corruption. There is a chance that this was responsible for both crashes. I have added a check in the 2.05 release.
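    The side-by-side comparison of the two reports can be made mechanical by pulling the NAME 0xVALUE pairs out of each "Software reset code" line. In this sketch (`parse_reset_line` and `differing` are illustrative helpers), the two crashes share the same reset code, HFSR, ICSR, BFAR and Task (the network task) and differ only in CFSR and SP:

```python
import re

# Matches pairs such as "HFSR 0x40000000"; "Software reset code 0x4030" yields the key "code".
FIELD = re.compile(r"(\w+) (0x[0-9a-fA-F]+)")


def parse_reset_line(line: str) -> dict:
    """Map each field name in an M122 'Software reset code' line to its integer value."""
    return {name: int(value, 16) for name, value in FIELD.findall(line)}


def differing(a: dict, b: dict) -> dict:
    """Return {field: (hex in a, hex in b)} for fields present in both with different values."""
    return {key: (hex(a[key]), hex(b[key]))
            for key in sorted(a.keys() & b.keys()) if a[key] != b[key]}


first = parse_reset_line("Software reset code 0x4030 HFSR 0x40000000 CFSR 0x00000001 "
                         "ICSR 0x04427803 BFAR 0xe000ed38 SP 0x200007e4 Task 0x5754454e")
second = parse_reset_line("Software reset code 0x4030 HFSR 0x40000000 CFSR 0x00020000 "
                          "ICSR 0x04427803 BFAR 0xe000ed38 SP 0x20000ff4 Task 0x5754454e")

print(differing(first, second))
# {'CFSR': ('0x1', '0x20000'), 'SP': ('0x200007e4', '0x20000ff4')}
```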



  • @dc42 Thank you for your response. I got the info from my professor that our network sometimes has some small outages. Could these be the reason for the problem? I started a print today without the network connected to check whether the print finishes...


  • administrators

    @Gizzle said in Duet Maestro random reboot (Hard Fault) during large prints:

    @dc42 Thank you for your response. I got the info from my professor that our network sometimes has some small outages. Could these be the reason for the problem? I started a print today without the network connected to check whether the print finishes...

    Not unless there is a bug in the W5500 chip that causes it to return faulty info when there is a network outage.

    I have released firmware 2.05, so please upgrade to that and see if it helps.
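    The kind of check described for 2.05 can be illustrated in a few lines: before a length reported by the network chip is used to size a DMA transfer, compare it with the capacity of the destination buffer and refuse anything larger. This is a hypothetical sketch, not RepRapFirmware's actual code; the function name, the 2048-byte buffer size and the choice to raise an error rather than clamp are all assumptions for illustration.

```python
RX_BUFFER_SIZE = 2048  # assumption: capacity of the destination buffer, in bytes


def checked_rx_length(reported: int, capacity: int = RX_BUFFER_SIZE) -> int:
    """Validate a receive length reported by the network chip (hypothetical helper)."""
    if not 0 <= reported <= capacity:
        # More data than the buffer can hold should be impossible, so treat the
        # value as corrupt instead of letting the transfer run past the buffer end.
        raise ValueError(f"implausible receive length {reported} (buffer holds {capacity})")
    return reported


print(checked_rx_length(1460))  # 1460
```

    Rejecting the value outright avoids acting on data from a chip that is evidently reporting nonsense; clamping it to the buffer size instead would keep the transfer within bounds but could pass on garbage.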



  • @dc42 will do that the next time I'm at the printer. Thank you very much for your hard work!

