Duet 0.6 randomly reboots



  • Hello everyone and all the best for the holidays!

    I have been having this issue since I updated my Duet to the latest firmware (Ver. 1.22). Before that I was on a very old version and was using my Ormerod 1 very sporadically.

    The Duet seems to reboot at random but often, at intervals ranging from 2-3 minutes to at most 20 minutes.
    I noticed in another thread with a similar error that the problem was the SD card, so I changed it twice, but the problem persists.

    I used the M122 command and got the following.

    === Diagnostics ===
    RepRapFirmware for Duet version 1.22 running on Duet 0.6
    Used output buffers: 5 of 16 (9 max)
    === System ===
    Static ram: 44652
    Dynamic ram: 42068 of which 3392 recycled
    Stack ram used: 136 current, 4504 maximum
    Never used ram: 3688
    === Platform ===
    Last reset 00:00:10 ago, cause: software
    Last software reset at 2018-12-29 19:51, reason: Stuck in spin loop, spinning module Network, available RAM 3136 bytes (slot 3)
    Software reset code 0x4041 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0000080f BFAR 0xe000ed38 SP 0x20087ccc
    Stack: 000c3f2b 5465000a fffffff9 20087b88 ffff8000 0000000f 5465000a 01010101 000ac765 000ac764 61000000 000000c1 00000004 000bd9f9 000b0077 0000003a 00000001 20087d70 2007a934 20087d46 2007aa6c 20087d90 20087d40 0000002a
    Error status: 0
    Free file entries: 10
    SD card 0 detected, interface speed: 21.0MBytes/sec
    SD card longest block write time: 0.0ms, max retries 0
    MCU temperature: min 31.2, current 32.4, max 32.7
    Date/time: 2018-12-29 19:52:13
    Slowest loop: 17.11ms; fastest: 0.09ms
    === Move ===
    Hiccups: 0, StepErrors: 0, LaErrors: 0, FreeDm: 100, MinFreeDm: 100, MaxWait: 0ms, Underruns: 0, 0
    Scheduled moves: 0, completed moves: 0
    Bed compensation in use: none
    Bed probe heights: 0.000 0.000 0.000 0.000 0.000
    === Heat ===
    Bed heaters = 0, chamberHeaters = -1 -1
    === GCodes ===
    Segments left: 0
    Stack records: 1 allocated, 0 in use
    Movement lock held by null
    http is idle in state(s) 0
    telnet is idle in state(s) 0
    file is idle in state(s) 0
    serial is idle in state(s) 0
    aux is idle in state(s) 0
    daemon is idle in state(s) 0
    queue is idle in state(s) 0
    autopause is idle in state(s) 0
    Code queue is empty.
    === Network ===
    Free connections: 15 of 16
    Free transactions: 23 of 24
    Locked: 0, state: 4, listening: 20071c20, 0, 0

    Any insights would be greatly appreciated since my printer is currently unusable...

    Regards,
    Achilles


  • administrators

    Please try firmware 1.23. If the problem persists, post another M122 report after it has rebooted itself.



  • Hello David and thank you so much for getting back to me so fast!

    I noticed after posting that the new firmware version was out and immediately gave it a try.

    Unfortunately I still get the same random behavior.

    The M122 report has some different data on the Software reset code. See below.

    9:36:10 PM
    M122
    === Diagnostics ===
    RepRapFirmware for Duet version 1.23 running on Duet 0.6
    Used output buffers: 3 of 16 (8 max)
    === System ===
    Static ram: 44276
    Dynamic ram: 43092 of which 2744 recycled
    Stack ram used: 136 current, 2700 maximum
    Never used ram: 5492
    === Platform ===
    Last reset 00:00:40 ago, cause: watchdog
    Last software reset at 2018-12-30 21:04, reason: Stuck in spin loop, spinning module Network, available RAM 5236 bytes (slot 3)
    Software reset code 0x4041 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0400f80f BFAR 0xe000ed38 SP 0x20087cd4
    Stack: fffffff9 20087b88 ffff8000 0000000f 6554000a 01010101 000aec19 000aec18 61000000 000000c1 00000004 000c0021 000b252b 0000003a 00000001 20087d70 2007a7fc 20087d46 2007a8dc 20087d90 20087d40 0000002a 00000016 20073df0
    Error status: 0
    Free file entries: 10
    SD card 0 detected, interface speed: 21.0MBytes/sec
    SD card longest block write time: 0.0ms, max retries 0
    MCU temperature: min 32.8, current 34.8, max 35.3
    Date/time: 2018-12-30 21:36:09
    Slowest loop: 11.23ms; fastest: 0.09ms
    I2C nak errors 0, send timeouts 0, receive timeouts 0, finishTimeouts 0
    === Move ===
    Hiccups: 0, StepErrors: 0, LaErrors: 0, FreeDm: 100, MinFreeDm: 100, MaxWait: 0ms, Underruns: 0, 0
    Scheduled moves: 0, completed moves: 0
    Bed compensation in use: none
    Bed probe heights: 0.000 0.000 0.000 0.000 0.000
    === Heat ===
    Bed heaters = 0, chamberHeaters = -1 -1
    === GCodes ===
    Segments left: 0
    Stack records: 1 allocated, 0 in use
    Movement lock held by null
    http is idle in state(s) 0
    telnet is idle in state(s) 0
    file is idle in state(s) 0
    serial is idle in state(s) 0
    aux is idle in state(s) 0
    daemon is idle in state(s) 0
    queue is idle in state(s) 0
    autopause is idle in state(s) 0
    Code queue is empty.
    === Network ===
    Free connections: 15 of 16
    Free transactions: 23 of 24
    Locked: 0, state: 4, listening: 20071c18, 0, 0

    Once again, thank you very much for your continuous help with our 3D printing endeavors!

    Merry Christmas and happy holidays!

    Best regards,
    Achilles

    P.S. I really don't mind if you take your time getting back to me given the holidays.


  • administrators

    Looks like it's hit an assertion failure within LWIP and it's trying to print a debug message to USB. If you attach a PC running YAT or Pronterface to the USB port, you will probably see a message on it when this situation occurs, and it probably won't reboot. Take precautions against USB ground loops (see the wiki page about them) when using a USB connection.

    [Note for me when I look into this: looks like mdns_responder populate_record.constprop.11 is calling pbuf_cat with one of the args being null. It could be that the pbuf allocation in populate_record failed.]
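As a side note on the two M122 reports above, the ICSR values they contain can be decoded by hand. The sketch below assumes the standard ARMv7-M layout of the Interrupt Control and State Register (the Duet 0.6 uses a SAM3X8E, which is a Cortex-M3). The struct and function names are made up for illustration and are not part of RepRapFirmware.

```c
#include <stdint.h>

/* Hypothetical holder for the ICSR fields of interest (names are illustrative). */
struct icsr_fields {
    unsigned vect_active;   /* bits 8:0   exception number currently being handled */
    unsigned ret_to_base;   /* bit 11     no other exception is active underneath  */
    unsigned vect_pending;  /* bits 20:12 highest-priority pending exception       */
    unsigned pend_st_set;   /* bit 26     SysTick exception is pending             */
};

/* Split a raw ICSR value, as printed by M122, into its fields (ARMv7-M layout assumed). */
struct icsr_fields decode_icsr(uint32_t icsr)
{
    struct icsr_fields f;
    f.vect_active  = icsr & 0x1FFu;
    f.ret_to_base  = (icsr >> 11) & 1u;
    f.vect_pending = (icsr >> 12) & 0x1FFu;
    f.pend_st_set  = (icsr >> 26) & 1u;
    return f;
}
```

Applied to the two values in this thread, 0x0000080f and 0x0400f80f, both give an active exception number of 15, which is SysTick on Cortex-M. The second value also shows SysTick pending again. What the firmware does inside that handler is not something the reports themselves confirm.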


  • administrators

    I've looked into this further. The MDNS responder in lwip version 1 is contributed code and is not up to the standard of the main lwip code. In particular, in multiple places it fails to do any error handling when it fails to allocate a buffer. This is producing the assertion failure.

    I looked to see if there are any updates to lwip 1 to address this, but I didn't find any. In lwip 2 it looks like the MDNS responder has been completely rewritten, and it is part of the core code. So there are no MDNS fixes that I can back-port to lwip 1.

    Ideally the network code for the legacy Duets would be changed to use lwip 2 instead of lwip 1, but that would be a large undertaking. So I'm not going to do that because I have lots of Duet 2 and Duet 3 work to do, and I don't receive any remuneration for working on code for the legacy Duets. Unless someone else wants to take that project on, or to add the missing error handling to the mdns source code, I think the best thing to do is to disable mdns. This would mean that you have to use the IP address in your browser, instead of local names such as "Duet.local".
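The missing error handling described in this post can be illustrated with a minimal sketch. It does not use the real lwIP sources: `mock_pbuf`, `mock_pbuf_alloc`, `mock_pbuf_cat`, `append_record_checked` and the `pool_exhausted` flag are all hypothetical stand-ins, written only to show the pattern of checking an allocation before chaining the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for lwIP's struct pbuf (not the real layout). */
struct mock_pbuf {
    struct mock_pbuf *next;
    size_t len;      /* bytes held by this buffer */
    size_t tot_len;  /* bytes held by this buffer and all that follow it */
};

/* Stand-in allocator: returns NULL when the simulated pool is exhausted. */
struct mock_pbuf *mock_pbuf_alloc(size_t len, int pool_exhausted)
{
    if (pool_exhausted) {
        return NULL;
    }
    struct mock_pbuf *p = calloc(1, sizeof *p);
    if (p != NULL) {
        p->len = len;
        p->tot_len = len;
    }
    return p;
}

/* Free a whole chain. */
void mock_pbuf_free(struct mock_pbuf *p)
{
    while (p != NULL) {
        struct mock_pbuf *n = p->next;
        free(p);
        p = n;
    }
}

/* Stand-in for pbuf_cat: asserts that both arguments are non-NULL,
 * mirroring the condition in the assertion message quoted in this thread. */
void mock_pbuf_cat(struct mock_pbuf *h, struct mock_pbuf *t)
{
    assert(h != NULL && t != NULL);
    struct mock_pbuf *p = h;
    for (; p->next != NULL; p = p->next) {
        p->tot_len += t->tot_len;
    }
    p->tot_len += t->tot_len;
    p->next = t;
}

/* Append a record to a reply chain. Returns 0 on success and -1 if the
 * allocation failed, in which case the reply is left untouched. */
int append_record_checked(struct mock_pbuf *reply, size_t record_len, int pool_exhausted)
{
    struct mock_pbuf *rec = mock_pbuf_alloc(record_len, pool_exhausted);
    if (rec == NULL) {
        return -1;  /* bail out instead of passing NULL on to mock_pbuf_cat */
    }
    mock_pbuf_cat(reply, rec);
    return 0;
}
```

Without the `rec == NULL` check, an exhausted pool would send a null pointer into `mock_pbuf_cat` and trip its assertion, which is the kind of failure this post describes. With the check, the caller gets an error code and can drop the response.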



  • I don't use MDNS anyway, since I have a small local network for my "lab". How do I disable it, though?
    I will try the USB output to see if I get anything.

    Thanks!



  • Also I totally understand it is pointless for you to spend so much time integrating lwip 2 for the old hardware. I hope disabling the mDNS works. Otherwise I will try to go to an older FW version. Is the integration of mDNS a recent addition? I wonder why I didn't have this problem in the older firmware.



  • Well after some effort I got the following error in Pronterface. For some reason that didn't come up every time the printer rebooted.

    Assertion "(h != NULL) && (t != NULL) (programmer violates API)" failed at line 752 in ../src/Duet/Lwip/lwip/src/core/pbuf.c


  • administrators

    That's exactly the message I expected so it confirms my diagnosis. I'll disable MDNS support in the next build.



  • Thank you David. Your help is much appreciated!

    I hoped there was an easy way to disable mDNS without affecting every future build for the early Duets.

    I have been contemplating upgrading my Ormerod 1 to a newer Duet, but I don't think my old Ormerod 1, which is not so reliable in terms of print quality/success, is worth the extra money. I have already changed everything else, including the aluminum parts and your dual-nozzle Z sensor.

    I wish you a great New Year's Eve!



  • @dc42 said in Duet 0.6 randomly reboots:

    I'll disable MDNS support in the next build.

    Hello David, hope all is well!

    Any update on that?

    Regards,
    Achilles


  • administrators

    You can try the 2.03 early beta build at https://www.dropbox.com/s/ozuw60nflckpq6c/RepRapFirmware.bin?dl=0 if you like, but it's untested so exercise caution!



  • Thank you, will cautiously proceed and let you know!


  • administrators

    I included a Duet 086/06 build in the 2.03beta1 release too.

