UNSOLVED Duet3+SBC print stops - DWC unresponsive -



  • I ran into this again today: the machine stopped mid-print and DWC became unresponsive. Heaters and fans remained on. No errors were generated in the console or in DWC. I was able to SSH and VNC into the Pi, but DWC was completely unresponsive. The CAN-FD sync light was blinking in time, and there were no error lights on the Duet3.

    Required power cycle to regain access to DWC - print was lost.

    How do I troubleshoot this issue? The console log does not appear to be functioning properly, as there are no entries.
    Can I connect the Pi to the Duet's USB port, set up debugging, and capture the USB output on the Pi?
    I currently do not trust the setup for unattended operation.

    system configuration:
    Duet3 6HC (SBC Pi4 4GB) + 3HC + 4 x Tool boards (with distro board)
    RRF 3.1.1 + DuetPi with updates
    CoreXY
    XYZE is x16 microstepping interpolated
    U is x4 microstepping interpolated



  • @BryanH
    Hi, out of interest: if you reload the page, does DWC control come back? Not that this will fix your issue, but it may help point to the cause, and you may then be able to run M122.
    Maybe something worth trying if it happens again.


  • Moderator

    Post your config.g as well as M122 please.



  • Page reloads did not work.
    Browser restart did not work.
    Browser cache clear and restart did not work.

    Power cycle of entire system required to regain access to DWC as stated in initial post.

    No errors recorded in eventlog.txt

    No errors recorded in console.

    DWC hung again - required reboot of pi to restore access

    m122
    === Diagnostics ===
    RepRapFirmware for Duet 3 MB6HC version 3.1.1 running on Duet 3 MB6HC v0.6 or 1.0 (SBC mode)
    Board ID: 08DJM-956L2-G43S4-6J9DL-3SJ6S-1866H
    Used output buffers: 1 of 40 (14 max)
    === RTOS ===
    Static ram: 154604
    Dynamic ram: 163732 of which 20 recycled
    Exception stack ram used: 520
    Never used ram: 74340
    Tasks: NETWORK(ready,1972) HEAT(blocked,1088) CanReceiv(suspended,3420) CanSender(suspended,1428) CanClock(blocked,1436) TMC(blocked,68) MAIN(running,4936) IDLE(ready,76)
    Owned mutexes:
    === Platform ===
    Last reset 09:06:56 ago, cause: power up
    Last software reset at 2020-07-22 19:13, reason: User, spinning module LinuxInterface, available RAM 74300 bytes (slot 2)
    Software reset code 0x0010 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0444a000 BFAR 0x00000000 SP 0xffffffff Task MAIN
    Error status: 0
    MCU temperature: min 18.1, current 39.6, max 41.1
    Supply voltage: min 23.9, current 24.1, max 24.2, under voltage events: 0, over voltage events: 0, power good: yes
    12V rail voltage: min 11.9, current 12.0, max 12.1, under voltage events: 0
    Driver 0: standstill, reads 26677, writes 27 timeouts 0, SG min/max 0/1023
    Driver 1: standstill, reads 26681, writes 23 timeouts 0, SG min/max 0/55
    Driver 2: standstill, reads 26678, writes 27 timeouts 0, SG min/max 0/1023
    Driver 3: standstill, reads 26678, writes 27 timeouts 0, SG min/max 0/1023
    Driver 4: standstill, reads 26679, writes 27 timeouts 0, SG min/max 0/1023
    Driver 5: standstill, reads 26679, writes 27 timeouts 0, SG min/max 0/1023
    Date/time: 2020-07-25 19:05:43
    Slowest loop: 10.81ms; fastest: 0.14ms
    === Storage ===
    Free file entries: 10
    SD card 0 not detected, interface speed: 37.5MBytes/sec
    SD card longest read time 0.0ms, write time 0.0ms, max retries 0
    === Move ===
    Hiccups: 0(0), FreeDm: 375, MinFreeDm: 322, MaxWait: 3079109ms
    Bed compensation in use: mesh, comp offset 0.008
    === MainDDARing ===
    Scheduled moves: 45996, completed moves: 45996, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1
    === AuxDDARing ===
    Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1
    === Heat ===
    Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
    Heater 0 is on, I-accum = 0.1
    === GCodes ===
    Segments left: 0
    Movement lock held by null
    HTTP* is ready with "M122" in state(s) 0
    Telnet is idle in state(s) 0
    File is idle in state(s) 0
    USB is idle in state(s) 0
    Aux is idle in state(s) 0
    Trigger* is idle in state(s) 0
    Queue is idle in state(s) 0
    LCD is idle in state(s) 0
    SBC is idle in state(s) 0
    Daemon* is idle in state(s) 0
    Aux2 is idle in state(s) 0
    Autopause is idle in state(s) 0
    Code queue is empty.
    === Network ===
    Slowest loop: 2.21ms; fastest: 0.01ms
    Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0), 0 sessions Telnet(0), 0 sessions
    HTTP sessions: 0 of 8

    - Ethernet -
    State: disabled
    Error counts: 0 0 0 0 0
    Socket states: 0 0 0 0 0 0 0 0
    === CAN ===
    Messages sent 175312, longest wait 2ms for type 6011
    === Linux interface ===
    State: 0, failed transfers: 0
    Last transfer: 17ms ago
    RX/TX seq numbers: 6486/19231
    SPI underruns 0, overruns 0
    Number of disconnects: 8
    Buffer RX/TX: 0/0-0
    === Duet Control Server ===
    Duet Control Server v3.1.1
    Code buffer space: 4096
    Configured SPI speed: 8000000 Hz
    Full transfers per second: 29.63

    config.g



  • How are you powering the SBC (Raspberry Pi)?

    I saw similar happenings powering the Pi from the 6HC Main board. Changing the main board jumpers to not power the Pi and using a wall wart to power the Pi separately resolved my issues.
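
A minimal sketch for checking whether the Pi itself has seen a supply problem, which may help confirm or rule out the power theory before rewiring anything. It assumes the `vcgencmd` tool that ships with Raspberry Pi OS is available on DuetPi; `decode_throttled` is a hypothetical helper name, and only the two under-voltage bits (bit 0 and bit 16) are decoded.

```shell
# Decode the under-voltage flags reported by `vcgencmd get_throttled`.
# Input looks like "throttled=0x50005"; the helper name is hypothetical.
decode_throttled() {
    value="${1#throttled=}"          # strip the "throttled=" prefix
    flags=$(($value))                # parse the hex constant
    if [ $((flags & 0x1)) -ne 0 ]; then
        echo "under-voltage detected now"
    fi
    if [ $((flags & 0x10000)) -ne 0 ]; then
        echo "under-voltage has occurred since boot"
    fi
    return 0
}

# Usage on the Pi:
#   decode_throttled "$(vcgencmd get_throttled)"
```

No output means neither under-voltage flag is set; that does not rule out other power-related causes.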



  • I had an issue like this that was caused by a poor ribbon cable connection between the Duet and Pi. I usually had to reseat the cable before DWC would work again, but it was also somewhat unpredictable.

    I replaced the ribbon cable with some jumper wires and the issue hasn't returned.
    IMG_7832.JPG



  • @Wally Pi is powered by the Duet3 per the documentation.



  • @BryanH Thanks for the details. I suggest changing your main board jumpers and powering the Pi independently (from a separate power source such as an AC adapter) to see if that helps resolve it.



  • @Wally Thanks for the suggestion but I was really hoping for some diagnostics / troubleshooting. Does anyone know if we can connect the Pi to the Duet3 via USB while in operation to capture the debug output on the USB port? I've already set the DSF / DWC to debug logging and will be trying some increasingly longer prints to see if I can reproduce the issue.


  • Moderator

    Trying to power the pi separately might be worthwhile. It would be interesting to see if that helped or not.

    It could also very likely be an issue with DSF. Duet3 + SBC is still one of the more cutting edge implementations.

    You may be able to get some more info by enabling debugging via this technique:

    https://duet3d.dozuki.com/Wiki/Getting_Started_With_Duet_3#Section_Monitoring_optional



  • @Phaedrux Thanks for the info. I'm running the debug monitoring now. I found what looks like an out-of-memory dump in the Pi's logs at about the time of the last failure. Should I post it here?
    Thanks!
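
A rough sketch of how the Pi's kernel log could be searched for out-of-memory events like the one described above. `find_oom_events` is a hypothetical helper, the patterns are the usual Linux kernel OOM messages, and `/var/log/kern.log` is an assumption about where the distribution writes the kernel log.

```shell
# Print kernel log lines that look like OOM-killer activity.
# Reads the files given as arguments, or stdin when none are given.
find_oom_events() {
    grep -iE 'out of memory|oom-killer|killed process' "$@"
}

# Usage on the Pi:
#   dmesg | find_oom_events
#   find_oom_events /var/log/kern.log    # path is an assumption
```

The timestamps on any matching lines can then be compared with the time DWC stopped responding.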


  • Moderator

    @BryanH said in Duet3+SBC print stops - DWC unresponsive -:

    Should I post it here?

    Sure.




  • Moderator

    Thanks. Perhaps @chrishamm will be able to take a look.



  • Great, that's from the kernel log - I'm pulling the syslog to see if there's anything there.




  • administrators

    Unfortunately there is no useful information in the logs you provided. Please start DuetControlServer with debug log level using a terminal on the Pi - just be aware that a print will be terminated once that is closed:

    sudo systemctl stop duetcontrolserver
    sudo /opt/dsf/bin/DuetControlServer -l debug
    

    I need to know what exactly is causing the hang. Did you observe any pattern when the hangs occurred? When it hangs, you should be able to run M122 "DSF" from the G-code console to get a diagnostics output only from DCS.

    If it really runs out of memory at some point, I need to reproduce that. Can you compress your system directory files plus the file you were trying to print?
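
One way the requested files might be compressed into a single archive, as a sketch only. It assumes DuetPi keeps the virtual SD card under `/opt/dsf/sd` (with `sys` and `gcodes` below it); `bundle_duet_files` and the file names in the usage comment are hypothetical.

```shell
# Bundle the system directory plus one print file into a .tar.gz archive.
#   $1 = SD card root, $2 = print file path relative to that root,
#   $3 = output archive path
bundle_duet_files() {
    sd_root="$1"
    print_file="$2"
    out="$3"
    tar -czf "$out" -C "$sd_root" sys "$print_file"
}

# Usage on the Pi (path and file name are assumptions):
#   bundle_duet_files /opt/dsf/sd gcodes/myprint.gcode "$HOME/duet-report.tar.gz"
#   tar -tzf "$HOME/duet-report.tar.gz"    # list the contents to verify
```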



  • Thank you. I'm running DCS in debug mode and will start re-running the most recent jobs tomorrow morning to see if I can reproduce the issue.
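
Since the debug output only appears in the terminal, it may be worth keeping a copy on disk so it survives a hang. A minimal sketch, assuming `tee` is available; `run_logged` and the log file name are hypothetical, while the DuetControlServer command is the one quoted earlier in the thread.

```shell
# Run a command, showing its output and appending a copy to a log file.
#   $1 = log file, remaining arguments = command to run
run_logged() {
    log_file="$1"
    shift
    "$@" 2>&1 | tee -a "$log_file"
}

# Usage on the Pi:
#   sudo systemctl stop duetcontrolserver
#   run_logged "$HOME/dcs-debug.log" sudo /opt/dsf/bin/DuetControlServer -l debug
```

The print is still terminated if the terminal is closed; this only preserves the output up to that point.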

