SDCard errors - Duet3, various 3.01 betas



  • (I'm posting this in a beta section, but I don't think this is an issue limited to the beta firmware I'm using.)

    I'm using a v1 Duet3 in standalone mode with various RRF 3.01 betas. Twice in the past couple of weeks I've had prints abort when the Duet reported (on the PanelDue) a read error from the SD card. The prints don't do anything fancy with macros - just plain Cura-generated gcode.

    The error is "Error: Cannot read file, error code 1"

    The first time this happened it almost acted as if I had completely removed the sdcard. I couldn't even access the duet via DWC. (The second time, I didn't even bother trying.)

    In both cases, simply rebooting the duet3 board (power off/on) "fixed" the problem. The first time, I replaced the sdcard to be sure, but now that I've seen this twice on cards I know to be good, I'm starting to think it might be a board or firmware issue.

    Keeping in mind that I can't access the duet3 from DWC when this failure occurs, if (when) it happens again, is there something I can do from the PanelDue to try and get more diagnostic information to help troubleshoot or resolve the issue?

    (I know that if I were to use the Duet3 with an SBC, this likely wouldn't be a problem, but right now, using the Duet3 with an SBC would be a significant DOWNGRADE. I'd lose some PanelDue functionality, I'd lose conditional gcode, etc.)

    Thanks
    Gary



  • @garyd9 said in SDCard errors - Duet3, various 3.01 betas:

    PanelDue

    M39 perhaps, (as well as the usual M122 but probably less useful on the paneldue)?


  • administrators

    Can you connect a PC via USB and get a M122 report next time it happens?
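A minimal sketch of how a PC (or a Raspberry Pi) attached over USB could capture that report without anyone typing at a terminal. It assumes the Duet shows up on the host as a USB serial device and that the third-party `pyserial` package is available; the port name `/dev/ttyACM0`, the baud rate, and the helper name `capture_report` are illustrative and not taken from this thread. The function only needs an object with `write` and `readline`, so it can be tried without hardware.

```python
def capture_report(port, command="M122", quiet_reads=3):
    """Send one G-code command and collect reply lines until the port goes quiet.

    `port` is any object with write(bytes) and readline() -> bytes, for example
    a pyserial Serial opened with a read timeout (assumption: the Duet appears
    as a USB serial device on the host).
    """
    port.write((command + "\n").encode("ascii"))
    lines = []
    empty = 0
    while empty < quiet_reads:
        raw = port.readline()
        if not raw:
            # A timed-out read returns b""; stop after several in a row.
            empty += 1
            continue
        empty = 0
        lines.append(raw.decode("ascii", errors="replace").rstrip("\r\n"))
    return lines


# Hypothetical usage with pyserial (not run here):
#   import serial
#   with serial.Serial("/dev/ttyACM0", 115200, timeout=1) as duet:
#       print("\n".join(capture_report(duet)))
```

Stopping on a run of empty reads rather than on a specific terminator avoids depending on exactly how the firmware ends its reply, at the cost of a few seconds of waiting.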



  • @dc42 I will try... Can I just leave a Raspberry Pi permanently attached via USB? I have one next to the printer to act as a WiFi bridge and camera (and it will get connected with DSF eventually).


  • administrators

    @garyd9 said in SDCard errors - Duet3, various 3.01 betas:

    @dc42 I will try... Can I just leave a Raspberry Pi permanently attached via USB? I have one next to the printer to act as a WiFi bridge and camera (and it will get connected with DSF eventually).

    Yes, as long as you avoid the problems of USB ground loops.



  • @dc42 - I'm bringing this thread back "from the dead" as I've had the issue re-occur on a different Duet3 board running standalone with RC7. After a print finished, the console showed the following:

    4/14/2020, 3:59:30 PM	Finished printing file 0:/gcodes/PETG/No Fan/6x Surgical_Mask_Ear_Strain_Relief (faster).gcode, print time was 1h 51m
    Error: Cannot read file, error code 1
    Error: Failed to read footer from G-Code file "0:/gcodes/PETG/No Fan/6x Surgical_Mask_Ear_Strain_Relief (faster).gcode"
    

    I was able to run M122 from DWC that was still attached, and got the following:

    4/14/2020, 4:19:10 PM	m122
    === Diagnostics ===
    RepRapFirmware for Duet 3 MB6HC version 3.01-RC7 running on Duet 3 MB6HC v0.6 or 1.0
    Board ID: 08DJM-956L2-G43S8-6J1FL-3SN6M-1S0GG
    Used output buffers: 1 of 40 (24 max)
    === RTOS ===
    Static ram: 154580
    Dynamic ram: 161480 of which 64 recycled
    Exception stack ram used: 592
    Never used ram: 76500
    Tasks: NETWORK(ready,84) ETHERNET(blocked,424) HEAT(blocked,1084) CanReceiv(suspended,3824) CanSender(suspended,1396) CanClock(blocked,1428) TMC(blocked,72) MAIN(running,1084) IDLE(ready,80)
    Owned mutexes:
    === Platform ===
    Last reset 04:49:08 ago, cause: software
    Last software reset time unknown, reason: User, spinning module GCodes, available RAM 78448 bytes (slot 0)
    Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0444a000 BFAR 0x00000000 SP 0xffffffff Task 0x4e49414d
    Error status: 0
    Free file entries: 10
    SD card 0 detected, interface speed: 25.0MBytes/sec
    SD card longest block write time: 142.7ms, max retries 2
    MCU temperature: min 40.0, current 45.6, max 45.8
    Supply voltage: min 23.2, current 23.3, max 24.3, under voltage events: 0, over voltage events: 0, power good: yes
    12V rail voltage: min 12.0, current 12.1, max 12.2, under voltage events: 0
    Driver 0: standstill, reads 59130, writes 47 timeouts 0, SG min/max 0/1023
    Driver 1: standstill, reads 59130, writes 47 timeouts 0, SG min/max 0/1023
    Driver 2: standstill, reads 59131, writes 47 timeouts 0, SG min/max 0/1023
    Driver 3: standstill, reads 59140, writes 39 timeouts 0, SG min/max 0/1023
    Driver 4: standstill, reads 59168, writes 11 timeouts 0, SG min/max 0/0
    Driver 5: standstill, reads 59169, writes 11 timeouts 0, SG min/max 0/0
    Date/time: 2020-04-14 16:19:07
    Slowest loop: 1167.98ms; fastest: 0.15ms
    === Move ===
    Hiccups: 0(0), FreeDm: 375, MinFreeDm: 299, MaxWait: 1655210ms
    Bed compensation in use: mesh, comp offset 0.000
    === MainDDARing ===
    Scheduled moves: 11, completed moves: 11, StepErrors: 0, LaErrors: 0, Underruns: 0, 30  CDDA state: -1
    === AuxDDARing ===
    Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0  CDDA state: -1
    === Heat ===
    Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
    Heater 0 is on, I-accum = 0.3
    Heater 1 is on, I-accum = 0.4
    === GCodes ===
    Segments left: 0
    Movement lock held by null
    HTTP is idle in state(s) 0
    Telnet is idle in state(s) 0
    File is idle in state(s) 0
    USB is idle in state(s) 0
    Aux is idle in state(s) 0
    Trigger is idle in state(s) 0
    Queue is idle in state(s) 0
    LCD is idle in state(s) 0
    SBC is idle in state(s) 0
    Daemon is idle in state(s) 0
    Autopause is idle in state(s) 0
    Code queue is empty.
    === Network ===
    Slowest loop: 299.81ms; fastest: 0.03ms
    Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(1) Telnet(0)
    HTTP sessions: 2 of 8
    - Ethernet -
    State: active
    Error counts: 0 0 0 0 0
    Socket states: 2 2 2 2 2 0 0 3
    === CAN ===
    Messages sent 69606, longest wait 0ms for type 0
    === Linux interface ===
    State: 0, failed transfers: 0
    Last transfer: 17348707ms ago
    RX/TX seq numbers: 0/1
    SPI underruns 0, overruns 0
    Number of disconnects: 0
    Buffer RX/TX: 0/0-0
    

    Attempts to navigate into a gcode folder to start a print resulted in DWC complaining that the directory (folder) wasn't found. Oddly, it displayed the contents of the root folder just fine. There was a similar issue in the macros section. I tried to "edit" a macro in the root directory of the macros (as displayed in DWC), but got the following error:

    4/14/2020, 4:20:05 PM	Failed to download LEDs Half On
    File 0:/macros/probe_tests/LEDs Half On not found
    

    (What's odd about that is that my "LEDs Half On" macro isn't in a "probe_tests" subdirectory - perhaps DWC was confused about which directory I was in based on an earlier attempt to change subdirectories failing?)

    This was at a somewhat painful time, as I've been rapidly printing plates full of surgical mask ear protector things (to help support the worldwide coronavirus effort.) I tried to manually remove and re-insert the sdcard (hoping that the duet would remount it), but once the card was removed, I got a message in the console and it wouldn't re-mount it. I ended up having to reboot the duet3. That, in turn, required having to unload filament and clean the nozzle (so there was nothing dripping from the nozzle), and recalibrating (all so I could re-load the mesh compensation heightmap - because it doesn't like to load the heightmap after a simple G28.)

    So, in addition to this bug report, I'd also like to request a new gcode command: something to manually re-mount the sdcard without having to reboot the board.



  • It happened again... in the middle of a print. 😞 Oddly, I didn't notice until several minutes after the error occurred, and at that point I was able to read directories, and even download the gcode file (from the duet3 standalone) that supposedly had a read error.

    4/14/2020, 6:47:37 PM	Error: Cannot read file, error code 1
    Cancelled printing file 0:/gcodes/PETG/No Fan/6x Surgical_Mask_Ear_Strain_Relief (faster).gcode, print time was 0h 22m
    

    M122:

    4/14/2020, 6:51:32 PM	m122
    === Diagnostics ===
    RepRapFirmware for Duet 3 MB6HC version 3.01-RC7 running on Duet 3 MB6HC v0.6 or 1.0
    Board ID: 08DJM-956L2-G43S8-6J1FL-3SN6M-1S0GG
    Used output buffers: 1 of 40 (23 max)
    === RTOS ===
    Static ram: 154580
    Dynamic ram: 161272 of which 64 recycled
    Exception stack ram used: 512
    Never used ram: 76788
    Tasks: NETWORK(ready,84) ETHERNET(blocked,444) HEAT(blocked,1100) CanReceiv(suspended,3824) CanSender(suspended,1432) CanClock(blocked,1428) TMC(blocked,72) MAIN(running,4532) IDLE(ready,80)
    Owned mutexes:
    === Platform ===
    Last reset 02:24:25 ago, cause: software
    Last software reset at 2020-04-14 16:27, reason: User, spinning module GCodes, available RAM 76500 bytes (slot 1)
    Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0444a000 BFAR 0x00000000 SP 0xffffffff Task 0x4e49414d
    Error status: 0
    Free file entries: 10
    SD card 0 detected, interface speed: 25.0MBytes/sec
    SD card longest block write time: 0.0ms, max retries 3
    MCU temperature: min 44.8, current 44.9, max 46.0
    Supply voltage: min 23.2, current 24.0, max 24.1, under voltage events: 0, over voltage events: 0, power good: yes
    12V rail voltage: min 12.1, current 12.1, max 12.2, under voltage events: 0
    Driver 0: standstill, reads 25675, writes 2 timeouts 0, SG min/max 0/1023
    Driver 1: standstill, reads 25675, writes 2 timeouts 0, SG min/max 0/1023
    Driver 2: standstill, reads 25675, writes 2 timeouts 0, SG min/max 0/1023
    Driver 3: standstill, reads 25675, writes 2 timeouts 0, SG min/max 0/1023
    Driver 4: standstill, reads 25676, writes 0 timeouts 0, SG min/max not available
    Driver 5: standstill, reads 25676, writes 0 timeouts 0, SG min/max not available
    Date/time: 1970-01-01 00:00:00
    Slowest loop: 1403.46ms; fastest: 0.15ms
    === Move ===
    Hiccups: 0(0), FreeDm: 375, MinFreeDm: 335, MaxWait: 0ms
    Bed compensation in use: mesh, comp offset 0.000
    === MainDDARing ===
    Scheduled moves: 0, completed moves: 26, StepErrors: 0, LaErrors: 0, Underruns: 0, 5  CDDA state: -1
    === AuxDDARing ===
    Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0  CDDA state: -1
    === Heat ===
    Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
    === GCodes ===
    Segments left: 0
    Movement lock held by null
    HTTP is idle in state(s) 0
    Telnet is idle in state(s) 0
    File is idle in state(s) 0
    USB is idle in state(s) 0
    Aux is idle in state(s) 0
    Trigger is idle in state(s) 0
    Queue is idle in state(s) 0
    LCD is idle in state(s) 0
    SBC is idle in state(s) 0
    Daemon is idle in state(s) 0
    Autopause is idle in state(s) 0
    Code queue is empty.
    === Network ===
    Slowest loop: 263.47ms; fastest: 0.03ms
    Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(1) Telnet(0)
    HTTP sessions: 2 of 8
    - Ethernet -
    State: active
    Error counts: 0 0 0 0 0
    Socket states: 5 2 2 2 2 0 0 3
    === CAN ===
    Messages sent 3884, longest wait 0ms for type 0
    === Linux interface ===
    State: 0, failed transfers: 0
    Last transfer: 8665601ms ago
    RX/TX seq numbers: 0/1
    SPI underruns 0, overruns 0
    Number of disconnects: 0
    Buffer RX/TX: 0/0-0
    

    M39:

    4/14/2020, 6:52:59 PM	m39
    SD card in slot 0: capacity 32.03Gb, free space 30.59Gb, speed 25.00MBytes/sec, cluster size 32kb
    


  • It happened a third time on a different .gcode file. I didn't bother with another M122 dump. Just scroll up to the previous 2 messages for that info. It's the same. While I have seen this error once in a while, I've never seen it three times in a single night. I won't refuse the possibility of the sdcard failing, but it seems odd that it's so intermittent.

    One thing that bothers me is that I've seen these types of failures on a previous duet3 board with a different sdcard. It could be a coincidence, or it might be a firmware (or hardware) issue. That would imply that other people would be hitting the same issue... (assuming they are using the duet3 in stand-alone mode.)

    The card being used now is a 32GB Samsung "EVO+" (different from the "EVO Plus") formatted with a cluster size of 32kb.

    The card I was using when I first opened this thread a couple months ago was probably a sandisk ultra or sandisk extreme.

    In all cases, rebooting the duet (or M999) makes the card (and entire .gcode file) readable again (for a while, at least.) I've ordered another pair of SDCards, but they won't be delivered until late Friday. In the meantime, I'm going to try a little 16GB "PNY" microsd card.


  • Moderator

    Are you using the SD card formatter tool?


  • administrators

    @garyd, thanks for your reports.

    I notice that in each M122 report, the SD card is reported as being detected and the interface speed is as expected, but there is a nonzero SD card retries count. This is very unusual. So it appears to me that communication with the SD card is breaking down.

    Some suggestions to help me diagnose this:

    1. Run M122 after each print (or even during prints) to see whether any SD card retries have been recorded. AFAIR the retry count is reset to 0 after running M122.

    2. Next time the problem occurs, try sending M21 to re-mount the card. If that doesn't work, try M22 followed by M21.

    I will look at what action we take when a SD card transfer fails and whether we can do more hardware resetting of the SD card interface when this happens. I will also set up a continuous simulation on one of my machines to see if I can reproduce this.
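The check in suggestion 1 can be automated once the M122 text has been saved. The sketch below pulls the longest block write time and the retry count out of a report, using the line format that appears in the reports posted in this thread; the function names are illustrative, and other firmware versions may word the line differently.

```python
import re

# Matches the SD card line as it appears in the M122 reports in this thread.
SD_LINE = re.compile(
    r"SD card longest block write time: (?P<write_ms>[\d.]+)ms, "
    r"max retries (?P<retries>\d+)"
)


def sd_card_stats(report):
    """Return (longest_write_ms, max_retries) from M122 text, or None if absent."""
    match = SD_LINE.search(report)
    if match is None:
        return None
    return float(match.group("write_ms")), int(match.group("retries"))


def sd_card_suspect(report):
    """True when the report records at least one SD card retry."""
    stats = sd_card_stats(report)
    return stats is not None and stats[1] > 0
```

For the two reports above this would give `(142.7, 2)` and `(0.0, 3)`, so both would be flagged.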



  • @Phaedrux said in SDCard errors - Duet3, various 3.01 betas:

    Are you using the SD card formatter tool?

    I use the one suggested by the duet wiki. I also turn off the "quick format" option in most cases to ensure every byte on the card is touched (so that any errors are seen during the format.)

    @dc42 said in SDCard errors - Duet3, various 3.01 betas:

    Next time the problem occurs, try sending M21 to re-mount the card. If that doesn't work, try M22 followed by M21.

    Thank you.

    I feel like an idiot. I searched frantically last night for some gcode I could use to remount the SDcard, but didn't find M21/M22. (I searched the gcode list for "mount", "unmount", and "sdcard", but never for "sd card".)

    I might get a chance to test it shortly. I don't have high confidence in that cheap "PNY" SD Card, and I got a request for even more of those mask ear things. (1000 mm tall delta printer, and it's printing hundreds of 1.2mm tall things.)


  • Moderator

    @garyd9 You've got underruns reported in both of the M122 reports:

    === MainDDARing ===
    Scheduled moves: 11, completed moves: 11, StepErrors: 0, LaErrors: 0, Underruns: 0, 30  CDDA state: -1
    

    and

    === MainDDARing ===
    Scheduled moves: 0, completed moves: 26, StepErrors: 0, LaErrors: 0, Underruns: 0, 5  CDDA state: -1
    

    As I've learnt from this thread https://forum.duet3d.com/topic/15421/duet-2-05-memory-leak/22?_=1586962745614 underruns can be symptomatic of an underperforming/failing SD card, unable to keep up supplying the planner with gcode.

    @kazolar I think the first value isn't a warning, just an indication that the lookahead function couldn't do something (not sure what) with the time given. It doesn't slow down the print, but is likely not ideal. The second number is a prepare move underrun, which means that the move could not be prepared in time and so the movement must wait. This is much worse than the first one.

    Another way to test the SD card is to send M122 P104 S[file size in MB], and the response should be between 2 and 2.5Mbytes/sec. For me: Duet 2 WiFi - 2.23Mbytes/sec, Duet Maestro 2.42Mbytes/sec for a 10MB file.

    Ian
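A small sketch for pulling those two underrun counters out of a saved M122 report, one pair per DDA ring. It follows the interpretation quoted above (first number is the lookahead underrun count, second is the prepare underrun count) and the line format seen in this thread's 3.01-RC7 reports; the function name is illustrative.

```python
import re

SECTION = re.compile(r"=== (.+) ===")
UNDERRUNS = re.compile(r"Underruns: (\d+), (\d+)")


def underruns_by_ring(report):
    """Map each section containing an Underruns entry to (lookahead, prepare)."""
    result = {}
    section = None
    for line in report.splitlines():
        line = line.strip()
        header = SECTION.fullmatch(line)
        if header:
            # Remember which "=== ... ===" section the following lines belong to.
            section = header.group(1)
            continue
        found = UNDERRUNS.search(line)
        if found and section is not None:
            result[section] = (int(found.group(1)), int(found.group(2)))
    return result
```

A nonzero second value for `MainDDARing` is the case described above as the more serious one.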



  • @droftarts said in SDCard errors - Duet3, various 3.01 betas:

    Another way to test the SD card is to send M122 P104 S[file size in MB]

    That's interesting. I never knew it existed. I don't know if it will help with my particular situation (where the card works fine, but suddenly acts as if it was completely disconnected - until I reset the duet board.)

    Regardless, any tools are better than none. Even if the test seems useless for my situation, it's another diagnostic tool that might help. Thank you.

    I did run it against the cheap PNY card I'm using temporarily, and the response was "SD write speed for 10.0Mbyte file was 2.17Mbytes/sec" If I get a chance to re-insert the Samsung card, I'll run it against that to see if there's any difference. Perhaps if I run it several times for very large sizes, it will re-create the error condition.
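For repeated runs, the reply can be checked automatically. This sketch parses the response text quoted above and compares the speed with the 2 to 2.5 Mbytes/sec range mentioned earlier in the thread; that range was reported for Duet 2 WiFi and Duet Maestro boards, so treating it as the expected range for a Duet 3 is an assumption, and the function names are illustrative.

```python
import re

SPEED = re.compile(r"SD write speed for ([\d.]+)Mbyte file was ([\d.]+)Mbytes/sec")


def write_speed(reply):
    """Return (file_size_mb, mbytes_per_sec) from an M122 P104 reply, or None."""
    match = SPEED.search(reply)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def speed_in_expected_range(reply, low=2.0, high=2.5):
    """True when the reported write speed lies within [low, high] Mbytes/sec."""
    parsed = write_speed(reply)
    return parsed is not None and low <= parsed[1] <= high
```

The PNY result above (2.17 Mbytes/sec for a 10 MB file) falls inside that range.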


  • Moderator

    @garyd9 the only other things I can think of that it might be are a temperamental SD socket, i.e. poorly soldered, or dirt/dust on the contacts perhaps.

    Ian



  • @droftarts said in SDCard errors - Duet3, various 3.01 betas:

    @garyd9 the only other things I can think of that it might be are a temperamental SD socket, i.e. poorly soldered, or dirt/dust on the contacts perhaps.

    Ian

    Honestly, I'm hoping that when my new microSD card comes Friday, everything will work fine for years.

    (I don't want to say anything about what's going on with the cheap PNY card. I fully expect it to fail, and if I express any type of hope regarding it, I might curse that failure into happening before the replacement arrives.)



  • @garyd9 said in SDCard errors - Duet3, various 3.01 betas:

    I fully expect it to fail

    Doing a full format, preferably with the SD formatter tool, can help stability, at least for a while. On the other hand, Murphy's law...



  • @bearer said in SDCard errors - Duet3, various 3.01 betas:

    Doing a full format, preferably with the SD formatter tool, can help stability, at least for a while. On the other hand, Murphy's law...

    ...

    @garyd9 said in SDCard errors - Duet3, various 3.01 betas:

    I use the one suggested by the duet wiki. I also turn off the "quick format" option in most cases to ensure every byte on the card is touched (so that any errors are seen during the format.)

    As for Murphy's Law, it also dictates that if the copy machine repairman is on the premises (dc42 is watching this thread), the copy machine will work perfectly.

