SPI connection reset on Duet 3 Mini 5+
-
@sgomes one for @chrishamm
Ian
-
@droftarts Thank you again for your help!
While I can't run a test print right now, I think I may have figured it out, thanks to your earlier hypotheses. And if so, it's a doozy...
I think that the issue is that the extruder wire loom was too close to the X axis stepper motor, and the motor may have been causing EM interference that propagated back to the board, resetting it during certain moves...
I tried manually moving the X axis back and forth while the loom was distant from the motor and I didn't observe a single reset. But if I set it back to where it was (right up against the housing), sure enough, I was easily able to get a reset after a bit.
This would explain why the resets only ever happened during moves, and why they started all of a sudden — I must have moved the electronics box slightly, and with it the loom.
I'll be sure to follow up tomorrow, when I get the chance to run a full test print with the loom away from the X stepper!
-
@droftarts @chrishamm Unfortunately, it looks like I was wrong: EM interference on the extruder wires from the motor doesn't seem to be the issue.
The test print has failed with a similar error, in a different location, during the first layer. Log follows:
May 15 08:43:25 CR10S-DHR DuetControlServer[390]: [debug] File: Finished code G1 X257.978 Y248.068 E780.18079 =>
May 15 08:43:25 CR10S-DHR DuetControlServer[390]: [debug] File: Starting code G1 X257.978 Y258.88 E974.09475
May 15 08:43:25 CR10S-DHR DuetControlServer[390]: [debug] Processing G1 X257.978 Y258.88 E974.09475
May 15 08:43:25 CR10S-DHR DuetControlServer[390]: [debug] File: Finished code G1 X257.978 Y248.969 E780.22852 =>
May 15 08:43:25 CR10S-DHR DuetControlServer[390]: [debug] File: Finished code M73 P7 R239 =>
May 15 08:43:25 CR10S-DHR DuetControlServer[390]: [debug] File: Starting code G1 X257.978 Y259.781 E974.14248
May 15 08:43:25 CR10S-DHR DuetControlServer[390]: [debug] File: Starting code G1 X42.022 Y43.824 E990.32117
May 15 08:43:25 CR10S-DHR DuetControlServer[390]: [debug] Processing G1 X257.978 Y259.781 E974.14248
May 15 08:43:25 CR10S-DHR DuetControlServer[390]: [debug] Processing G1 X42.022 Y43.824 E990.32117
May 15 08:43:25 CR10S-DHR DuetControlServer[390]: [debug] File: Sent G1 X257.978 Y258.88 E974.09475, remaining space 120, needed 48
May 15 08:43:25 CR10S-DHR DuetControlServer[390]: [debug] File: Sent G1 X257.978 Y259.781 E974.14248, remaining space 72, needed 48
May 15 08:43:25 CR10S-DHR DuetControlServer[390]: [debug] File: Sent G1 X42.022 Y43.824 E990.32117, remaining space 24, needed 48
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X45.398 Y36.388 E796.15429
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X45.071 Y36.388 E796.17161
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code M73 P8 R239
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X44.548 Y36.44 E796.19945
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X257.978 Y249.87 E812.18886
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X257.978 Y250.771 E812.23659
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X43.841 Y36.633 E828.27901
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X43.341 Y36.901 E828.30906
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X43.266 Y36.96 E828.31412
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X257.978 Y251.672 E844.39958
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X257.978 Y252.573 E844.44731
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X42.791 Y37.386 E860.56835
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X42.711 Y37.47 E860.57449
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X42.416 Y37.912 E860.60264
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X257.978 Y253.474 E876.75177
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X257.978 Y254.375 E876.7995
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code M73 P8 R238
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X42.146 Y38.543 E892.96886
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X42.058 Y38.958 E892.99133
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X42.029 Y39.327 E893.01094
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X257.978 Y255.276 E909.18907
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X257.978 Y256.177 E909.2368
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X42.022 Y40.22 E925.41549
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X42.022 Y41.121 E925.46322
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X257.978 Y257.078 E941.64191
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X257.978 Y257.979 E941.68964
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X42.022 Y42.022 E957.86833
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X42.022 Y42.923 E957.91606
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code M73 P8 R237
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X257.978 Y258.88 E974.09475
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X257.978 Y259.781 E974.14248
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X42.022 Y43.824 E990.32117
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [warn] SPI connection has been reset
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [info] Aborted job file
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [info] Starting macro file config.g on channel Trigger
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] Trigger: Starting code M550 P"CR10S-DHR" (macro code)
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] Trigger: Starting code M905 P"2024-05-15" S"08:43:29" (macro code)
May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] Trigger: Starting code ; Configuration file for RepRapFirmware on Duet 3 Mini 5+ WiFi (macro code)
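For anyone comparing several failed prints, a small script can pull the reset warnings out of a saved journal. This is only a rough sketch in Python: it assumes the entry prefix looks exactly like the lines above (timestamp, hostname, `DuetControlServer[pid]`, level in brackets), and the names `split_entries` and `summarize_resets` are made up for this example, not part of DSF. It works whether the log is one entry per line or pasted as a single run of text.

```python
import re

# Matches the journal prefix seen in the log above, e.g.
# "May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [warn] "
# (assumed format; adjust if your journal output differs).
ENTRY = re.compile(
    r"(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"DuetControlServer\[\d+\]: \[(?P<level>\w+)\] "
)


def split_entries(text):
    """Split journal text into (timestamp, level, message) tuples."""
    matches = list(ENTRY.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # A message runs from the end of its prefix to the next prefix.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        message = " ".join(text[m.end():end].split())
        entries.append((m.group("stamp"), m.group("level"), message))
    return entries


def summarize_resets(text):
    """Return one dict per 'SPI connection has been reset' warning,
    with the number of cancelled file codes logged just before it."""
    resets = []
    cancelled = 0
    for stamp, level, message in split_entries(text):
        if message.startswith("File: Cancelled code"):
            cancelled += 1
        elif level == "warn" and "SPI connection has been reset" in message:
            resets.append({"time": stamp, "cancelled_codes": cancelled})
            cancelled = 0
    return resets


# Short excerpt of the log above, pasted as one run of text.
sample = (
    "May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X257.978 Y259.781 E974.14248 "
    "May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [debug] File: Cancelled code G1 X42.022 Y43.824 E990.32117 "
    "May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [warn] SPI connection has been reset "
    "May 15 08:43:29 CR10S-DHR DuetControlServer[390]: [info] Aborted job file"
)
print(summarize_resets(sample))
```

Running it over the logs from several failed prints should make it easier to see whether the resets cluster around particular moves or times.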
-
@sgomes Can you please share the M122 output after the reset? Your first dump doesn't show an unexpected reset reason or a disconnect event.
What kind of microSD card are you using in the SBC? If it is only a standard microSD card, consider replacing it with an A1- or A2-rated microSD card.
-
@chrishamm Thank you, @chrishamm!
Here's the M122 dump, although it doesn't look very useful either to my untrained eye:
M122
=== Diagnostics ===
RepRapFirmware for Duet 3 Mini 5+ version 3.5.1 (2024-04-19 14:41:25) running on Duet 3 Mini5plus WiFi (SBC mode)
Board ID: 75ZD9-UQ6KL-K65J0-409N0-3J02Z-H2B5T
Used output buffers: 1 of 40 (17 max)
=== RTOS ===
Static ram: 103232
Dynamic ram: 106468 of which 0 recycled
Never used RAM 31716, free system stack 208 words
Tasks: SBC(2,ready,2.2%,431) HEAT(3,nWait 1,0.0%,353) Move(4,nWait 6,0.0%,355) CanReceiv(6,nWait 1,0.0%,940) CanSender(5,nWait 7,0.0%,336) CanClock(7,delaying,0.0%,334) TMC(4,delaying,0.8%,111) MAIN(2,running,95.1%,762) IDLE(0,ready,0.9%,30) AIN(4,delaying,0.8%,265), total 100.0%
Owned mutexes: HTTP(MAIN)
=== Platform ===
Last reset 00:00:29 ago, cause: reset button
Last software reset at 2024-05-13 16:39, reason: User, Gcodes spinning, available RAM 31716, slot 1
Software reset code 0x6003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00000000 BFAR 0xe000ed38 SP 0x00000000 Task SBC Freestk 0 n/a
Error status: 0x00
MCU revision 3, ADC conversions started 30188, completed 30187, timed out 0, errs 0
MCU temperature: min 37.1, current 37.2, max 38.3
Supply voltage: min 12.1, current 12.1, max 12.1, under voltage events: 0, over voltage events: 0, power good: yes
Heap OK, handles allocated/used 0/0, heap memory allocated/used/recyclable 0/0/0, gc cycles 0
Events: 0 queued, 0 completed
Driver 0: standstill, SG min 184, read errors 0, write errors 1, ifcnt 36, reads 1550, writes 14, timeouts 0, DMA errors 0, CC errors 0
Driver 1: standstill, SG min 176, read errors 0, write errors 1, ifcnt 35, reads 1550, writes 14, timeouts 0, DMA errors 0, CC errors 0
Driver 2: standstill, SG min 2, read errors 0, write errors 1, ifcnt 31, reads 1550, writes 14, timeouts 0, DMA errors 0, CC errors 0
Driver 3: standstill, SG min 0, read errors 0, write errors 1, ifcnt 30, reads 1551, writes 13, timeouts 0, DMA errors 0, CC errors 0
Driver 4: standstill, SG min 2, read errors 0, write errors 1, ifcnt 31, reads 1550, writes 14, timeouts 0, DMA errors 0, CC errors 0
Driver 5: not present
Driver 6: not present
Date/time: 2024-05-15 08:43:58
Cache data hit count 67000041
Slowest loop: 3.95ms; fastest: 0.15ms
=== Storage ===
Free file entries: 20
SD card 0 not detected, interface speed: 0.0MBytes/sec
SD card longest read time 0.0ms, write time 0.0ms, max retries 0
=== Move ===
DMs created 83, segments created 0, maxWait 0ms, bed compensation in use: none, height map offset 0.000, max steps late 0, min interval 0, bad calcs 0, ebfmin 0.00, ebfmax 0.00
no step interrupt scheduled
Moves shaped first try 0, on retry 0, too short 0, wrong shape 0, maybepossible 0
=== DDARing 0 ===
Scheduled moves 0, completed 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== DDARing 1 ===
Scheduled moves 0, completed 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== Heat ===
Bed heaters 0 -1 -1 -1, chamber heaters -1 -1 -1 -1, ordering errs 0
=== GCodes ===
Movement locks held by null, null
HTTP* is doing "M122" in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger* is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
File2 is idle in state(s) 0
Queue2 is idle in state(s) 0
Q0 segments left 0, axes/extruders owned 0x0000000
Code queue 0 is empty
Q1 segments left 0, axes/extruders owned 0x0000000
Code queue 1 is empty
=== CAN ===
Messages queued 265, received 0, lost 0, errs 140194, boc 0
Longest wait 0ms for reply type 0, peak Tx sync delay 0, free buffers 26 (min 26), ts 150/0/0
Tx timeouts 0,0,149,0,0,114 last cancelled message type 30 dest 127
=== SBC interface ===
Transfer state: 5, failed transfers: 0, checksum errors: 0
RX/TX seq numbers: 28084/2593
SPI underruns 0, overruns 0
State: 5, disconnects: 0, timeouts: 0 total, 0 by SBC, IAP RAM available 0x0d3f4
Buffer RX/TX: 0/0-0, open files: 0
=== Duet Control Server ===
Duet Control Server version 3.5.1 (2024-04-19 16:20:35, 32-bit)
HTTP+Executed:
> Executing M122
Code buffer space: 4096
Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 14
Full transfers per second: 3.23, max time between full transfers: 932.0ms, max pin wait times: 880.8ms/15.0ms
Codes per second: 0.02
Maximum length of RX/TX data transfers: 4436/1204
As for the SD card, it's a 16GB SanDisk Edge, with a class 10, A1 rating, identical to this one.
-
@sgomes said in SPI connection reset on Duet 3 Mini 5+:
Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 14
@chrishamm Is the number of TfrRdy pin glitches significant? I had a quick look at some other M122 reports in SBC mode, and they all report 0 for this.
Ian
-
@sgomes @droftarts I find it rather odd that the disconnect counters do not show anything; if there had been a protocol error, at least one of the counters should have increased. Did you reset the board straight after the SPI connection reset? Or do you have a physical reset button connected anywhere? I'd suggest disconnecting that (if present) or checking that the physical reset button does not make intermittent contact, because a reset button press is reported as the board's last reset reason.
If that does not change anything, you could replace the ribbon cable and/or shield it, and check if that helps.
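The fields discussed in this thread (last reset cause, the SBC interface counters, and the TfrRdy glitch count) can be picked out of a saved M122 report with a few regular expressions. The following is a minimal Python sketch, assuming the field labels appear exactly as in the 3.5.1 report posted above; `sbc_link_summary` and the dictionary keys are hypothetical names chosen for this example, and other firmware versions may word these lines differently.

```python
import re

# Integer counters, keyed by a made-up name. The label text is copied
# from the M122 report above (assumed stable for this firmware version).
INT_FIELDS = {
    "failed_transfers": r"failed transfers: (\d+)",
    "checksum_errors": r"checksum errors: (\d+)",
    "disconnects": r"disconnects: (\d+)",
    "tfrrdy_glitches": r"TfrRdy pin glitches: (\d+)",
}

# The cause ends at the end of its line, or at "Last software reset"
# when the report was pasted as a single run of text.
RESET_CAUSE = re.compile(
    r"cause: (.+?)\s*(?:Last software reset|$)", re.MULTILINE
)


def sbc_link_summary(m122_text):
    """Extract the reset cause and SBC link counters from M122 text.

    Fields that cannot be found are returned as None.
    """
    summary = {}
    m = RESET_CAUSE.search(m122_text)
    summary["reset_cause"] = m.group(1) if m else None
    for name, pattern in INT_FIELDS.items():
        m = re.search(pattern, m122_text)
        summary[name] = int(m.group(1)) if m else None
    return summary


# Excerpt of the report above, pasted as one run of text.
sample = (
    "Last reset 00:00:29 ago, cause: reset button "
    "Last software reset at 2024-05-13 16:39, reason: User, Gcodes spinning, "
    "available RAM 31716, slot 1 "
    "=== SBC interface === Transfer state: 5, failed transfers: 0, checksum errors: 0 "
    "State: 5, disconnects: 0, timeouts: 0 total, 0 by SBC "
    "Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 14"
)
print(sbc_link_summary(sample))
```

On the report in this thread it shows a "reset button" cause alongside zero disconnects and zero checksum errors, which is the combination that points away from a protocol error and towards the reset line itself.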
-
@chrishamm said in SPI connection reset on Duet 3 Mini 5+:
@sgomes @droftarts Did you reset the board straight after the SPI connection reset?
No, I'm afraid all I did was run M122 once I got a chance. The resets happen on their own, and as far as I can tell they're simultaneous with the fault. Or, to put it a different way, I suppose the fault manifests as a reset, with similar logs to the ones I've shared here every time.
Or do you have a physical reset button connected anywhere? I'd suggest disconnecting that (if present) or checking that the physical reset button does not make intermittent contact, because a reset button press is reported as the board's last reset reason.
I don't have a physical reset button, no; I didn't even know that it was possible to hook one up. I've never used the one built into the board either, since I control everything via the SBC.
If that does not change anything, you could replace the ribbon cable and/or shield it, and check if that helps.
I'm using the stock ribbon cable that came with the Duet 3 Mini 5+. If I'm to try replacing it, should I get a short, standard 26-pin ribbon cable, or should I look for something more specific?
I'm happy to try some shielding as well, but I don't have any know-how there. Should I wrap some sort of wire mesh around the ribbon cable and earth said mesh?
-
@sgomes And the built-in reset button next to the USB jack has enough clearance?
-
@chrishamm said in SPI connection reset on Duet 3 Mini 5+:
@sgomes And the built-in reset button next to the USB jack has enough clearance?
Yes, there should be plenty of clearance. I have the board mounted vertically with that side on the bottom, but I did leave a 2mm gap between the board and the case floor.
Could the reset button get triggered because of EM interference? Just in case, I've rerouted a few wires inside the electronics box and will attempt some more test prints later today.
-
@droftarts @chrishamm Good news! I just managed to finish a 4.5hr print, which came out flawlessly. I'd never been able to print that long without a failure, so I'm hopeful that's the end of that!
Edit: And I've now finished a 12hr print as well!
Perhaps the wires I moved (for the 5V inductive probe, IIRC) were somehow triggering the reset button through EM interference?
Oh, and for good measure, I ran M122 after the print finished, and it reported TfrRdy pin glitches: 67. Hopefully that won't be a problem in the future, but I suppose I can always try a new ribbon cable if it comes to that.
Thank you both for your help!
-