Mysterious Motion Pauses, SPI Timeout Reset
-
Setup is two 6XDs with two 1LCs in SBC mode, all on 3.4.6. A print job was running, and I noticed the machine pause during printing moves when it shouldn't have paused. At the same time, the web interface disconnected on networked devices, and the Pi stopped responding to SSH and FTP connection attempts. I pulled up a terminal on the Pi's display and checked htop, but I didn't see excessive CPU or memory usage (around 1.2 average CPU utilization on a 4 core Pi, and only 1.38 GB/8 GB).
Shortly after, while trying to adjust a tool offset on the Pi, an SPI reset occurred. I then grabbed the M122 output:
=== Diagnostics === RepRapFirmware for Duet 3 MB6XD version 3.4.6 (2023-07-21 14:11:58) running on Duet 3 MB6XD v1.01 or later (SBC mode) Board ID: 0JD2M-999AL-D25SW-6JKD0-3S06R-95ZB1 Used output buffers: 6 of 40 (40 max) === RTOS === Static ram: 151692 Dynamic ram: 71880 of which 204 recycled Never used RAM 123224, free system stack 114 words Tasks: SBC(ready,59.5%,452) HEAT(notifyWait,16.6%,321) Move(notifyWait,106.8%,214) CanReceiv(notifyWait,67.6%,771) CanSender(notifyWait,2.0%,323) CanClock(delaying,6.8%,347) MAIN(running,247.2%,1045) IDLE(ready,9.3%,29), total 515.7% Owned mutexes: HTTP(MAIN) === Platform === Last reset 456:55:18 ago, cause: power up Last software reset details not available Error status: 0x04 Aux0 errors 0,4,0 Step timer max interval 82634 MCU temperature: min 38.4, current 48.2, max 50.7 Supply voltage: min 26.4, current 26.8, max 27.0, under voltage events: 0, over voltage events: 0, power good: yes 12V rail voltage: min 12.1, current 12.1, max 12.2, under voltage events: 0 Heap OK, handles allocated/used 99/46, heap memory allocated/used/recyclable 2048/1816/1072, gc cycles 27520 Events: 17 queued, 17 completed Driver 0: ok Driver 1: ok Driver 2: ok Driver 3: ok Driver 4: ok Driver 5: ok Date/time: 2024-01-23 13:49:37 Slowest loop: 3020.57ms; fastest: 0.03ms === Storage === Free file entries: 10 SD card 0 not detected, interface speed: 37.5MBytes/sec SD card longest read time 0.0ms, write time 0.0ms, max retries 0 === Move === DMs created 125, segments created 34, maxWait 655507868ms, bed compensation in use: none, comp offset 0.000 === MainDDARing === Scheduled moves 109087, completed 109087, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 499], CDDA state -1 === AuxDDARing === Scheduled moves 0, completed 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1 === Heat === Bed heaters 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters -1 -1 -1 -1, ordering errs 0 === GCodes === Segments left: 0 Movement lock held by null HTTP* is doing "M122" in state(s) 0 Telnet is idle in state(s) 0 File* is idle in state(s) 0 USB is idle in state(s) 0 Aux is idle in state(s) 0 Trigger* is idle in state(s) 0 Queue* is idle in state(s) 0 LCD is idle in state(s) 0 SBC* is idle in state(s) 0 Daemon* is idle in state(s) 0 0, running macro Aux2 is idle in state(s) 0 Autopause* is idle in state(s) 0 Code queue is empty === Filament sensors === Extruder 0: no data received Extruder 1: no data received === CAN === Messages queued 16646947, received 87191044, lost 0, boc 0 Longest wait 4ms for reply type 6013, peak Tx sync delay 702, free buffers 50 (min 48), ts 8224593/8224592/0 Tx timeouts 0,0,0,0,0,0 === SBC interface === Transfer state: 5, failed transfers: 0, checksum errors: 0 RX/TX seq numbers: 61718/4807 SPI underruns 0, overruns 0 State: 5, disconnects: 1, timeouts: 1 total, 1 by SBC, IAP RAM available 0x2b584 Buffer RX/TX: 0/0-0, open files: 0 === Duet Control Server === Duet Control Server v3.4.6 Daemon: Executing macro daemon.g, started by system > Next stack level Code buffer space: 4096 Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 15 Full transfers per second: 39.40, max time between full transfers: 109.5ms, max pin wait times: 99.2ms/50.6ms Codes per second: 0.97 Maximum length of RX/TX data transfers: 4664/1732
Config file here: https://github.com/HartSmart-Products/HSP1-I-SD-Image/blob/main/sys/config.g
As it may be relevant, a daemon.g file is present on this machine that is not in the repo:
var daemon_sleep = 2 if !exists(global.heater_timer) global heater_timer = 0 while true if (state.status == "idle" || state.status == "simulating") var initial_time = state.upTime var heaters_active = false while iterations < #heat.bedHeaters if heat.bedHeaters[iterations] !=-1 if heat.heaters[iterations].state == "active" || heat.heaters[iterations].state == "standby" set var.heaters_active = true if global.heater_timer >= global.bed_heater_timeout M140 S-273.1 while iterations < #tools var current_tool = iterations while true if iterations >= #tools[var.current_tool].heaters break if heat.heaters[tools[var.current_tool].heaters[iterations]].state == "active" || heat.heaters[tools[var.current_tool].heaters[iterations]].state == "standby" set var.heaters_active = true if global.heater_timer >= global.tool_heater_timeout M568 P{var.current_tool} A0 if var.heaters_active var elapsed_time = state.upTime - var.initial_time if var.elapsed_time < 0 set var.elapsed_time = 0 set global.heater_timer = global.heater_timer + var.daemon_sleep + var.elapsed_time else set global.heater_timer = 0 else set global.heater_timer = 0 G4 S{var.daemon_sleep}
-
@curieos Try exchanging the microSD card with an A2-rated card. The disconnect was caused by the SBC timing out, which is typically caused by excessive load or a too slow/failing SD card. You also have a couple of TfrRdy glitches, so it might be a good idea to check again if all the metal parts are properly grounded.
-
@chrishamm I believe we have an A1 rated card installed currently. According to this thread (slightly old now) on the pi forums, not all A2/A1 cards are created equally, and the Pi 4 can't take full advantage of an A2 card. In either case, we can definitely get a faster card, but the one we have installed isn't that bad, according to this diagnostic I ran today.
Raspberry Pi Diagnostics - version 0.9 Wed Jan 24 09:08:23 2024 Test : SD Card Speed Test Run 1 prepare-file;0;0;23970;46 seq-write;0;0;19889;38 rand-4k-write;0;0;3433;858 rand-4k-read;9212;2303;0;0 Sequential write speed 19889 KB/sec (target 10000) - PASS Random write speed 858 IOPS (target 500) - PASS Random read speed 2303 IOPS (target 1500) - PASS Test PASS
@chrishamm said in Mysterious Motion Pauses, SPI Timeout Reset:
You also have a couple of TfrRdy glitches, so it might be a good idea to check again if all the metal parts are properly grounded.
Are you referring to frame components?
-
@curieos Do you have any other software on the Pi? It sounds like a load spike which messed with DSF causing the disconnect.
dmesg
orjournalctl
might reveal something, too.Yes, I mean the metal frame. You're getting a few TfrRdy pin glitches and checking if everything is grounded may help.
-
@chrishamm ustreamer, the PrintWatch plugin, and Onboard are all running. We had some issues with the PrintWatch plugin causing load spikes, but I've been talking with the dev team there and those issues should be resolved. We have another, identical, machine with the same apps and plugins that's been fine. This is the first time we've had this issue since the old PrintWatch plugin.
The one minute average in htop was roughly 1.2-1.5 at the time of the disconnect (I opened the terminal before the disconnect to check the load when I noticed the machine pausing). I saw an occasional 1 core load spike from the PrintWatch plugin (every 10 seconds it captures a frame of the camera stream to send out and renders a low quality preview for the web UI), but not anything approaching a full load situation.
I restarted the Pi since the network connection dropped, so unfortunately I can't grab the logs from those services.
I'll double check the gantry grounding.
edit: I checked, and the gantry does have continuity with the frame, and the frame has continuity with the chassis of the power supplies, and they have continuity with earth ground. Granted, the continuity of the gantry is through the motion components. If this is a problem I can get a ground wire run to the gantry and carriages.
-
@curieos said in Mysterious Motion Pauses, SPI Timeout Reset:
I restarted the Pi since the network connection dropped, so unfortunately I can't grab the logs from those services.
Can you access the pi's log folder in
/var/log
?can you check
cat /var/log/syslog | grep SPI
or
cat /var/log/daemon.log | grep SPI
if it will return a line like
Jan 20 01:39:48 <Hostname> DuetControlServer[16087]: [warn] SPI connection has been reset
can you go to the time in the logfile and post the lines before the
SPI connection has been reset
or just upload the whole logfile e.g. syslog or daemon.log -
@curieos You may also find that disabling WiFi power saving improves things, see https://github.com/Duet3D/DuetSoftwareFramework/wiki/SBC-Setup-Guide#disable-wifi-power-saving-on-the-raspberry-pi
-
@timschneider I'll grab that as soon as I can. That machine is offline for an unrelated issue ATM.
@chrishamm I'll try that in addition to the SD card and discrete ground wiring. We actually just got in a SanDisk Extreme PRO card rated A2 yesterday, and I already have the previous A1 card cloned onto it.
I did notice that it says "unless you are using a recent DuetPi build (after 06/07/2022), it is recommended to disable it". We are running what I believe is the latest build (2023-09-06), is this already done/would it make a difference?
-
@curieos
Gotta run it?