Dead driver or dying board?
-
@dc42 M18/M17 doesn't fix it -- it does allow me to move the axis freely by hand, so the stepper does release. E-stop also disables the stepper. When I try to home the axis, the stepper on driver 5 locks up again. Only a power cycle clears the gremlins (seemingly for a rather short period of time now). I see no physical issues near the driver. I didn't take the board out to examine it under my micro soldering scope, but visual inspection in situ doesn't raise any suspicions.
Yes, when I paused the printer after the last time the problem occurred, M122 said all drivers were OK, which is why this is weird. Can you suggest how I can add this driver back into the config so that it's not part of the kinematics? Can it be the 3rd stepper of my Y axis? Does it need its own endstop? I can plug in a spare stepper, have it sit on the side during the print, and watch for it to lock up. Putting it back into the kinematics is not ideal for a test; it's a NEMA 23 with a lot of holding torque. My Y gantries are designed to allow for some flex to self-square, but this behavior is very violent.
More to the point -- how isolated is the driver? In the worst case, am I losing any performance or risking any issue by using an expansion board for the Y axis? I figured it's better to move both Y steppers to the same expansion board. More specifically, can I run it as is, or is the main board going to degrade more? I've not had any overheating issues; all boards have adequate cooling and stay in the 30 °C reported temperature range.
-
@kazolar you can add the driver back as a 3rd Y axis driver, however if your two existing Y axis drivers have separate endstops then you would need a 3rd endstop.
Yes it's better to put both Y drivers on an expansion board (the same one) rather than just one of them. When endstops are triggered there is a small latency before drivers on expansion boards are stopped, so they may overshoot very slightly. They are then reverted to the position they had when the endstop trigger was detected. So when a single endstop is used, if only one driver is on an expansion board then only that one will overshoot and revert slightly.
Another reason to drive both motors from the same board is so that in the event of a CAN bus failure or a board reset, the motors don't move out of sync.
My guess is that either the driver is being affected by heat, or there is a bad solder joint in that area that is affected by heat.
-
@dc42 Yes, that's what I did. I moved the Y axis entirely to the expansion board: both Y steppers and endstops are on the same expansion board. I recall reading that recommendation before, that a paired axis should be on the same expansion board. The last time the error occurred was after the printer had been off for at least an hour while I re-wired/routed the problematic stepper/driver, so the printer hadn't been on long enough for heat to become a problem. The curious part is that the Y axis was previously using driver 0 and driver 5, which are next to each other. Driver 0 was fine, and 5 was the one that had problems. If it's localized to an area, it's really specific. As I said, the only thing I saw in M122, and saw yesterday while the printer was printing normally, was a low count of timeouts on the drivers -- all expansion boards showed 0 for timeouts. Are timeouts a normal thing? Should I just run the printer as is until another driver fails and replace the main board then? I'm afraid that if I wait long enough I may get burned by tariff nonsense.
-
@kazolar it's normal to see zero driver timeouts, although you might see one timeout after VIN is powered up.
-
@dc42 I see non-zero timeouts during normal operation on my main board. My expansion boards show zeros. During the print yesterday I was seeing timeouts of 8-33 when I was polling the main board. Expansion boards said 0. Oddly enough, the one axis that is on the main board, X, showed stalled briefly and then went back to ok, even though things were moving fine and nothing was wrong. Is this a sign of something more serious? The 19 hour print finished and came out perfect. With zsp the bed mesh is working beautifully. I took a zoomed-in picture of the board and see nothing suspicious.
-
@dc42 here is typical m122 from main board
m122
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.6.0-rc.1 (2025-02-28 15:00:13) running on Duet 3 MB6HC v1.01 (standalone mode)
Board ID: 08DJM-956BA-NA3TJ-6J1F4-3S06T-KV8UT
Used output buffers: 3 of 40 (36 max)
=== RTOS ===
Static ram: 137420
Dynamic ram: 135436 of which 0 recycled
Never used RAM 58824, free system stack 130 words
Tasks: NETWORK(1,ready,32.9%,180) ETHERNET(5,nWait 7,0.2%,307) LASER(5,nWait 7,0.7%,167) HEAT(3,nWait 6,0.0%,331) Move(4,nWait 6,0.3%,213) TMC(4,nWait 6,3.2%,341) CanReceiv(6,nWait 1,0.2%,770) CanSender(5,nWait 7,0.0%,327) CanClock(7,delaying,0.0%,348) MAIN(1,running,62.4%,500) IDLE(0,ready,0.0%,29) USBD(3,blocked,0.0%,149), total 100.0%
Owned mutexes:
=== Platform ===
Last reset 01:31:08 ago, cause: power up
Last software reset at 2058-01-24 11:14, reason: HardFault, none spinning, available RAM 222236, slot 2
Software reset code 0x4073 HFSR 0x40000000 CFSR 0x00080000 ICSR 0x00000803 BFAR 0x00000000 SP 0x20415c78 Task MAIN Freestk 2306 ok
Stack: 001d37c0 0000fffe 20412118 00000000 00000000 0049a7ff 0049a7fe a1000000 00000000 ffffffff 00000000 00000000 204137f0 20413608 ffffffff 00000000 2041221c 0049a867 00000050 00000058 00000000 00497a07 20429738 0040055d 00000050 20429728 00000000
=== Storage ===
Free file entries: 19
SD card 0 detected, interface speed: 25.0MBytes/sec
SD card longest read time 2.6ms, write time 3.3ms, max retries 0
=== Move ===
Segments created 229, maxWait 767ms, bed comp in use: mesh, height map offset 0.000, hiccups added 0/0 (0.00/30.23ms), max steps late 0, ebfmin 0.00, ebfmax 0.00
Pos req/act/dcf: 28643.00/28940/-0.17 15681.00/16154/-0.67 419.00/418/0.98 61485.00/61485/0.00 63990.00/63990/0.00 63995.00/63995/-0.00 -3405.00/-3405/0.00
Next step interrupt due in 28 ticks, disabled
Driver 0: standstill, SG min n/a, mspos 8, reads 29077, writes 242 timeouts 23
Driver 1: standstill, SG min n/a, mspos 136, reads 29077, writes 242 timeouts 23
Driver 2: stalled, SG min 0, mspos 367, reads 29077, writes 242 timeouts 23
Driver 3: standstill, SG min n/a, mspos 88, reads 29077, writes 242 timeouts 23
Driver 4: standstill, SG min n/a, mspos 24, reads 29077, writes 242 timeouts 23
Driver 5: standstill, SG min n/a, mspos 8, reads 29077, writes 242 timeouts 23
Phase step loop runtime (us): min=0, max=150, frequency (Hz): min=501, max=15957
=== Heat ===
Bed heaters 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters -1 -1 -1 -1 -1 -1 -1 -1, ordering errs 0
Heater 0 is on, I-accum = 0.1
Heater 1 is on, I-accum = 0.0
=== GCodes ===
Movement locks held by null, null
HTTP is idle in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 3
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
File2 is idle in state(s) 0
Queue2 is idle in state(s) 0
=== CAN ===
Messages queued 84006, received 233772, lost 0, ignored 0, errs 0, boc 0
Longest wait 0ms for reply type 0, peak Tx sync delay 568, free buffers 50 (min 46), ts 9233/9233/0
Tx timeouts 0,0,0,0,0,0
=== Network ===
Slowest loop: 22.09ms; fastest: 0.03ms
Responder states: MQTT(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
HTTP sessions: 4 of 8
=== Multicast handler ===
Responder is inactive, messages received 0, responses 0
= Ethernet =
Interface state: active
Error counts: 0 0 0 0 0 0
Socket states: 6 6 6 2 2 0 0 0 0
-
@kazolar was that M122 report taken after the problem with driver 5 occurred, or before?
-
@dc42 before, this is "normal" for me, when everything is printing fine.
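For anyone who wants to compare reports like the one above over time, here is a minimal Python sketch that pulls the status and timeout count out of the "Driver N:" lines of a pasted M122 report. The function names `parse_driver_lines` and `suspicious` are made up for illustration, and the regex assumes the line layout shown in the report above; other firmware versions may format these lines differently.

```python
import re

# Assumed line layout, taken from the report in this thread:
# "Driver 2: stalled, SG min 0, mspos 367, reads 29077, writes 242 timeouts 23"
DRIVER_RE = re.compile(
    r"Driver (?P<num>\d+): (?P<status>[^,]+),.*?timeouts (?P<timeouts>\d+)"
)

def parse_driver_lines(report):
    """Return {driver number: (status, timeouts)} for each driver line found."""
    drivers = {}
    for m in DRIVER_RE.finditer(report):
        drivers[int(m["num"])] = (m["status"].strip(), int(m["timeouts"]))
    return drivers

def suspicious(drivers):
    """List drivers whose status is not standstill/ok or that report timeouts."""
    return sorted(
        n for n, (status, timeouts) in drivers.items()
        if status not in ("standstill", "ok") or timeouts > 0
    )

# Two lines copied from the M122 report above
sample = """\
Driver 0: standstill, SG min n/a, mspos 8, reads 29077, writes 242 timeouts 23
Driver 2: stalled, SG min 0, mspos 367, reads 29077, writes 242 timeouts 23
"""
drivers = parse_driver_lines(sample)
print(drivers)
print(suspicious(drivers))
```

Running this against a saved report from each board makes it easier to see whether the timeout counts only ever appear on the main board.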
-
@dc42 Same thing happened on another driver, so it looks like the board is failing.
M569 P0.3 R1 produces nothing
5/22/2025, 2:11:42 PM M569 P0.3
Drive 3 runs forwards, active high enable, timing fast, mode spreadCycle, ccr 0x10024, toff 4, tblank 2, thigh 200 (375.0 mm/sec), gs=39, iRun=31, iHold=21, current=990.234, hstart/hend/hdec 2/0/0, pos 296
Power cycled
M569 P0.0
Drive 0 runs forwards, active low enable, timing fast, mode spreadCycle, ccr 0x10024, toff 4, tblank 2, thigh 200 (375.0 mm/sec), gs=79, iRun=31, iHold=21, current=2005.859, hstart/hend/hdec 2/0/0, pos 200
Current was specified as 2000 in both cases -- something is messing up the current
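To make the comparison concrete, here is a rough Python sketch that extracts gs, iRun and the reported current from the two M569 replies and checks them against the 2000 mA that was requested. The helper names `parse_m569` and `check_current` and the 5% tolerance are my own choices. The observation that the current tracks gs (which I take to be the driver's global scaler) is inferred only from these two reports, where iRun is 31 in both, and not from the firmware documentation.

```python
import re

def parse_m569(report):
    """Extract a few numeric fields from one M569 driver report line."""
    fields = {}
    for key in ("gs", "iRun", "iHold", "current"):
        m = re.search(rf"\b{key}=([\d.]+)", report)
        if m:
            fields[key] = float(m.group(1))
    return fields

def check_current(report, requested_ma, tolerance=0.05):
    """Return (within_tolerance, reported_ma) for a requested current in mA."""
    reported = parse_m569(report)["current"]
    return abs(reported - requested_ma) / requested_ma <= tolerance, reported

# The two replies quoted in this post
faulty = ("Drive 3 runs forwards, active high enable, timing fast, mode spreadCycle, "
          "ccr 0x10024, toff 4, tblank 2, thigh 200 (375.0 mm/sec), gs=39, iRun=31, "
          "iHold=21, current=990.234, hstart/hend/hdec 2/0/0, pos 296")
healthy = ("Drive 0 runs forwards, active low enable, timing fast, mode spreadCycle, "
           "ccr 0x10024, toff 4, tblank 2, thigh 200 (375.0 mm/sec), gs=79, iRun=31, "
           "iHold=21, current=2005.859, hstart/hend/hdec 2/0/0, pos 200")

for name, line in (("faulty", faulty), ("healthy", healthy)):
    f = parse_m569(line)
    ok, reported = check_current(line, 2000)
    # With iRun equal in both reports, mA per gs count comes out the same
    print(name, ok, reported, round(f["current"] / f["gs"], 3))
```

In both replies the reported current divided by gs is about 25.39 mA, so the halved current on driver 3 lines up with gs being 39 instead of 79 rather than with a change in iRun.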
-
@dc42 I figured it out -- the current sent to the stepper falls DRASTICALLY. This is reproducible even on drivers which are acting normally. I am running LDO NEMA 17s on the X carriage steppers. I noticed all of a sudden that one of the carriages can't home. It's trying, but it feels like it's running with a fraction of the current -- the same behavior as if I were to set the current to 0.5 A or less. I tried M569 commands and nothing is printed. I noticed driver 2 is working fine, but when given 2 A, the carriage is rather easy to move. So I gave it 2.5 A; the max for these LDOs is 2.8 A, and I've run them at 2.5 A on my Voron. Even at 2.5 A the stepper can be moved by hand with a bit more force. I then power cycled the machine, and at 2.5 A the stepper is now rock solid. This feels like the board's driver current regulation is failing, or something of that ilk.