    Dead driver or dying board?

    Duet Hardware and wiring
    • dc42
      dc42 administrators @kazolar
      last edited by dc42

      @kazolar PS - here's another test you can do, if you can provoke the problem without damaging your machine:

      1. After power up and before the driver stops working, send: M569.2 P5 R1. The response will probably be 0x00000005.
      2. Send M569.2 P5 R1 V7 followed by M569.2 P5 R1. The response should be 0x00000000.
      3. Provoke the problem.
      4. Send M569.2 P5 R1 again and report the response.

      Another useful piece of information would be to know whether sending M18 Y followed by M17 Y gets it working again without a power cycle. This will mark the U axis as not homed so normal movement won't work but homing Y should.

      [EDIT: replaced incorrect M569 commands by M569.2]
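A minimal sketch of how the replies from the test above could be decoded, for anyone logging them. The reply format `Register 0x01 value 0x00000005` is the one quoted later in this thread. The bit names are an assumption: they follow the GSTAT register (address 0x01) in the Trinamic TMC5160 datasheet and are not stated anywhere in this thread. The function names are hypothetical.

```python
import re

# Assumption: bit names taken from the TMC5160 datasheet's GSTAT register
# (address 0x01). They are not confirmed by this thread.
GSTAT_BITS = {0: "reset", 1: "drv_err", 2: "uv_cp"}

RESPONSE_RE = re.compile(r"Register 0x([0-9a-fA-F]+) value 0x([0-9a-fA-F]+)")


def parse_register_response(line):
    """Return (register, value) from a reply such as
    'Register 0x01 value 0x00000005'."""
    match = RESPONSE_RE.search(line)
    if match is None:
        raise ValueError(f"not a register response: {line!r}")
    return int(match.group(1), 16), int(match.group(2), 16)


def decode_flags(value):
    """List the names of the flag bits that are set in a register value."""
    return [name for bit, name in sorted(GSTAT_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    register, value = parse_register_response("Register 0x01 value 0x00000005")
    print(register, decode_flags(value))
```

Under that assumption, a non-zero value in step 4 would show which flag was raised since it was cleared in step 2.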

      Duet WiFi hardware designer and firmware engineer
      Please do not ask me for Duet support via PM or email, use the forum
      http://www.escher3d.com, https://miscsolutions.wordpress.com

      • kazolar
        kazolar @dc42
        last edited by kazolar

        @dc42 M18/M17 doesn't fix it. It does let me move the axis freely by hand, so the stepper does release. E-stop also disables the stepper. When I try to home the axis, the stepper on driver 5 locks up again. Only a power cycle clears the gremlins (seemingly for a rather short period of time now). I see no physical issues near the driver. I didn't take the board out to examine it under my micro soldering scope, but visual inspection in situ doesn't raise any suspicions.
        Yes, after I paused the printer the last time the problem occurred, M122 said all drivers were OK, which is why this is weird.

        Can you suggest how I can add this driver back into the config so that it's not part of the kinematics? Can it be the 3rd stepper of my Y axis? Does it need its own endstop? I can plug in a spare stepper, have it sit on the side during the print, and watch for it to lock up. Putting it back into the kinematics is not ideal for a test: it's a NEMA 23 with a lot of holding torque. My Y gantries are designed to allow for some flex to self-square, but this behavior is very violent.

        More to the point, how isolated is the driver? In the worst case, am I losing any performance or risking any issue by using an expansion board for the Y axis? I figured it's better to move both Y steppers to the same expansion board. More specifically, can I run it as is, or is the main board going to degrade more? I've not had any overheating issues; all boards have adequate cooling and stay in the 30 °C reported temperature range.

        • dc42
          dc42 administrators @kazolar
          last edited by

          @kazolar you can add the driver back as a 3rd Y axis driver, however if your two existing Y axis drivers have separate endstops then you would need a 3rd endstop.

          Yes it's better to put both Y drivers on an expansion board (the same one) rather than just one of them. When endstops are triggered there is a small latency before drivers on expansion boards are stopped, so they may overshoot very slightly. They are then reverted to the position they had when the endstop trigger was detected. So when a single endstop is used, if only one driver is on an expansion board then only that one will overshoot and revert slightly.

          Another reason to drive both motors from the same board is so that in the event of a CAN bus failure or a board reset, the motors don't move out of sync.

          My guess is that either the driver is being affected by heat, or there is a bad solder joint in that area that is affected by heat.

          • kazolar
            kazolar @dc42
            last edited by

            @dc42 Yes, that's what I did: I moved the Y axis entirely to the expansion board, so both Y steppers and endstops are on the same expansion board. I recall reading that recommendation before, that a paired axis should be on the same expansion board. The last time the error occurred was after the printer had been off for at least an hour while I rewired/rerouted the problematic stepper/driver, so the printer hadn't been on long enough for heat to become a problem. The curious part is that the Y axis was previously using driver 0 and driver 5, which are next to each other; driver 0 was fine, and 5 was the one that had problems. If it's localized to an area, it's really specific. As I said, the only thing I saw in M122, and saw yesterday while the printer was printing normally, was a low count of timeouts on the drivers; all expansion boards showed 0 for timeouts. Are timeouts a normal thing? Should I just run the printer as is until another driver fails and replace the main board then? I'm afraid that if I wait long enough I may get burned by tariff nonsense.

            • dc42
              dc42 administrators @kazolar
              last edited by

              @kazolar it's normal to see zero driver timeouts, although you might see one timeout after VIN is powered up.

              • kazolar
                kazolar @dc42
                last edited by

                @dc42 I see non-zero timeouts during normal operation on my main board. My expansion boards show zeros. During the print yesterday I was seeing timeouts of 8-33 when I was polling the main board. Expansion boards said 0. Oddly enough, the one axis that is on the main board, X, showed stalled briefly and then went back to ok, even though things were moving fine and nothing was wrong. Is this a sign of something more serious? The 19-hour print finished and came out perfect. With the SZP the bed mesh is working beautifully. I took a zoomed-in picture of the board and see nothing suspicious.

                • kazolar
                  kazolar @kazolar
                  last edited by

                  @dc42 here is typical m122 from main board
                  m122
                  === Diagnostics ===
                  RepRapFirmware for Duet 3 MB6HC version 3.6.0-rc.1 (2025-02-28 15:00:13) running on Duet 3 MB6HC v1.01 (standalone mode)
                  Board ID: 08DJM-956BA-NA3TJ-6J1F4-3S06T-KV8UT
                  Used output buffers: 3 of 40 (36 max)
                  === RTOS ===
                  Static ram: 137420
                  Dynamic ram: 135436 of which 0 recycled
                  Never used RAM 58824, free system stack 130 words
                  Tasks: NETWORK(1,ready,32.9%,180) ETHERNET(5,nWait 7,0.2%,307) LASER(5,nWait 7,0.7%,167) HEAT(3,nWait 6,0.0%,331) Move(4,nWait 6,0.3%,213) TMC(4,nWait 6,3.2%,341) CanReceiv(6,nWait 1,0.2%,770) CanSender(5,nWait 7,0.0%,327) CanClock(7,delaying,0.0%,348) MAIN(1,running,62.4%,500) IDLE(0,ready,0.0%,29) USBD(3,blocked,0.0%,149), total 100.0%
                  Owned mutexes:
                  === Platform ===
                  Last reset 01:31:08 ago, cause: power up
                  Last software reset at 2058-01-24 11:14, reason: HardFault, none spinning, available RAM 222236, slot 2
                  Software reset code 0x4073 HFSR 0x40000000 CFSR 0x00080000 ICSR 0x00000803 BFAR 0x00000000 SP 0x20415c78 Task MAIN Freestk 2306 ok
                  Stack: 001d37c0 0000fffe 20412118 00000000 00000000 0049a7ff 0049a7fe a1000000 00000000 ffffffff 00000000 00000000 204137f0 20413608 ffffffff 00000000 2041221c 0049a867 00000050 00000058 00000000 00497a07 20429738 0040055d 00000050 20429728 00000000
                  === Storage ===
                  Free file entries: 19
                  SD card 0 detected, interface speed: 25.0MBytes/sec
                  SD card longest read time 2.6ms, write time 3.3ms, max retries 0
                  === Move ===
                  Segments created 229, maxWait 767ms, bed comp in use: mesh, height map offset 0.000, hiccups added 0/0 (0.00/30.23ms), max steps late 0, ebfmin 0.00, ebfmax 0.00
                  Pos req/act/dcf: 28643.00/28940/-0.17 15681.00/16154/-0.67 419.00/418/0.98 61485.00/61485/0.00 63990.00/63990/0.00 63995.00/63995/-0.00 -3405.00/-3405/0.00
                  Next step interrupt due in 28 ticks, disabled
                  Driver 0: standstill, SG min n/a, mspos 8, reads 29077, writes 242 timeouts 23
                  Driver 1: standstill, SG min n/a, mspos 136, reads 29077, writes 242 timeouts 23
                  Driver 2: stalled, SG min 0, mspos 367, reads 29077, writes 242 timeouts 23
                  Driver 3: standstill, SG min n/a, mspos 88, reads 29077, writes 242 timeouts 23
                  Driver 4: standstill, SG min n/a, mspos 24, reads 29077, writes 242 timeouts 23
                  Driver 5: standstill, SG min n/a, mspos 8, reads 29077, writes 242 timeouts 23
                  Phase step loop runtime (us): min=0, max=150, frequency (Hz): min=501, max=15957
                  === Heat ===
                  Bed heaters 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters -1 -1 -1 -1 -1 -1 -1 -1, ordering errs 0
                  Heater 0 is on, I-accum = 0.1
                  Heater 1 is on, I-accum = 0.0
                  === GCodes ===
                  Movement locks held by null, null
                  HTTP is idle in state(s) 0
                  Telnet is idle in state(s) 0
                  File is idle in state(s) 3
                  USB is idle in state(s) 0
                  Aux is idle in state(s) 0
                  Trigger is idle in state(s) 0
                  Queue is idle in state(s) 0
                  LCD is idle in state(s) 0
                  SBC is idle in state(s) 0
                  Daemon is idle in state(s) 0
                  Aux2 is idle in state(s) 0
                  Autopause is idle in state(s) 0
                  File2 is idle in state(s) 0
                  Queue2 is idle in state(s) 0
                  === CAN ===
                  Messages queued 84006, received 233772, lost 0, ignored 0, errs 0, boc 0
                  Longest wait 0ms for reply type 0, peak Tx sync delay 568, free buffers 50 (min 46), ts 9233/9233/0
                  Tx timeouts 0,0,0,0,0,0
                  === Network ===
                  Slowest loop: 22.09ms; fastest: 0.03ms
                  Responder states: MQTT(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
                  HTTP sessions: 4 of 8
                  === Multicast handler ===
                  Responder is inactive, messages received 0, responses 0
                  = Ethernet =
                  Interface state: active
                  Error counts: 0 0 0 0 0 0
                  Socket states: 6 6 6 2 2 0 0 0 0

                  • dc42
                    dc42 administrators @kazolar
                    last edited by

                    @kazolar was that M122 report taken after the problem with driver 5 occurred, or before?

                    • kazolar
                      kazolar @dc42
                      last edited by

                      @dc42 before, this is "normal" for me, when everything is printing fine.

                      • kazolar
                        kazolar @dc42
                        last edited by kazolar

                        @dc42 The same thing happened on another driver, so it looks like the board is failing.
                        M569 P0.3 R1 produces nothing.

                        5/22/2025, 2:11:42 PM M569 P0.3
                        Drive 3 runs forwards, active high enable, timing fast, mode spreadCycle, ccr 0x10024, toff 4, tblank 2, thigh 200 (375.0 mm/sec), gs=39, iRun=31, iHold=21, current=990.234, hstart/hend/hdec 2/0/0, pos 296

                        Power cycled
                        M569 P0.0
                        Drive 0 runs forwards, active low enable, timing fast, mode spreadCycle, ccr 0x10024, toff 4, tblank 2, thigh 200 (375.0 mm/sec), gs=79, iRun=31, iHold=21, current=2005.859, hstart/hend/hdec 2/0/0, pos 200

                        Current was specified as 2000 in both cases -- something is messing up the current

                        • kazolar
                          kazolar @kazolar
                          last edited by

                          @dc42 I figured it out: the current sent to the stepper falls DRASTICALLY. This is reproducible even on drivers which are acting normally. I am running LDO NEMA 17s on the X carriage steppers. I noticed all of a sudden that one of the carriages can't home. It's trying, but it feels like it's running with a fraction of the current, the same behavior as if I were to set the current to 0.5 A or less. I tried M569 commands and nothing is printed. I noticed driver 2 is working fine, but when given 2 A, the carriage is rather easy to move. So I gave it 2.5 A; the maximum for these LDOs is 2.8 A, and I've run them at 2.5 A on my Voron. Even at 2.5 A the stepper can be moved by hand with a bit more force. I then power cycled the machine, and at 2.5 A the stepper is now rock solid. This feels like the board's driver current regulation is failing, or something of that ilk.

                          • kazolar
                            kazolar
                            last edited by

                            Lots of timeouts.
                            Driver 0: standstill, SG min 0, mspos 216, reads 51266, writes 3696 timeouts 364
                            Driver 1: standstill, SG min 0, mspos 72, reads 51266, writes 3696 timeouts 364
                            Driver 2: ok, SG min 0, mspos 600, reads 51266, writes 3696 timeouts 364
                            Driver 3: standstill, SG min n/a, mspos 8, reads 51277, writes 3685 timeouts 364
                            Driver 4: standstill, SG min 0, mspos 936, reads 51266, writes 3696 timeouts 364
                            Driver 5: standstill, SG min n/a, mspos 8, reads 51277, writes 3685 timeouts 364
                            Phase step loop runtime (us): min=0, max=192, frequency (Hz): min=492, max=46875

                            And this is when the issue isn't happening
                            Updated the firmware to latest rc3, didn't help.

                            Reproduced the problem on 2 drivers now and other drivers show reduced current when the problem occurs. The drivers which are faulty drop current down 1 amp -- and the steppers I'm using need more than that to move.

                            Ordered a new board, clearly this one is not long for this world. Gonna get this print done, and will replace it over the weekend. Considering moving to SBC mode as well during the swap.
                            Will also be switching to 40-48v (need to do the math which is best for my steppers)
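To compare timeout counts between two M122 reports taken after different uptimes, one rough option is to normalize by the number of driver transactions. The sketch below parses the `Driver N:` lines in the format shown in this thread and divides timeouts by reads plus writes. Treating reads plus writes as the denominator is an assumption made here for illustration, not a metric defined by the firmware, and the function name is hypothetical.

```python
import re

# Matches lines such as:
# "Driver 0: standstill, SG min 0, mspos 216, reads 51266, writes 3696 timeouts 364"
DRIVER_RE = re.compile(
    r"Driver (\d+): (\w+), .*?reads (\d+), writes (\d+) timeouts (\d+)"
)


def timeout_rates(report):
    """Map driver number -> timeouts per transaction (reads + writes).

    The choice of denominator is an illustrative assumption.
    """
    rates = {}
    for match in DRIVER_RE.finditer(report):
        driver, _status, reads, writes, timeouts = match.groups()
        rates[int(driver)] = int(timeouts) / (int(reads) + int(writes))
    return rates


if __name__ == "__main__":
    earlier = "Driver 0: standstill, SG min n/a, mspos 8, reads 29077, writes 242 timeouts 23"
    later = "Driver 0: standstill, SG min 0, mspos 216, reads 51266, writes 3696 timeouts 364"
    print(timeout_rates(earlier)[0], timeout_rates(later)[0])
```

Run on the driver 0 lines from the two reports in this thread, this gives a rate that is several times higher in the later report, which matches the impression that the timeouts are getting worse rather than just accumulating with uptime.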

                            • dc42
                              dc42 administrators @kazolar
                              last edited by dc42

                              @kazolar this doesn't sound like a hardware fault to me. What M906 idle current setting do you use? If motor current is reducing on all drivers when this happens, perhaps the firmware is setting all motors to idle current incorrectly.

                              • kazolar
                                kazolar @dc42
                                last edited by

                                @dc42 Idle is 50% and hasn't changed in the 7 years I've had this printer. When the issue occurs, a restart can't even fix the current, and manually sending M906 isn't reflected in the actual current. It now seems like 2 drivers are pulling all current down. I was checking during the print yesterday and the left over current was holding correctly. I ran 100s of hours with this config and firmware. The only recent change was the addition of the SZP. Nothing changed in the drive config (until I had to stop using 2 of the drivers on the main board). I would expect a firmware reboot plus reinstall to restore any current settings, and all the timeouts don't inspire confidence. Problems are only on the main board; all expansion boards and toolboards are fine. Timeouts and current issues are main board only. It certainly looks like it's dying one driver at a time.

                                • dc42
                                  dc42 administrators @kazolar
                                  last edited by dc42

                                  @kazolar if the idle current percentage is 50% then that would explain a drop in current from 2A to 1A. Have you really run 100s of hours using 3.6.0-rc.3 firmware before this issue started occurring?

                                  You might like to change the idle current percentage to 100% to make sure that the issue isn't that the firmware is incorrectly changing drivers to idle current setting.
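The arithmetic behind that explanation can be checked against the two M569 reports posted earlier in this thread (gs=79 with current=2005.859, and gs=39 with current=990.234). The sketch below assumes the reported current is gs × 6500 / 256 mA at iRun=31, and that gs is the requested current rounded to the nearest step. Both the 6500 mA constant and the rounding rule are inferred from those two readings only; they are not taken from firmware source or documentation, and the function names are hypothetical.

```python
# Assumption: full-scale constant inferred from the two readings in this thread
# (gs=79 -> 2005.859 mA, gs=39 -> 990.234 mA), not from firmware documentation.
FULL_SCALE_MA = 6500.0


def global_scaler(requested_ma):
    """Nearest gs step for a requested current (rounding rule is an assumption)."""
    return round(requested_ma * 256 / FULL_SCALE_MA)


def reported_current_ma(gs):
    """Current implied by a gs value under the inferred full-scale constant."""
    return gs * FULL_SCALE_MA / 256


def idle_current_ma(run_ma, idle_percent):
    """Idle current from the M906 run current and I (idle factor) parameter."""
    return run_ma * idle_percent / 100


if __name__ == "__main__":
    run_ma = 2000  # M906 setting quoted in the thread
    idle_ma = idle_current_ma(run_ma, 50)  # I50 idle factor quoted in the thread
    for label, ma in (("run", run_ma), ("idle", idle_ma)):
        gs = global_scaler(ma)
        print(label, gs, reported_current_ma(gs))
```

Under these assumptions, 2000 mA maps to gs=79 and 1000 mA maps to gs=39, the same two values seen in the reports. That is consistent with the low reading being the 50% idle current, though it does not by itself show whether firmware or hardware put the driver in that state.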

                                  • kazolar
                                    kazolar @dc42
                                    last edited by kazolar

                                    @dc42 I had run 100s of hours on rc1 with zero issues. The only thing I changed was adding the SZP, which is connected to a toolboard; I then proceeded to print another set of long prints and the issues started. The issues also started gradually. First it could go 15 hours before there was a problem, then 5, then 3, then 1. Rc3 was installed just yesterday. Also, when the problem occurs, it says it's 2 amps, but I can move the carriage by hand and the stepper skips. If I power cycle, it reports 2 amps again, and now the belts would slip before the stepper skips. Also, the current problem is on the main board only; expansion boards go in and out of idle and maintain current, i.e. 2 amps is 2 amps. I reproduced it: I have one carriage on the main board and one on an expansion board, both say 2 amps, and one I can move by hand while the other I can't. Hence why the main board seemed suspect. I guess if the problem reoccurs with the 1.02 version of the hardware (I am switching everything to 48 V capable) then we can hunt for an idle or current issue as it connects to firmware. I have a big project coming up, we'll see. The board is out of warranty, and it seems something not isolated to one driver is malfunctioning.

                                    PS. If for whatever reason you want to investigate the main board, I won't be re-selling it, so it would go to e-waste, so if you want to cover shipping to UK, you can have it. All my expansion boards are perfectly functional, so I will try to sell them - since I'm switching to 1.02 versions to use 40-48v everywhere.

                                    • dc42
                                      dc42 administrators @kazolar
                                      last edited by dc42

                                      @kazolar M906 reports the configured current. The actual current may be lower than reported by M906 when the motors are not moving, either because of driver standstill current reduction, or because of RRF idle detection.

                                      Did you ever run the test I described in this post https://forum.duet3d.com/post/355556 ? That checks for an unlikely error condition that the firmware doesn't check for.

                                      • kazolar
                                        kazolar @dc42
                                        last edited by

                                        @dc42 yes M569 P5 R1 or in my case M569 P0.5 R1 shows no response -- I had the same issue reproduced on driver 3 also, I tried M569 P3 R1 and M569 P0.3 R1 -- no response. Tried on rc1 and rc3, no response.

                                        • dc42
                                          dc42 administrators @kazolar
                                          last edited by dc42

                                          @kazolar why are you using the R1 parameter with M569? That parameter is for use with external drivers, and should not produce a response.

                                          OK, my bad, I made a mistake in my post. The command to use is M569.2 not M569.

                                          • kazolar
                                            kazolar @dc42
                                            last edited by kazolar

                                            @dc42 ok, I plugged the 2nd carriage into driver 3 -- it's set for 2 amps, but I can move it by hand easily
                                            This is what I have in config
                                            M906 Y1600:1600 U1600:1600 X2000 V2000 W2000 A2000 Z2000:2000:2000:2000 E750:750:750:750 I50 ; set motor currents (mA) and motor idle factor in per cent
                                            5/23/2025, 1:18:38 PM M569 P0.3

                                            Drive 3 runs forwards, active low enable, timing fast, mode spreadCycle, ccr 0x10024, toff 4, tblank 2, thigh 200 (375.0 mm/sec), gs=39, iRun=31, iHold=21, current=990.234, hstart/hend/hdec 2/0/0, pos 232
                                            It didn't apply -- it was ignored. When I had V mapped to driver 0, it was reporting 2 amps.

                                            5/23/2025, 1:19:19 PM M906 V2000
                                            5/23/2025, 1:19:26 PM M906 P0.3
                                            Drive 3 runs forwards, active low enable, timing fast, mode spreadCycle, ccr 0x10024, toff 4, tblank 2, thigh 200 (375.0 mm/sec), gs=79, iRun=31, iHold=21, current=2005.859, hstart/hend/hdec 2/0/0, pos 232

                                            5/23/2025, 2:06:08 PM M569.2 P0.3 R1
                                            Register 0x01 value 0x00000005

                                            M569.2 P0.3 R1 V7

                                            M569.2 P0.3 R1
                                            Register 0x01 value 0x00000000

                                            I'll try to reproduce it, but it requires a print of some sort, I'll try doing a number of test cycles with V axis mapped to driver 3

                                            Unless otherwise noted, all forum content is licensed under CC-BY-SA