Duet3D

    Dead driver or dying board?

    Duet Hardware and wiring

    • kazolar
      last edited by kazolar

      I have a config with a lot of steppers: double IDEX and 4 Z steppers. I am running a Duet 3 main board with 6 drivers, two 3-driver expansion boards, and 4 toolboards, and I've had this Duet 3 setup for a couple of years. Recently I've had an odd issue. My primary Y gantry, which runs on 2 NEMA 23s, started acting weird; more specifically, the right Y stepper, which is on driver 5 of the main board, randomly stops moving. It locks up, i.e. the other Y stepper is trying to force the gantry and fighting it.
      So enable is on, but clearly step is not doing anything. When this happens, there are no driver errors in the console or DWC. M122 also shows no missed steps, only some driver timeouts, and those are there when things are fine too. It's getting worse over time: at first the printer ran for 15+ hours with no issues, then 4 hours, then ~2 hours. E-stop doesn't fix it, but a power cycle does, temporarily.
      I ruled out connection issues (I redid the stepper connector and wiring).
      I have a spare 3-driver expansion board, so I moved both Y steppers to that board, along with the Y endstops, and completed a long 19-hour print with no issue. If driver 5 on the main board has developed a fault, should I expect more issues with the main Duet 3 board, or can I just cross driver 5 off and run with the expansion board for the Y axis? Are driver timeouts on the main board normal? Should I be looking to replace the main board (it's v1.01)?

      Edited for length

      • o_lampe @kazolar
        last edited by

        @kazolar I ruined my CNC frame with a similar problem. One Y motor stopped and skipped steps, the other went on.
        To avoid that once and for all, you should add an "anti-racking" mechanism.
        Maybe couple both motors (preferably dual-shaft motors) with a shaft extension, or do what is common on big machines: add a cord/wire to the crossbeam to eliminate racking.
        Check out the first few pages of my hashPrinter thread, where I used that method a lot. There are also some links to my YouTube channel, where I demonstrate the anti-racking effect.

        PS: If you want more people to read your wall of text, you should edit it for better readability.

        • kazolar @o_lampe
          last edited by

          @o_lampe thank you for the tips. I kind of have to live with the decoupled approach, as I dialed in each Y endstop to square the gantry perfectly, so coupling the steppers together would defeat the purpose. Also, with each gantry running 2 carriages, there is no room for that type of mechanism. The gantry is actually designed to allow a few degrees of flex so that it self-squares when homing. I updated my rambling to be more specific to the issue I'm asking about. More to the point (now that I know the machine runs with no issue with the expansion board controlling the Y axis): does the failing (failed) driver 5 signify that the main board is on its way out?

          • dc42 administrators @kazolar
            last edited by

            @kazolar the issue is likely to be confined to that driver only.

            When the driver stops working, are you sure that M122 still reports the driver status as OK? Does the soldering of driver 5 and the components around it look OK?

            Duet WiFi hardware designer and firmware engineer
            Please do not ask me for Duet support via PM or email, use the forum
            http://www.escher3d.com, https://miscsolutions.wordpress.com

            • dc42 administrators @kazolar
              last edited by dc42

              @kazolar PS - here's another test you can do, if you can provoke the problem without damaging your machine:

              1. After power up and before the driver stops working, send: M569 P5 R1. The response will probably be 0x00000005.
              2. Send M569 P5 R1 V7 followed by M569 P5 R1. The response should be 0x00000000.
              3. Provoke the problem.
              4. Send M569 P5 R1 again and report the response.
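A small sketch of how the hex values from steps 1 and 4 could be read. It assumes (this is not stated in the thread) that `R1` reads register 1 of the TMC5160 driver, GSTAT, whose low three bits are `reset`, `drv_err` and `uv_cp` per the Trinamic datasheet; check the datasheet before relying on that mapping. Under this assumption, 0x00000005 after power-up would be `reset` plus `uv_cp`, and writing V7 clears all three. `decode_gstat` is a hypothetical helper name, and 0x00000002 is a hypothetical reading used only for illustration.

```python
# ASSUMPTION (not confirmed in this thread): "M569 P<n> R1" reads register 1
# of the TMC5160, GSTAT. Per the Trinamic datasheet its low bits are:
GSTAT_BITS = {
    0: "reset",    # driver IC has been reset since the flag was last cleared
    1: "drv_err",  # driver shut down by overtemperature or short-circuit protection
    2: "uv_cp",    # charge pump undervoltage
}


def decode_gstat(response: str) -> list[str]:
    """Names of the GSTAT flags set in a response such as '0x00000005'."""
    value = int(response.strip(), 16)  # int() accepts the 0x prefix with base 16
    return [name for bit, name in GSTAT_BITS.items() if value & (1 << bit)]


if __name__ == "__main__":
    # Typical power-up value, value after clearing with V7, and a hypothetical fault.
    for reading in ("0x00000005", "0x00000000", "0x00000002"):
        print(reading, "->", decode_gstat(reading) or "no flags set")
```

If step 4 came back with `drv_err` set, that would point at the driver's own protection tripping; an unchanged 0x00000000 would suggest the driver does not think anything is wrong.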

              Another useful piece of information would be whether sending M18 Y followed by M17 Y gets it working again without a power cycle. This will mark the Y axis as not homed, so normal movement won't work, but homing Y should.

              • kazolar @dc42
                last edited by kazolar

                @dc42 M18/M17 doesn't fix it. It does allow me to move the axis freely by hand, so the stepper does release, and E-stop also disables the stepper. When I try to home the axis, the stepper on driver 5 locks up again. Only a power cycle clears the gremlins (seemingly for a rather short period of time now). I see no physical issues near the driver. I didn't take the board out to examine it under my micro-soldering scope, but visual inspection in situ doesn't raise any suspicions.
                Yes, when I paused the printer the last time the problem occurred, M122 said all drivers were OK, which is why this is weird.

                Can you suggest how I can add this driver back into the config so that it's not part of the kinematics? Can it be the 3rd stepper of my Y axis? Does it need its own endstop? I can plug in a spare stepper, have it sit on the side during the print, and watch for it to lock up. Putting it back into the kinematics is not ideal for a test: it's a NEMA 23 with a lot of holding torque, and although my Y gantries are designed to allow some flex to self-square, this behavior is very violent.

                More to the point: how isolated is the driver fault? In the worst case, am I losing any performance or risking any issue by using an expansion board for the Y axis? I figured it's better to move both Y steppers to the same expansion board. More specifically, can I run it as is, or is the main board going to degrade further? I've not had any overheating issues; all boards have adequate cooling and stay in the 30 °C range of reported temperature.

                • dc42 administrators @kazolar
                  last edited by

                  @kazolar you can add the driver back as a 3rd Y axis driver; however, if your two existing Y axis drivers have separate endstops, then you would need a 3rd endstop.

                  Yes, it's better to put both Y drivers on an expansion board (the same one) rather than just one of them. When endstops are triggered, there is a small latency before drivers on expansion boards are stopped, so they may overshoot very slightly; they are then reverted to the position they had when the endstop trigger was detected. So when a single endstop is used and only one driver is on an expansion board, only that one will overshoot and revert slightly.
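The overshoot described above is, to a first approximation, homing speed multiplied by the stop latency. The sketch below only does that arithmetic; the speed, latency and steps/mm values are made-up placeholders, not measured figures for Duet expansion boards or the CAN link, and `overshoot_mm`/`overshoot_steps` are hypothetical helper names.

```python
# Illustrative only: how far a motor keeps moving after an endstop triggers,
# given some latency before its driver is told to stop.
# ASSUMPTION: constant homing speed during the latency window; the example
# numbers below are hypothetical, not measured Duet/CAN figures.


def overshoot_mm(speed_mm_s: float, latency_ms: float) -> float:
    """Distance travelled during the stop latency at constant homing speed."""
    return speed_mm_s * latency_ms / 1000.0


def overshoot_steps(speed_mm_s: float, latency_ms: float, steps_per_mm: float) -> float:
    """The same distance expressed in (micro)steps."""
    return overshoot_mm(speed_mm_s, latency_ms) * steps_per_mm


if __name__ == "__main__":
    speed, latency, spm = 5.0, 1.0, 80.0  # hypothetical values
    print(f"overshoot: {overshoot_mm(speed, latency):.3f} mm "
          f"({overshoot_steps(speed, latency, spm):.1f} steps)")
```

With both Y motors on the same board, both see the same latency and both are reverted, so they stay matched; with the motors split across boards, only one of them makes this small excursion.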

                  Another reason to drive both motors from the same board is so that in the event of a CAN bus failure or a board reset, the motors don't move out of sync.

                  My guess is that either the driver is being affected by heat, or there is a bad solder joint in that area that is affected by heat.

                  • kazolar @dc42
                    last edited by

                    @dc42 Yes, that's what I did: I moved the Y axis entirely to the expansion board, with both Y steppers and endstops on the same board. I recall reading the recommendation that a paired axis should be on the same expansion board. The last time the error occurred was after the printer had been off for at least an hour while I re-wired/re-routed the problematic stepper/driver, so the printer hadn't been on long enough for heat to become a problem. The curious part is that previously the Y axis was using driver 0 and driver 5, which are next to each other; driver 0 was fine, and 5 was the one that had problems. If it's localized to an area, it's really specific. As I said, the only thing I saw in M122 (and saw again yesterday while the printer was printing normally) was a low count of timeouts on the main board drivers; all expansion boards showed 0 timeouts. Are timeouts a normal thing? Should I just run the printer as is until another driver fails, and replace the main board then? I'm afraid that if I wait long enough, I may get burned by tariff nonsense.

                    • dc42 administrators @kazolar
                      last edited by

                      @kazolar it's normal to see zero driver timeouts, although you might see one timeout after VIN is powered up.

                      • kazolar @dc42
                        last edited by

                        @dc42 I see non-zero timeouts during normal operation on my main board; my expansion boards show zeros. During the print yesterday I was seeing timeout counts of 8-33 when polling the main board, while the expansion boards said 0. Oddly enough, the one axis that is still on the main board, X, showed "stalled" briefly and then went back to OK, even though things were moving fine and nothing was wrong. Is this a sign of something more serious? The 19-hour print finished and came out perfect, and with zsp the bed mesh is working beautifully. I took a zoomed-in picture of the board and see nothing suspicious.

                        • kazolar @kazolar
                          last edited by

                          @dc42 here is a typical M122 report from the main board:
                          m122
                          === Diagnostics ===
                          RepRapFirmware for Duet 3 MB6HC version 3.6.0-rc.1 (2025-02-28 15:00:13) running on Duet 3 MB6HC v1.01 (standalone mode)
                          Board ID: 08DJM-956BA-NA3TJ-6J1F4-3S06T-KV8UT
                          Used output buffers: 3 of 40 (36 max)
                          === RTOS ===
                          Static ram: 137420
                          Dynamic ram: 135436 of which 0 recycled
                          Never used RAM 58824, free system stack 130 words
                          Tasks: NETWORK(1,ready,32.9%,180) ETHERNET(5,nWait 7,0.2%,307) LASER(5,nWait 7,0.7%,167) HEAT(3,nWait 6,0.0%,331) Move(4,nWait 6,0.3%,213) TMC(4,nWait 6,3.2%,341) CanReceiv(6,nWait 1,0.2%,770) CanSender(5,nWait 7,0.0%,327) CanClock(7,delaying,0.0%,348) MAIN(1,running,62.4%,500) IDLE(0,ready,0.0%,29) USBD(3,blocked,0.0%,149), total 100.0%
                          Owned mutexes:
                          === Platform ===
                          Last reset 01:31:08 ago, cause: power up
                          Last software reset at 2058-01-24 11:14, reason: HardFault, none spinning, available RAM 222236, slot 2
                          Software reset code 0x4073 HFSR 0x40000000 CFSR 0x00080000 ICSR 0x00000803 BFAR 0x00000000 SP 0x20415c78 Task MAIN Freestk 2306 ok
                          Stack: 001d37c0 0000fffe 20412118 00000000 00000000 0049a7ff 0049a7fe a1000000 00000000 ffffffff 00000000 00000000 204137f0 20413608 ffffffff 00000000 2041221c 0049a867 00000050 00000058 00000000 00497a07 20429738 0040055d 00000050 20429728 00000000
                          === Storage ===
                          Free file entries: 19
                          SD card 0 detected, interface speed: 25.0MBytes/sec
                          SD card longest read time 2.6ms, write time 3.3ms, max retries 0
                          === Move ===
                          Segments created 229, maxWait 767ms, bed comp in use: mesh, height map offset 0.000, hiccups added 0/0 (0.00/30.23ms), max steps late 0, ebfmin 0.00, ebfmax 0.00
                          Pos req/act/dcf: 28643.00/28940/-0.17 15681.00/16154/-0.67 419.00/418/0.98 61485.00/61485/0.00 63990.00/63990/0.00 63995.00/63995/-0.00 -3405.00/-3405/0.00
                          Next step interrupt due in 28 ticks, disabled
                          Driver 0: standstill, SG min n/a, mspos 8, reads 29077, writes 242 timeouts 23
                          Driver 1: standstill, SG min n/a, mspos 136, reads 29077, writes 242 timeouts 23
                          Driver 2: stalled, SG min 0, mspos 367, reads 29077, writes 242 timeouts 23
                          Driver 3: standstill, SG min n/a, mspos 88, reads 29077, writes 242 timeouts 23
                          Driver 4: standstill, SG min n/a, mspos 24, reads 29077, writes 242 timeouts 23
                          Driver 5: standstill, SG min n/a, mspos 8, reads 29077, writes 242 timeouts 23
                          Phase step loop runtime (us): min=0, max=150, frequency (Hz): min=501, max=15957
                          === Heat ===
                          Bed heaters 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters -1 -1 -1 -1 -1 -1 -1 -1, ordering errs 0
                          Heater 0 is on, I-accum = 0.1
                          Heater 1 is on, I-accum = 0.0
                          === GCodes ===
                          Movement locks held by null, null
                          HTTP is idle in state(s) 0
                          Telnet is idle in state(s) 0
                          File is idle in state(s) 3
                          USB is idle in state(s) 0
                          Aux is idle in state(s) 0
                          Trigger is idle in state(s) 0
                          Queue is idle in state(s) 0
                          LCD is idle in state(s) 0
                          SBC is idle in state(s) 0
                          Daemon is idle in state(s) 0
                          Aux2 is idle in state(s) 0
                          Autopause is idle in state(s) 0
                          File2 is idle in state(s) 0
                          Queue2 is idle in state(s) 0
                          === CAN ===
                          Messages queued 84006, received 233772, lost 0, ignored 0, errs 0, boc 0
                          Longest wait 0ms for reply type 0, peak Tx sync delay 568, free buffers 50 (min 46), ts 9233/9233/0
                          Tx timeouts 0,0,0,0,0,0
                          === Network ===
                          Slowest loop: 22.09ms; fastest: 0.03ms
                          Responder states: MQTT(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
                          HTTP sessions: 4 of 8
                          === Multicast handler ===
                          Responder is inactive, messages received 0, responses 0
                          = Ethernet =
                          Interface state: active
                          Error counts: 0 0 0 0 0 0
                          Socket states: 6 6 6 2 2 0 0 0 0
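The driver lines in a report like this can be checked mechanically against dc42's rule of thumb (zero driver timeouts, or possibly one after VIN is powered up). The regex is written against the 3.6.0-rc.1 layout shown here and is an assumption for other firmware versions; `parse_drivers` and `flag_drivers` are hypothetical helper names.

```python
import re

# One M122 driver line, e.g.
#   "Driver 2: stalled, SG min 0, mspos 367, reads 29077, writes 242 timeouts 23"
# ASSUMPTION: layout taken from the RRF 3.6.0-rc.1 report above; other
# versions may differ. '.' does not match newlines, so each match stays on one line.
DRIVER_RE = re.compile(r"Driver (\d+): ([^,]+),.*?timeouts (\d+)")


def parse_drivers(report: str) -> dict[int, tuple[str, int]]:
    """Map driver number -> (status, timeout count)."""
    return {
        int(m.group(1)): (m.group(2).strip(), int(m.group(3)))
        for m in DRIVER_RE.finditer(report)
    }


def flag_drivers(drivers: dict[int, tuple[str, int]], max_timeouts: int = 1) -> list[int]:
    """Drivers whose timeout count exceeds max_timeouts (one is tolerated after VIN power-up)."""
    return sorted(n for n, (_, timeouts) in drivers.items() if timeouts > max_timeouts)


if __name__ == "__main__":
    sample = """\
Driver 0: standstill, SG min n/a, mspos 8, reads 29077, writes 242 timeouts 23
Driver 2: stalled, SG min 0, mspos 367, reads 29077, writes 242 timeouts 23
"""
    drivers = parse_drivers(sample)
    for n, (status, timeouts) in sorted(drivers.items()):
        print(f"driver {n}: {status}, {timeouts} timeouts")
    print("more timeouts than expected on drivers:", flag_drivers(drivers))
```

Polling this periodically and logging the flagged drivers would show whether the timeout counts on the main board keep climbing between power cycles.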

                          • dc42 administrators @kazolar
                            last edited by

                            @kazolar was that M122 report taken after the problem with driver 5 occurred, or before?

                            • kazolar @dc42
                              last edited by

                              @dc42 Before. This is "normal" for me when everything is printing fine.

                              • kazolar @dc42
                                last edited by kazolar

                                @dc42 The same thing happened on another driver, so it looks like the board is failing.
                                M569 P0.3 R1 produces nothing

                                5/22/2025, 2:11:42 PM M569 P0.3
                                Drive 3 runs forwards, active high enable, timing fast, mode spreadCycle, ccr 0x10024, toff 4, tblank 2, thigh 200 (375.0 mm/sec), gs=39, iRun=31, iHold=21, current=990.234, hstart/hend/hdec 2/0/0, pos 296

                                Power cycled
                                M569 P0.0
                                Drive 0 runs forwards, active low enable, timing fast, mode spreadCycle, ccr 0x10024, toff 4, tblank 2, thigh 200 (375.0 mm/sec), gs=79, iRun=31, iHold=21, current=2005.859, hstart/hend/hdec 2/0/0, pos 200

                                Current was specified as 2000 (mA) in both cases, so something is messing up the current.
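The two M569 reports can be compared numerically. In both, iRun is 31, and the reported current is almost exactly gs × 25.39 mA (gs=39 gives 990.234 mA, gs=79 gives 2005.859 mA), so the whole drop shows up in gs. Reading gs as the TMC5160 GLOBALSCALER value is an assumption; the sketch below just parses the reported fields and compares them with the 2000 mA that was requested. `parse_m569`, `current_fraction` and `ma_per_gs` are hypothetical helper names.

```python
import re

REQUESTED_MA = 2000.0  # current the poster says was configured for both drives


def parse_m569(report: str) -> dict[str, float]:
    """Extract gs, iRun and current (mA) from an 'M569 P<n>' report."""
    fields = {}
    for key in ("gs", "iRun", "current"):
        match = re.search(rf"\b{key}=([\d.]+)", report)
        if match:
            fields[key] = float(match.group(1))
    return fields


def current_fraction(report: str, requested_ma: float = REQUESTED_MA) -> float:
    """Reported current as a fraction of the requested current."""
    return parse_m569(report)["current"] / requested_ma


def ma_per_gs(report: str) -> float:
    """Reported current divided by gs; roughly constant if only gs changed."""
    fields = parse_m569(report)
    return fields["current"] / fields["gs"]


if __name__ == "__main__":
    # Excerpts of the two reports quoted above.
    drive3 = "gs=39, iRun=31, iHold=21, current=990.234, hstart/hend/hdec 2/0/0, pos 296"
    drive0 = "gs=79, iRun=31, iHold=21, current=2005.859, hstart/hend/hdec 2/0/0, pos 200"
    for name, rep in (("drive 3", drive3), ("drive 0", drive0)):
        frac = current_fraction(rep)
        status = "LOW" if frac < 0.9 else "ok"
        print(f"{name}: {frac:.1%} of requested, {ma_per_gs(rep):.3f} mA per gs unit [{status}]")
```

The 0.9 threshold for "LOW" is an arbitrary illustrative choice; the point is that the mA-per-gs ratio is the same for both drives while gs itself has roughly halved.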

                                • kazolar @kazolar
                                  last edited by

                                  @dc42 I figured it out: the current sent to the stepper falls DRASTICALLY. This is reproducible even on drivers that are acting normally. I am running LDO NEMA 17s as the X carriage steppers. I noticed that all of a sudden one of the carriages couldn't home. It was trying, but it felt like it was running with a fraction of the current, the same behavior as if I had set the current to 0.5 A or less. I tried M569 commands and nothing is printed. I noticed that driver 2 is working fine, but when given 2 A, the carriage is rather easy to move. So I gave it 2.5 A; these LDOs are rated at 2.8 A max, and I've run them at 2.5 A on my Voron. Even at 2.5 A the stepper can be moved by hand with a bit more force. I then power-cycled the machine, and now at 2.5 A the stepper is rock solid. It feels like the board's driver current regulation is failing, or something of that ilk.

                                  Unless otherwise noted, all forum content is licensed under CC-BY-SA