Duet3D

CAN bus anomalies with 6HC and 3HC

Duet Hardware and wiring
    adammhaile
    last edited by 4 Apr 2022, 14:27

    I've got a custom IDEX printer I'm building that uses a 6HC main board and 3HC expansion.
    The 6HC runs the left side of the machine plus the 3 Z steppers. The 3HC runs the right side of the machine. The Y axis has dual motors with independent homing switches - so Y homing depends on both these boards being completely in sync.

    I've started to occasionally have a weird problem that I think is CAN bus related but I'll admit it's been difficult to replicate reliably. Most typically, if I turn the machine on and then don't touch it for 15-45 minutes and THEN try to home it, the Y axis will jam up because only the left side (with the 6HC) is being driven.

    The last couple of times this has happened I noticed that instead of the 3HC "diag" LED blinking in sync with the 6HC, it was blinking twice as fast. I cannot find anything in the documentation that indicates why it would be blinking out of sync with the 6HC, other than that it blinks rapidly when booting up.

    Except, it's not booting up - I can query it with M122 B1 and it returns results. It's even updating the right tool thermistor value in real time - not actively being heated, it just normally fluctuates a few tenths of a degree.

    For example, the below output is from while the 3HC is blinking rapidly:

    M122 b1
    Diagnostics for board 1:
    Duet EXP3HC firmware version 3.4.0 (2022-03-15 08:59:40)
    Bootloader ID: not available
    All averaging filters OK
    Never used RAM 158416, free system stack 200 words
    Tasks: Move(notifyWait,0.0%,160) HEAT(notifyWait,0.0%,92) CanAsync(notifyWait,0.0%,69) CanRecv(notifyWait,0.0%,82) CanClock(notifyWait,0.0%,71) TMC(notifyWait,7.7%,99) MAIN(running,90.8%,345) IDLE(ready,0.0%,39) AIN(delaying,1.4%,263), total 100.0%
    Last reset 00:20:45 ago, cause: software
    Last software reset data not available
    Driver 0: pos 0, 160.0 steps/mm,standstill, SG min 0, mspos 8, reads 55548, writes 16 timeouts 0, steps req 0 done 0
    Driver 1: pos 0, 160.0 steps/mm,standstill, SG min 0, mspos 8, reads 55549, writes 16 timeouts 0, steps req 0 done 0
    Driver 2: pos 0, 397.0 steps/mm,standstill, SG min 0, mspos 8, reads 55549, writes 16 timeouts 0, steps req 0 done 0
    Moves scheduled 0, completed 0, in progress 0, hiccups 0, step errors 0, maxPrep 0, maxOverdue 0, maxInc 0, mcErrs 0, gcmErrs 0
    Peak sync jitter 0/0, peak Rx sync delay 0, resyncs 0/0, no step interrupt scheduled
    VIN voltage: min 24.2, current 24.2, max 24.3
    V12 voltage: min 12.1, current 12.2, max 12.2
    MCU temperature: min 35.3C, current 35.5C, max 36.6C
    Last sensors broadcast 0x00000004 found 1 12 ticks ago, 0 ordering errs, loop time 0
    CAN messages queued 25308, send timeouts 0, received 11153, lost 0, free buffers 37, min 37, error reg 0
    dup 0, oos 0/0/0/0, bm 0, wbm 0, rxMotionDelay 0
    === Filament sensors ===
    Interrupt 5726621 to 0us, poll 1 to 141us
    Driver 2: no filament

    Here's the 6HC diagnostics at the same time:

    M122 b0
    === Diagnostics ===
    RepRapFirmware for Duet 3 MB6HC version 3.4.0 (2022-03-15 18:57:24) running on Duet 3 MB6HC v1.01 or later (SBC mode)
    Board ID: 08DJM-956BA-NA3TN-6J1FG-3S86T-TUBUS
    Used output buffers: 1 of 40 (14 max)
    === RTOS ===
    Static ram: 151000
    Dynamic ram: 68768 of which 0 recycled
    Never used RAM 128048, free system stack 200 words
    Tasks: SBC(ready,0.6%,474) HEAT(notifyWait,0.0%,321) Move(notifyWait,0.0%,352) CanReceiv(notifyWait,0.0%,772) CanSender(notifyWait,0.0%,374) CanClock(delaying,0.0%,339) TMC(notifyWait,7.8%,92) MAIN(running,91.5%,1180) IDLE(ready,0.0%,30), total 100.0%
    Owned mutexes: HTTP(MAIN)
    === Platform ===
    Last reset 00:20:43 ago, cause: power up
    Last software reset details not available
    Error status: 0x00
    Aux1 errors 0,0,0
    Step timer max interval 136
    MCU temperature: min 42.2, current 42.3, max 42.6
    Supply voltage: min 23.9, current 24.0, max 24.0, under voltage events: 0, over voltage events: 0, power good: yes
    12V rail voltage: min 12.1, current 12.1, max 12.2, under voltage events: 0
    Heap OK, handles allocated/used 99/46, heap memory allocated/used/recyclable 2048/812/220, gc cycles 0
    Events: 0 queued, 0 completed
    Driver 0: standstill, SG min n/a, mspos 8, reads 36075, writes 0 timeouts 0
    Driver 1: standstill, SG min n/a, mspos 8, reads 36075, writes 0 timeouts 0
    Driver 2: standstill, SG min n/a, mspos 8, reads 36075, writes 0 timeouts 0
    Driver 3: standstill, SG min n/a, mspos 8, reads 36075, writes 0 timeouts 0
    Driver 4: standstill, SG min n/a, mspos 8, reads 36074, writes 0 timeouts 0
    Driver 5: standstill, SG min n/a, mspos 8, reads 36074, writes 0 timeouts 0
    Date/time: 2022-04-04 03:38:44
    Slowest loop: 1.34ms; fastest: 0.06ms
    === Storage ===
    Free file entries: 10
    SD card 0 not detected, interface speed: 37.5MBytes/sec
    SD card longest read time 0.0ms, write time 0.0ms, max retries 0
    === Move ===
    DMs created 125, segments created 0, maxWait 0ms, bed compensation in use: none, comp offset 0.000
    === MainDDARing ===
    Scheduled moves 0, completed 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
    === AuxDDARing ===
    Scheduled moves 0, completed 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
    === Heat ===
    Bed heaters 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters -1 -1 -1 -1, ordering errs 0
    Heater 1 is on, I-accum = 0.0
    === GCodes ===
    Segments left: 0
    Movement lock held by null
    HTTP* is doing "M122 B0" in state(s) 0
    Telnet is idle in state(s) 0
    File is idle in state(s) 0
    USB is idle in state(s) 0
    Aux is idle in state(s) 0
    Trigger* is idle in state(s) 0
    Queue is idle in state(s) 0
    LCD is idle in state(s) 0
    SBC is idle in state(s) 0
    Daemon is idle in state(s) 0
    Aux2 is idle in state(s) 0
    Autopause is idle in state(s) 0
    Code queue is empty
    === Filament sensors ===
    Extruder 0 sensor: ok
    Extruder 1 sensor: no filament
    === CAN ===
    Messages queued 1647, received 3747, lost 0, boc 0
    Longest wait 0ms for reply type 0, peak Tx sync delay 6, free buffers 50 (min 50), ts 915/915/0
    Tx timeouts 0,0,0,0,0,0
    === SBC interface ===
    Transfer state: 4, failed transfers: 0, checksum errors: 0
    RX/TX seq numbers: 50228/50228
    SPI underruns 0, overruns 0
    State: 5, disconnects: 0, timeouts: 0, IAP RAM available 0x2b880
    Buffer RX/TX: 0/0-0, open files: 0
    === Duet Control Server ===
    Duet Control Server v3.4.0
    Code buffer space: 4096
    Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 0
    Full transfers per second: 38.97, max time between full transfers: 40.1ms, max pin wait times: 27.0ms/2.3ms
    Codes per second: 0.01
    Maximum length of RX/TX data transfers: 3521/804
      Phaedrux Moderator
      last edited by 5 Apr 2022, 20:47

      Can you share your full config.g and the results of M98 P"config.g" please?

      Z-Bot CoreXY Build | Thingiverse Profile

        adammhaile @Phaedrux
        last edited by 5 Apr 2022, 23:52

        @phaedrux said in CAN bus anomalies with 6HC and 3HC:

        Can you share your full config.g and the results of M98 P"config.g" please?

        There's a lot to share, so I'll just link to it on GitHub if that's ok:
        https://github.com/adammhaile/Rancor/tree/f48014f7924d1a6958722ad6c128a8c379bcafbb/sd/sys
        That's to a specific commit so nothing will change there from what I have right now.

          dc42 administrators @adammhaile
          last edited by 6 Apr 2022, 09:54

          @adammhaile I think we will need to replace your 3HC board. However, before we do that, please try updating the bootloader on the 3HC. See https://docs.duet3d.com/en/User_manual/RepRapFirmware/Updating_bootloader.

          Duet WiFi hardware designer and firmware engineer
          Please do not ask me for Duet support via PM or email, use the forum
          http://www.escher3d.com, https://miscsolutions.wordpress.com

            adammhaile @dc42
            last edited by 6 Apr 2022, 11:54

            @dc42 said in CAN bus anomalies with 6HC and 3HC:

            @adammhaile I think we will need to replace your 3HC board. However, before we do that, please try updating the bootloader on the 3HC. See https://docs.duet3d.com/en/User_manual/RepRapFirmware/Updating_bootloader.

            Oh wow - I had figured it was something stupid that I did in the wiring or config. What do you think is wrong with the 3HC board?

            I was able to update the bootloader and, sure enough, it actually shows a bootloader ID now - I hadn't noticed that was missing before.
            Here's the diagnostics output:

            M122 b1
            Diagnostics for board 1:
            Duet EXP3HC firmware version 3.4.0 (2022-03-15 08:59:40)
            Bootloader ID: SAME5x bootloader version 2.4 (2021-12-10)
            All averaging filters OK
            Never used RAM 158416, free system stack 206 words
            Tasks: Move(notifyWait,0.0%,160) HEAT(notifyWait,0.0%,108) CanAsync(notifyWait,0.0%,69) CanRecv(notifyWait,0.0%,82) CanClock(notifyWait,0.0%,71) TMC(notifyWait,7.6%,99) MAIN(running,91.0%,345) IDLE(ready,0.0%,39) AIN(delaying,1.3%,263), total 100.0%
            Last reset 00:00:12 ago, cause: software
            Last software reset data not available
            Driver 0: pos 0, 160.0 steps/mm,standstill, SG min 0, mspos 8, reads 48060, writes 16 timeouts 0, steps req 0 done 0
            Driver 1: pos 0, 160.0 steps/mm,standstill, SG min 0, mspos 8, reads 48061, writes 16 timeouts 0, steps req 0 done 0
            Driver 2: pos 0, 397.0 steps/mm,standstill, SG min 0, mspos 8, reads 48061, writes 16 timeouts 0, steps req 0 done 0
            Moves scheduled 0, completed 0, in progress 0, hiccups 0, step errors 0, maxPrep 0, maxOverdue 0, maxInc 0, mcErrs 0, gcmErrs 0
            Peak sync jitter 2/8, peak Rx sync delay 174, resyncs 0/0, no step interrupt scheduled
            VIN voltage: min 24.2, current 24.2, max 24.2
            V12 voltage: min 12.2, current 12.2, max 12.2
            MCU temperature: min 33.4C, current 33.6C, max 33.6C
            Last sensors broadcast 0x00000004 found 1 31 ticks ago, 0 ordering errs, loop time 0
            CAN messages queued 216, send timeouts 0, received 115, lost 0, free buffers 37, min 37, error reg 0
            dup 0, oos 0/0/0/0, bm 0, wbm 0, rxMotionDelay 0
            === Filament sensors ===
            Interrupt 5726621 to 0us, poll 1 to 114us
            Driver 2: no filament

            Let me know if there's anything else you need from me.
            Thanks!

              dc42 administrators @adammhaile
              last edited by 6 Apr 2022, 14:46

              @adammhaile does the 3HC still have the same problem, responding to M122 but the status LED not flashing in sync with the main board?

                adammhaile @dc42
                last edited by 6 Apr 2022, 14:50

                @dc42 said in CAN bus anomalies with 6HC and 3HC:

                does the 3HC still have the same problem, responding to M122 but the status LED not flashing in sync with the main board?

                Not currently, no. I have not seen it happen again since I initially reported the issue, even before I updated the bootloader this morning. I'm stumped as to why - the machine had been working fine. Then for a couple of days I kept getting the issue with the light blinking out of sync and the motors on that side not working. Then I reported it and have not seen it happen again since.

                Is there a typical reason for the light flashing out of sync like that? I'm certainly happy it works now but would be great to nail down why it did that so it doesn't risk it happening again.

                  dc42 administrators @adammhaile
                  last edited by dc42 4 Jun 2022, 14:57 6 Apr 2022, 14:54

                  @adammhaile if communication is working but the LED is not flashing in sync, that means the software frequency locked loop that keeps the clock on the 3HC synchronised with the 6HC was not able to maintain sync. That could mean that the crystal oscillator on the 3HC or on the 6HC drifts too much with temperature.

                  The peak sync jitter in the M122 B1 report can give a clue that this is happening. The values in your most recent report are minimum +2 and maximum +8 which are OK.

                    adammhaile @dc42
                    last edited by 6 Apr 2022, 15:03

                    @dc42 said in CAN bus anomalies with 6HC and 3HC:

                    The peak sync jitter in the M122 B1 report can give a clue that this is happening. The values in your most recent report are minimum +2 and maximum +8 which are OK.

                    Interesting... so the jitter values on the 3HC from when it was flashing out of sync (it was actually doing that at the time of that diagnostic) were 0 and 0... are those still ok?
                    I would assume that means no jitter - which is odd given it was out of sync at the time.

                      dc42 administrators @adammhaile
                      last edited by 6 Apr 2022, 15:27

                      @adammhaile when it's out of sync there are no jitter values available, so it reports 0/0. The jitter values just before it lost sync would have been interesting!
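
A minimal sketch, in Python, of pulling the jitter figures out of an M122 B1 report so they can be compared between reports. The regular expression assumes the line is formatted exactly as in the two reports earlier in this thread. `parse_sync_jitter` and the `values_available` flag are hypothetical names, and treating 0/0 together with a zero Rx sync delay as "no values available" is an assumption drawn from the explanation above, not a documented rule.

```python
import re

# Matches the "Peak sync jitter" line as it appears in the M122 B1 reports
# quoted in this thread (format assumed, not taken from firmware docs).
_JITTER_RE = re.compile(
    r"Peak sync jitter ([-+]?\d+)/([-+]?\d+), "
    r"peak Rx sync delay (\d+), resyncs (\d+)/(\d+)"
)


def parse_sync_jitter(report: str):
    """Return the jitter fields from an M122 report, or None if the line is absent."""
    match = _JITTER_RE.search(report)
    if match is None:
        return None
    jitter_min, jitter_max, rx_delay, resync_a, resync_b = (
        int(group) for group in match.groups()
    )
    return {
        "jitter_min": jitter_min,
        "jitter_max": jitter_max,
        "rx_sync_delay": rx_delay,
        "resyncs": (resync_a, resync_b),
        # Assumption: all-zero values mean the board had no jitter data to report.
        "values_available": not (jitter_min == 0 and jitter_max == 0 and rx_delay == 0),
    }


if __name__ == "__main__":
    line = "Peak sync jitter 2/8, peak Rx sync delay 174, resyncs 0/0, no step interrupt scheduled"
    print(parse_sync_jitter(line))
```

Fed the line from the report taken while the LED was blinking rapidly (0/0, Rx sync delay 0), the same function sets `values_available` to `False`.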

                        adammhaile @dc42
                        last edited by 6 Apr 2022, 17:12

                        @dc42 said in CAN bus anomalies with 6HC and 3HC:

                        The jitter values just before it lost sync would have been interesting!

                        Any way to log those in real time in case it happens again?

                        Any recommendations as next steps? You seemed to think a replacement was a good idea before but now I'm not sure... Just don't want this happening randomly again in the future. It's a big machine so a random fail mid-print could be hugely wasteful.
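
One possible approach, sketched in Python: sample the jitter values at a regular interval and keep only the last few, so that when a report comes back as 0/0 the readings from just before are still on hand. How the samples are obtained from the board is deliberately left out. `JitterHistory` and its parameters are hypothetical, and the 0/0 check is an assumption based on dc42's description of the out-of-sync report.

```python
from collections import deque


class JitterHistory:
    """Keep the most recent jitter samples so they survive a loss of sync."""

    def __init__(self, keep: int = 20):
        # A bounded deque drops the oldest sample once `keep` is reached.
        self._recent = deque(maxlen=keep)

    def record(self, timestamp, jitter_min, jitter_max):
        """Store a sample.

        Returns the retained history when the sample looks like the
        out-of-sync case (0/0, an assumption), otherwise None.
        """
        if jitter_min == 0 and jitter_max == 0:
            return list(self._recent)
        self._recent.append((timestamp, jitter_min, jitter_max))
        return None


if __name__ == "__main__":
    history = JitterHistory(keep=3)
    history.record("14:00:00", 2, 8)
    history.record("14:00:10", 1, 9)
    # A 0/0 sample hands back whatever was seen just before it.
    print(history.record("14:00:20", 0, 0))
```

The history is returned rather than printed inside `record`, so the caller can decide whether to write it to a file or just display it.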

                          adammhaile @dc42
                          last edited by 7 Apr 2022, 22:54

                          @dc42 Is it possible it's also either the 6HC or maybe the CAN bus cable?
                          If it's only possible the issue was caused by the 3HC I'd almost rather just replace it immediately in the interest of making sure it doesn't happen in the future... getting ready to bring the machine to MRRF so I only have so much time to make sure it's running reliably.

                            Phaedrux Moderator
                            last edited by 8 Apr 2022, 00:35

                            I'll see what DC42 thinks about 3hc replacement whether it's worthwhile or not.

                              adammhaile @Phaedrux
                              last edited by 8 Apr 2022, 00:35

                              @phaedrux said in CAN bus anomalies with 6HC and 3HC:

                              I'll see what DC42 thinks about 3hc replacement whether it's worthwhile or not.

                              Thanks!

                                adammhaile
                                last edited by 10 Apr 2022, 04:08

                                @phaedrux said in CAN bus anomalies with 6HC and 3HC:

                                I'll see what DC42 thinks about 3hc replacement whether it's worthwhile or not.

                                @Phaedrux @dc42 a bit of an update...
                                <sigh> This is getting frustrating.
                                Early on in building this machine I had 1 instance where mid-print it just stopped. DWC status page showed the "Print Again?" button and after digging into the duetcontrolserver logs I found a SPI connection has been reset message.
                                I dug into the forums and found lots of mentions of grounding/static issues. So I went to work making sure everything was fully grounded - which it is now. Everything is grounded to the frame and I can even confirm full continuity between ground and the nozzle, all motors, pulleys, etc. Pretty much anything metal is grounded.
                                I even went to the point of tying 24V negative to ground so that the "ground" on all the electronics would be actually grounded.

                                I thought that problem had been solved as I had never seen it again. And then the CAN bus issue started happening. Except that within a day of reporting it here that stopped and all seemed to be going fine. Until today when I got the SPI connection has been reset message twice and the CAN bus issue once 😡

                                First SPI issue was when I was just testing out some new meta commands and noticed that DWC disconnected. CAN bus problem happened in a similar scenario - while I was setting up some new scripting - and then when I went to home for a new print the Y axis bound up because only the left motor was driven. And then just now it stopped again mid print with the SPI issue again.

                                I think the most frustrating thing about both of these is that there's no real logic to when or why they happen. But I'm about to throw both these boards in the trash.
                                This is certainly not my first time dealing with a complicated electronic system (I've been building my own computers, printers, plotters, CNC machines, and designing my own PCBs for a long, long time).
                                I even had a friend, who is an electrical engineer that specializes in failure analysis, take a look and he couldn't come up with anything I've done that seemed wrong.

                                I very much do not trust either the 6HC or 3HC at this point - I feel like my only options at this point are replace both boards or find a completely different controller that meets my needs. I'm not trying to be threatening here or anything, I'm just supremely frustrated.

                                  dc42 administrators @adammhaile
                                  last edited by 10 Apr 2022, 07:30

                                  @adammhaile I'm sorry to hear that you are still having problems. Did you run a M122 report after the SPI connection reset message? If the 6HC reset then this may help us determine why. It may not be too late to run one now.

                                    adammhaile @dc42
                                    last edited by 10 Apr 2022, 12:10

                                    @dc42 said in CAN bus anomalies with 6HC and 3HC:

                                    @adammhaile I'm sorry to hear that you are still having problems. Did you run a M122 report after the SPI connection reset message? If the 6HC reset then this may help us determine why. It may not be too late to run one now.

                                    Unfortunately, no - the first time it happened I SSHed into the Pi to confirm it was the SPI error message but forgot to run the diagnostic. The second time it happened I tried but DWC just didn't respond, even after it said it was connected again.
                                    Is there a way to run such commands from the pi terminal directly?

                                    Also - new datapoint... I just started up the same print it failed on last night again and I wanted to move the bed down so I could clear it of the failed print. Since I couldn't home it first, I did what I normally do and issued M564 H0 to let me move without homing... except nothing happened. The console showed that the command had been run but the axis didn't unlock for movement in DWC.

                                    BTW - between the first time the SPI issue happened and now I've also replaced the Pi itself with a brand new one. Same exact model (Pi 4 w/ 4GB RAM).

                                    That print is running now - if the SPI or CAN bus fails again I will update here ASAP.

                                      adammhaile @adammhaile
                                      last edited by 11 Apr 2022, 00:52

                                      @dc42 Update: Ran prints all day. Nothing. I was even screen recording DWC and a couple different camera angles to see the exact moment. This is what's most frustrating - it'll happen the moment I get comfortable with it again 😕

                                        chrishamm administrators @adammhaile
                                        last edited by 11 Apr 2022, 10:12

                                        @adammhaile If you see occasional SPI connection resets, please consider reflashing your microSD card. See here why it could help.

                                        Duet software engineer

                                          adammhaile @chrishamm
                                          last edited by adammhaile 4 Nov 2022, 13:07 11 Apr 2022, 12:33

                                          @chrishamm said in CAN bus anomalies with 6HC and 3HC:

                                          @adammhaile If you see occasional SPI connection resets, please consider reflashing your microSD card. See here why it could help.

                                          I originally flashed it quite a while ago and it's running buster, not bullseye.
                                          Granted, I have run apt upgrade a few times since - could it still be affected?

                                          pi@rancor:~ $ cat /etc/os-release
                                          PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
                                          NAME="Raspbian GNU/Linux"
                                          VERSION_ID="10"
                                          VERSION="10 (buster)"
                                          VERSION_CODENAME=buster
                                          ID=raspbian
                                          ID_LIKE=debian
                                          HOME_URL="http://www.raspbian.org/"
                                          SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
                                          BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"

                                          @chrishamm Update: I found the image I used the last time I did a clean re-flash and it was on Feb 15th, 2022 with 2021-07-12-DuetPi-lite.img

                                          I also realize I should probably also note the few things I've done with that image:

                                          • dsf was given a user directory and the ability to login to that account. This is so that I can SSH to the Pi and directly edit the files in the sys directory. I do this so that I can use VS Code's remote features and have multiple files open at a time. It's SO much faster than going through DWC when you have a lot of edits to make - and I have an extensive system of conditional logic for tool and filament management.
                                          • It's running isc-dhcp-server (dhcpd) to provide an IP to another Pi in the printer that's running android and driving a large touch screen that displays DWC.
                                          • It's running a slightly modified version of the webcamd mjpg-streamer service from OctoPi for on-board camera streaming. I did this before the motion camera plugin was available. Even after it became available, I was never able to get it to serve up the stream larger than 640x480. Since my previous solution worked I just went back to that.
                                          Unless otherwise noted, all forum content is licensed under CC-BY-SA