    CAN bus anomalies with 6HC and 3HC

    Duet Hardware and wiring
    • adammhaile @adammhaile

      @dc42 Update: Ran prints all day. Nothing. I was even screen recording DWC and a couple of different camera angles to catch the exact moment. This is what's most frustrating - it'll happen the moment I get comfortable with it again 😕

      • chrishamm administrators @adammhaile

        @adammhaile If you see occasional SPI connection resets, please consider reflashing your microSD card. See here why it could help.

        Duet software engineer

        • adammhaile @chrishamm

          @chrishamm said in CAN bus anomalies with 6HC and 3HC:

          @adammhaile If you see occasional SPI connection resets, please consider reflashing your microSD card. See here why it could help.

          I originally flashed it quite a while ago and it's running buster, not bullseye.
          Granted, I have run apt upgrade a few times since - could it still be affected?

          pi@rancor:~ $ cat /etc/os-release
          PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
          NAME="Raspbian GNU/Linux"
          VERSION_ID="10"
          VERSION="10 (buster)"
          VERSION_CODENAME=buster
          ID=raspbian
          ID_LIKE=debian
          HOME_URL="http://www.raspbian.org/"
          SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
          BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
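
A minimal sketch, in Python, of reading the release codename from os-release-style text, in case it helps when comparing installs. The function name `parse_os_release` and the `SAMPLE` string are illustrative only; the sample repeats three lines from the output above.

```python
def parse_os_release(text):
    """Parse KEY=value lines (os-release format) into a dict, stripping quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')
    return fields


# Three lines copied from the output quoted in this post.
SAMPLE = '''PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
VERSION_ID="10"
VERSION_CODENAME=buster
'''

info = parse_os_release(SAMPLE)
print(info["VERSION_CODENAME"], info["VERSION_ID"])
```

On a live system the same function could be fed the contents of /etc/os-release instead of the sample string.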
          

          @chrishamm Update: I found the image I used the last time I did a clean re-flash and it was on Feb 15th, 2022 with 2021-07-12-DuetPi-lite.img

          I realize I should probably also note the few things I've done with that image:

          • dsf was given a user directory and the ability to log in to that account, so that I can SSH to the Pi and directly edit the files in the sys directory. This lets me use VS Code's remote features and have multiple files open at a time. It's SO much faster than going through DWC when you have a lot of edits to make - and I have an extensive system of conditional logic for tool and filament management.
          • It's running isc-dhcp-server (dhcpd) to provide an IP to another Pi in the printer that's running Android and driving a large touch screen that displays DWC.
          • It's running a slightly modified version of the webcamd mjpg-streamer service from OctoPi for on-board camera streaming. I set this up before the motion camera plugin was available. Even once it was, I was never able to get it to serve the stream at anything larger than 640x480. Since my previous solution worked, I just went back to that.
          • adammhaile

            @chrishamm @dc42 @Phaedrux

            At this point I'm ready to just suck it up and buy new controllers unless you are willing to RMA these boards. I'm beyond frustrated.

            I was able to get both issues to happen again - at the same time.
            I was running yet another simple test print and out of nowhere it just stopped dead... unlike other times, though, the connection to the system never really came back. Eventually DWC on the machine's built-in Android screen came back enough to display "SPI connection has been reset".

            But otherwise I could not remotely access the machine at all. I could not get to DWC from any other system on my network and could not SSH into the Pi.

            I fortunately have a screen on the Duet Pi and was able to connect a keyboard and run a couple things before rebooting the system.

            One note about the DCS log below - the SPI reset seems to happen over and over again.

            At the bottom you will see diagnostics for the 6HC and logs from DuetControlServer when the event happened. I was unable to get M122 to output the 3HC diagnostics - it just returned an error every time.

            Not only did the SPI comms issue occur but when it did the LED on the 3HC was blinking rapidly - sadly that's all I could tell because, as noted, I was unable to grab diagnostics from it - it just seemed completely disconnected.

            One new thing of note: I think this is only happening when I use the right tool - the one that's using the 3HC. To recap from previous: The 3HC controls the T1 extruder, T1 X axis (the U axis), and the right side Y motor. So even if T1 isn't being used the 3HC is always involved at least with the Y motor. But it seems to only happen when I'm running jobs either with both tools or only with T1. No idea what that means - hopefully it will make sense to you.

            6HC Diagnostics - this was captured about 3-4 minutes after it all went to hell. It took me that long to gain access to the system and figure out how to run M122 from the Pi terminal.

            === Diagnostics ===
            RepRapFirmware for Duet 3 MB6HC version 3.4.0 (2022-03-15 18:57:24) running on Duet 3 MB6HC v1.01 or later (SBC mode)
            Board ID: 08DJM-956BA-NA3TN-6J1FG-3S86T-TUBUS
            Used output buffers: 1 of 40 (15 max)
            === RTOS ===
            Static ram: 151000
            Dynamic ram: 69008 of which 0 recycled
            Never used RAM 127280, free system stack 114 words
            Tasks: SBC(ready,0.4%,438) HEAT(suspended,0.0%,321) TMC(notifyWait,8.0%,58) MAIN(running,91.6%,1147) IDLE(ready,0.0%,30), total 100.0%
            Owned mutexes: HTTP(MAIN)
            === Platform ===
            Last reset 05:22:14 ago, cause: power up
            Last software reset details not available
            Error status: 0x00
            Aux1 errors 0,0,0
            Step timer max interval 127
            MCU temperature: min 46.0, current 46.0, max 46.0
            Supply voltage: min 23.9, current 23.9, max 23.9, under voltage events: 0, over voltage events: 0, power good: yes
            12V rail voltage: min 12.1, current 12.1, max 12.1, under voltage events: 0
            Heap OK, handles allocated/used 99/52, heap memory allocated/used/recyclable 2048/1620/986, gc cycles 5
            Events: 0 queued, 0 completed
            Driver 0: standstill, SG min n/a, mspos 184, reads 12908, writes 0 timeouts 0
            Driver 1: standstill, SG min n/a, mspos 504, reads 12907, writes 0 timeouts 0
            Driver 2: standstill, SG min n/a, mspos 8, reads 12907, writes 0 timeouts 0
            Driver 3: standstill, SG min n/a, mspos 152, reads 12908, writes 0 timeouts 0
            Driver 4: standstill, SG min n/a, mspos 152, reads 12908, writes 0 timeouts 0
            Driver 5: standstill, SG min n/a, mspos 152, reads 12908, writes 0 timeouts 0
            Date/time: 2022-04-11 18:51:59
            Slowest loop: 1.55ms; fastest: 0.05ms
            === Storage ===
            Free file entries: 10
            SD card 0 not detected, interface speed: 37.5MBytes/sec
            SD card longest read time 0.0ms, write time 0.0ms, max retries 0
            === Move ===
            DMs created 125, segments created 22, maxWait 0ms, bed compensation in use: mesh, comp offset 0.000
            === MainDDARing ===
            Scheduled moves 45757, completed 45757, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
            === AuxDDARing ===
            Scheduled moves 0, completed 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
            === Heat ===
            Bed heaters 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters -1 -1 -1 -1, ordering errs 0
            === GCodes ===
            Segments left: 0
            Movement lock held by null
            HTTP* is doing "M122 B0" in state(s) 0
            Telnet is idle in state(s) 0
            File* is idle in state(s) 0
            USB is idle in state(s) 0
            Aux is idle in state(s) 0
            Trigger* is idle in state(s) 0
            Queue* is idle in state(s) 0
            LCD is idle in state(s) 0
            SBC is idle in state(s) 0
            Daemon is idle in state(s) 0
            Aux2 is idle in state(s) 0
            Autopause is idle in state(s) 0
            Code queue is empty
            === Filament sensors ===
            Extruder 0 sensor: ok
            Extruder 1 sensor: no filament
            === CAN ===
            Disabled
            Longest wait 0ms for reply type 0, peak Tx sync delay 0, free buffers 50 (min 49), ts 0/0/0
            Tx timeouts 0,0,0,0,0,0
            === SBC interface ===
            Transfer state: 4, failed transfers: 0, checksum errors: 0
            RX/TX seq numbers: 41225/1471
            SPI underruns 0, overruns 0
            State: 5, disconnects: 12, timeouts: 12, IAP RAM available 0x2b880
            Buffer RX/TX: 0/0-0, open files: 0
            === Duet Control Server ===
            Duet Control Server v3.4.0
            Code buffer space: 4096
            Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 0
            Full transfers per second: 36.63, max time between full transfers: 4566.7ms, max pin wait times: 26.1ms/0.3ms
            Codes per second: 0.13
            Maximum length of RX/TX data transfers: 3868/1520
            

            DuetControlServer logs - the M800 is just a custom macro that runs for various print events. It sends a serial message to an external Arduino that plays some audio.

            Apr 11 18:40:38 rancor DuetControlServer[370]: [info] Finished macro file M800.g
            Apr 11 18:41:20 rancor DuetControlServer[370]: [info] Starting macro file M800.g on channel File
            Apr 11 18:41:20 rancor DuetControlServer[370]: [info] Finished macro file M800.g
            Apr 11 18:41:53 rancor DuetControlServer[370]: [info] Starting macro file M800.g on channel File
            Apr 11 18:41:53 rancor DuetControlServer[370]: [info] Finished macro file M800.g
            Apr 11 18:42:30 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:42:30 rancor DuetControlServer[370]: [warn] SPI connection has been reset
            Apr 11 18:42:30 rancor DuetControlServer[370]: [warn] Trigger: Out-of-order reply: ''
            Apr 11 18:42:30 rancor DuetControlServer[370]: [info] Aborted job file
            Apr 11 18:42:55 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:42:55 rancor DuetControlServer[370]: [warn] SPI connection has been reset
            Apr 11 18:42:55 rancor DuetControlServer[370]: [warn] Trigger: Out-of-order reply: ''
            Apr 11 18:43:10 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:43:10 rancor DuetControlServer[370]: [warn] SPI connection has been reset
            Apr 11 18:43:10 rancor DuetControlServer[370]: [warn] Trigger: Out-of-order reply: ''
            Apr 11 18:43:19 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:43:28 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:43:37 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:43:47 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:43:56 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:44:05 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:44:05 rancor DuetControlServer[370]: [warn] SPI connection has been reset
            Apr 11 18:44:05 rancor DuetControlServer[370]: [warn] Trigger: Out-of-order reply: ''
            Apr 11 18:44:14 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:44:23 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:44:32 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:44:41 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:44:50 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:44:59 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:44:59 rancor DuetControlServer[370]: [warn] SPI connection has been reset
            Apr 11 18:45:00 rancor DuetControlServer[370]: [warn] Trigger: Out-of-order reply: ''
            Apr 11 18:45:09 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:45:18 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:45:27 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:45:36 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:45:45 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:45:54 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:45:54 rancor DuetControlServer[370]: [warn] SPI connection has been reset
            Apr 11 18:45:54 rancor DuetControlServer[370]: [warn] Trigger: Out-of-order reply: ''
            Apr 11 18:46:03 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:46:12 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:46:22 rancor DuetControlServer[370]: [info] System time has been changed
            Apr 11 18:46:31 rancor DuetControlServer[370]: [info] System time has been changed
            
            • Phaedrux Moderator

              Please send an email to warranty@duet3d.com and CC your reseller. Include a link to this forum thread and the details of your original purchase. You'll receive a reply with a form to fill out.

              Of course we will continue to try and understand and resolve the issue.

              Sorry for the inconvenience and thank you for your patience.

              Z-Bot CoreXY Build | Thingiverse Profile

              • adammhaile @Phaedrux

                @phaedrux Done. Will handle the form as soon as I get it.
                Thank you 🙂

                • chrishamm administrators @adammhaile

                  @adammhaile Thanks for the log. It contains many "System time has been changed" messages, which indicate an I/O or CPU overload on the SBC that can cause frequent timeouts. In detail, the application on the SBC (DCS) fails to get CPU time from the Linux kernel frequently enough, so timeouts are a likely consequence.

                  If you can confirm the CPU usage is normal on the SBC, please consider replacing your SD card with an A-rated microSD card which is better suited for concurrent IO. That should eliminate those messages, too.
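
One way to see how often these messages occur is to count them per minute in the DCS log. The following is a minimal Python sketch, not part of DSF; the names `LINE_RE`, `WATCHED`, and `summarize` are hypothetical, and the line format is assumed from the excerpt posted earlier in this thread.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt in this thread:
# "Apr 11 18:42:30 rancor DuetControlServer[370]: [warn] SPI connection has been reset"
LINE_RE = re.compile(
    r"^\s*\w{3}\s+\d+\s+(\d{2}:\d{2}):\d{2}\s+\S+\s+"
    r"DuetControlServer\[\d+\]:\s+\[\w+\]\s+(.*)$"
)

# Messages of interest, keyed by a short label (labels are illustrative).
WATCHED = {
    "time_changed": "System time has been changed",
    "spi_reset": "SPI connection has been reset",
}


def summarize(log_text):
    """Return (totals, by_minute) counters for the watched messages."""
    totals = Counter()
    by_minute = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue
        minute, message = match.group(1), match.group(2)
        for label, needle in WATCHED.items():
            if needle in message:
                totals[label] += 1
                by_minute[(minute, label)] += 1
    return totals, by_minute


# A few lines copied from the log posted above.
SAMPLE = """
Apr 11 18:42:30 rancor DuetControlServer[370]: [info] System time has been changed
Apr 11 18:42:30 rancor DuetControlServer[370]: [warn] SPI connection has been reset
Apr 11 18:42:30 rancor DuetControlServer[370]: [warn] Trigger: Out-of-order reply: ''
Apr 11 18:42:55 rancor DuetControlServer[370]: [info] System time has been changed
Apr 11 18:42:55 rancor DuetControlServer[370]: [warn] SPI connection has been reset
Apr 11 18:43:10 rancor DuetControlServer[370]: [info] System time has been changed
"""

totals, by_minute = summarize(SAMPLE)
for label, count in totals.items():
    print(label, count)
```

Running it over a saved copy of the full log before and after an SD card swap would show whether the message rate actually drops.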

                  • adammhaile @chrishamm

                    @chrishamm Interesting...
                    I've been using one of these microSD cards, which is typical for me on the Pi, especially in a setup like this one where "properly" shutting it down each time is not easy.

                    291b8c82-345f-4d80-83fa-519bb1f700f9-image.png

                    I noticed the docs mention an SD card speed test, which I ran, but I'm thinking it is only meant for a card mounted in the Duet, not the SBC... because... well, these are horrible numbers:

                    4/12/2022, 8:35:48 AM	M122 P104 S5
                    Testing SD card write speed...
                    4/12/2022, 8:36:26 AM	SD write speed for 5.0Mbyte file was 0.13Mbytes/sec
                    4/12/2022, 8:36:26 AM	Testing SD card read speed...
                    4/12/2022, 8:43:50 AM	SD read speed for 5.0Mbyte file was 0.01Mbytes/sec
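
As a sanity check on the figures above, the console timestamps can be turned into rates. This Python sketch assumes each phase ran for the whole gap between the adjacent timestamps and that the file was 5.0 MB, as reported; `rate_mb_per_s` is a hypothetical helper name.

```python
from datetime import datetime

# Timestamp format used by the console lines quoted above.
FMT = "%m/%d/%Y, %I:%M:%S %p"


def rate_mb_per_s(start, end, size_mb=5.0):
    """Average rate in MB/s for size_mb transferred between two timestamps."""
    elapsed = (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()
    return size_mb / elapsed


# Write phase: command issued at 8:35:48, result reported at 8:36:26 (38 s).
write_rate = rate_mb_per_s("4/12/2022, 8:35:48 AM", "4/12/2022, 8:36:26 AM")
# Read phase: started at 8:36:26, result reported at 8:43:50 (444 s).
read_rate = rate_mb_per_s("4/12/2022, 8:36:26 AM", "4/12/2022, 8:43:50 AM")

print(f"write ~{write_rate:.2f} MB/s, read ~{read_rate:.2f} MB/s")
```

The rates computed this way round to the reported 0.13 and 0.01, so the slow figures match the wall-clock time the test took.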
                    

                    As for CPU usage - Note: this is a Pi 4 w/ 4GB RAM. No overclock.

                    This is at machine idle - just on, no job running:
                    b0dccbef-df8d-4e07-a7e3-ff2871c638d0-image.png

                    This is during the text at the bottom of a benchy - so tons of tiny moves:
                    a2227053-d240-43c8-a323-0098553a36bb-image.png

                    This is a few seconds after the last, with an mjpg_streamer camera stream started:
                    8e467840-59ed-479f-9b4b-8d540cd4269d-image.png

                    • chrishamm administrators @adammhaile

                      @adammhaile The CPU usage looks OK but I agree the SD test is pretty disappointing. I've been using these SanDisk Extreme 64GB A2 cards and overwrote all of them countless times for DuetPi tests and they're still perfectly fine.

                      I'm still happy with the Samsung SSDs I have but I cannot say much about the quality of their microSD cards.

                      • adammhaile @chrishamm

                        @chrishamm said in CAN bus anomalies with 6HC and 3HC:

                        The CPU usage looks OK but I agree the SD test is pretty disappointing.

                        I'm still confused by those results - if I run a perf test from the Pi command line (using agnostic), I'm getting 45MB/s writes and 60+ MB/s reads.
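
For comparison, a rough sequential write test can also be run directly on the SBC. This is a minimal Python sketch and not the tool mentioned above. The function name `sequential_write_mb_per_s` and the default sizes are arbitrary assumptions. It only measures writes, because reads of a freshly written file would mostly come from the page cache.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory, size_mb=16, chunk_mb=1):
    """Write a temp file in `directory`, fsync it, and return the rate in MiB/s."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    chunks = size_mb // chunk_mb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(chunks):
                handle.write(chunk)
            handle.flush()
            # Ask the OS to push the data to the device, not just the page cache.
            os.fsync(handle.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return (chunks * chunk_mb) / elapsed


if __name__ == "__main__":
    # Point this at a directory on the card under test; the system temp
    # directory may be RAM-backed on some setups, which would inflate the result.
    print(f"{sequential_write_mb_per_s(tempfile.gettempdir()):.1f} MiB/s")
```

A result in the tens of MB/s here, next to 0.13 MB/s from M122 P104, would suggest the two tests are measuring different paths.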

                        @chrishamm said in CAN bus anomalies with 6HC and 3HC:

                        I've been using these SanDisk Extreme 64GB A2 cards

                        Ha! I ordered 2 of those this morning 🙂

                        Do you think I would be safe simply cloning my existing SD to the new one or should I start from scratch?

                        • Phaedrux Moderator

                          I would back up the configs and start with a fresh DuetPi image, at least for testing. If you have more customizations you wish to preserve cloning the card should be an option.

                          • adammhaile @Phaedrux

                            @phaedrux said in CAN bus anomalies with 6HC and 3HC:

                            I would back up the configs and start with a fresh DuetPi image, at least for testing. If you have more customizations you wish to preserve cloning the card should be an option.

                            Ok, will do that for now then.

                            • adammhaile

                              @phaedrux @chrishamm @dc42
                              I've got to remove these boards and send them back to Filastruder - anything else you want me to try before I do that?

                              • T3P3Tony administrators @adammhaile

                                @adammhaile when do the new SD cards arrive? Would it be too disruptive to ask you to test with one of those?

                                I realise that you have been plugging away at this issue for a while so if you can't wait then I understand.

                                www.duet3d.com

                                • adammhaile @T3P3Tony

                                  @t3p3tony said in CAN bus anomalies with 6HC and 3HC:

                                  when do the new SD cards arrive? Would it be too disruptive to ask you to test with one of those?

                                  No problem - they arrive today. I can likely give it a shot tonight.

                                  • adammhaile @adammhaile

                                    @chrishamm @T3P3Tony I don't think I trust the M122 P104 SD card test... I'm using the newly recommended SD card, which I've tested on my desktop at over 140MB/s, but when running the diagnostic speed test I get the exact same results as before.
                                    Honestly, the fact that it's the exact same speed every time makes me feel like the bottleneck is elsewhere - likely in the diagnostics code.
                                    Especially given that I'm consistently able to upload gcode files at ~15MB/s no problem.
                                    Though.... maybe this is part of the problem... The 15MB/s upload is through DWC which would be direct to the Pi.
                                    But the diagnostics SD write test is running from the 6HC control board itself - so maybe that bottleneck is the SPI bus and that's causing my problems?
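
A back-of-the-envelope check of that idea in Python, using the "Configured SPI speed: 8000000Hz" line from the diagnostics posted earlier. It assumes one bit per clock cycle and ignores all protocol overhead, so it gives only an upper bound on the link, not a statement about what DSF achieves in practice.

```python
# Value taken from the DCS diagnostics earlier in this thread.
SPI_CLOCK_HZ = 8_000_000

# Assumption: standard SPI moves one bit per clock in each direction.
raw_ceiling_mb_s = SPI_CLOCK_HZ / 8 / 1_000_000

measured_write_mb_s = 0.13  # reported by M122 P104
dwc_upload_mb_s = 15.0      # upload rate observed through DWC

print(f"raw SPI ceiling: {raw_ceiling_mb_s:.2f} MB/s")
print(f"M122 P104 write uses {measured_write_mb_s / raw_ceiling_mb_s:.0%} of that ceiling")
print(f"DWC upload is {dwc_upload_mb_s / raw_ceiling_mb_s:.0f}x the ceiling")
```

Under those assumptions the link tops out near 1 MB/s. That is consistent with the DWC upload not passing over SPI at all, and with the test result being independent of the card's speed.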

                                    Is there anywhere I could purchase a new 6HC ribbon cable? Wondering if I should replace that too - though can't find anything that I'm sure is correct.

                                    It's late now but I will run some print tests in the morning to see if I can cause any other fails, before I pack up the boards to ship back to Filastruder.

                                    • Phaedrux Moderator

                                      You could measure the continuity and resistance on the ribbon cable, that would tell us if it's acceptable or not.

                                      • chrishamm administrators @adammhaile

                                        @adammhaile is this with the new SanDisk card?

                                        • adammhaile @chrishamm

                                          @chrishamm said in CAN bus anomalies with 6HC and 3HC:

                                          is this with the new SanDisk card?

                                          Yes. Same card recommended above.

                                          • chrishamm administrators @adammhaile

                                            @adammhaile Please check if the disconnects persist with the new card. If they do, I'll be happy to share a new firmware build that tells us whether the timeout is caused by the SBC or by RepRapFirmware. We've got another trace but I cannot comment on that one yet.

                                            Unless otherwise noted, all forum content is licensed under CC-BY-SA