    Crashes during printing - "SPI connection has been reset".

      jbjhjm

      Tried to find out more on the Pi side, but as it has permanently disconnected from the network, I cannot ssh into it without a full restart, which will likely wipe the relevant temp data.

      Yeah, not much left to analyze.
      To be better prepared for the next crash, I have tweaked the log settings:

      • /opt/dsf/conf/config.json -> logLevel = debug
      • /etc/systemd/journald.conf -> Storage=persistent
        jbjhjm

        Updated to the latest beta 4. Reading through the changelog I don't think it will change anything regarding the SPI connection issue.

        Any ideas how to continue debugging next time it happens?
        What technical reason is there for the SPI connection failure message to appear?

          chrishamm administrators @jbjhjm

          @jbjhjm Those log settings are far from ideal, and in fact I got the same symptoms with persistent logging and long prints as well. The reason is that systemd flushes lots and lots of messages to the SD card at regular intervals, which probably stalls IO access and/or the stdout line at some point (due to the massive amount of log messages; more than 3x the regular G-code file length per print). Either reset LogLevel back to "info" or change the journald storage to "volatile".

          When DCS becomes unresponsive at some point (probably during a full SPI transfer and longer than 500ms), RRF thinks the Pi lost communication so it invalidates everything.
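The timeout behaviour described above can be illustrated with a minimal sketch. This is not RRF's actual code: the class name `LinkWatchdog` and its methods are hypothetical, and only the 500 ms figure comes from the post. It merely shows how one side of a link can decide that its peer is gone once no transfer has completed for longer than a fixed timeout.

```python
import time


class LinkWatchdog:
    """Hypothetical sketch of a transfer timeout; not RRF's real logic."""

    def __init__(self, timeout_s=0.5, clock=time.monotonic):
        self.timeout_s = timeout_s          # 500 ms, as mentioned in the post
        self._clock = clock                 # injectable so the sketch can be tested
        self._last_transfer = clock()
        self.connected = True

    def transfer_completed(self):
        """Call whenever a transfer with the peer finishes."""
        self._last_transfer = self._clock()
        self.connected = True

    def poll(self):
        """Mark the link as lost if the peer has been silent for too long."""
        if self._clock() - self._last_transfer > self.timeout_s:
            self.connected = False          # the caller would now invalidate its state
        return self.connected
```

With a scheme like this, any stall on the Pi that lasts longer than the timeout looks the same to the other side as a physically lost connection.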

          If the resets persist with the standard log level, try out a different SD card and/or reduce IO load on the Pi as far as possible.
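As a rough sketch of the two suggested changes, the helpers below rewrite the text of a JSON config file and of a journald.conf file. The function names are hypothetical. The JSON key is passed in as a parameter because it is written `logLevel` earlier in the thread and `LogLevel` in this post, so check the actual file before relying on either spelling. The append branch assumes journald.conf contains only its `[Journal]` section.

```python
import json
import re


def set_json_key(config_text, key, value):
    """Return config_text (a JSON object) with one top-level key replaced."""
    config = json.loads(config_text)
    config[key] = value
    return json.dumps(config, indent=2)


def set_journald_storage(conf_text, mode):
    """Return journald.conf text with Storage= set to mode.

    Replaces the first existing (possibly commented-out) Storage= line,
    or appends one at the end if none exists.
    """
    pattern = re.compile(r"^[ \t]*#?[ \t]*Storage=.*$", re.MULTILINE)
    line = f"Storage={mode}"
    if pattern.search(conf_text):
        return pattern.sub(line, conf_text, count=1)
    return conf_text.rstrip("\n") + "\n" + line + "\n"
```

Both helpers only transform strings; reading and writing `/opt/dsf/conf/config.json` and `/etc/systemd/journald.conf` requires root, and the affected services have to be restarted before a change takes effect.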

          Duet software engineer

            jbjhjm @chrishamm

            @chrishamm oh dear, thanks for the warning. Will revert the log settings to be more lightweight.
            I only tweaked these yesterday, so all crashes until now happened with the standard log settings.
            I'm using the SD card shipped with the 6HC, but can try to get a different one.

            So your guess is the error appears because the Pi is too busy to respond to RRF in time?
            Besides the RRF communication it handles streaming camera data. I can try to lower fps/resolution.

              chrishamm administrators @jbjhjm

              @jbjhjm Yes, I think so. If the logging provider hangs during SPI transfers, it's likely to reset the connection state.

              Duet software engineer

                jbjhjm

                Experienced a crash again; this time it was different, though.
                Duet suddenly stated the print was 100% done. No other errors.

                I was not able to do an M115 / M122 this time, but the Raspi still has persistent logging enabled.
                I will check these logs later and see if anything useful is in there.

                One thing that makes me suspicious is that I'm having an awful lot of network disconnects.
                Whenever a print fails, DWC is offline too, and outside of erroneous situations DWC/Webcam sometimes reacts very slowly too.

                I'll do some investigation on how to monitor CPU and network load on the Pi.
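One lightweight way to do this, sketched here under the assumption of a standard Linux `/proc` filesystem on the Pi, is to sample `/proc/loadavg` and `/proc/net/dev` periodically and compare the byte counters between samples. The function names are illustrative only.

```python
def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:      # first two lines are headers
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) >= 9:
            # field 0 is received bytes, field 8 is transmitted bytes
            counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def read_snapshot():
    """Read both files once; call repeatedly and diff the byte counters."""
    with open("/proc/loadavg") as f:
        load = parse_loadavg(f.read())
    with open("/proc/net/dev") as f:
        net = parse_net_dev(f.read())
    return load, net
```

Calling `read_snapshot()` every few seconds and logging the differences gives a coarse picture of load and traffic without adding much IO of its own.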

                  T3P3Tony administrators @jbjhjm

                  @jbjhjm It's also worth ensuring you are giving the Pi enough voltage. Look for undervoltage events in the logs.
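Besides the logs, a Raspberry Pi reports power problems through `vcgencmd get_throttled`, which prints a line such as `throttled=0x50005`. The sketch below decodes that value using the bit meanings from the Raspberry Pi documentation; the function names are illustrative, and `read_throttled()` assumes `vcgencmd` is installed and on the PATH.

```python
import subprocess

# Bit meanings as documented for `vcgencmd get_throttled`.
THROTTLED_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output):
    """Decode a line such as 'throttled=0x50005' into a list of flag names."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [name for bit, name in THROTTLED_FLAGS.items() if value & (1 << bit)]


def read_throttled():
    """Run vcgencmd on the Pi and decode its answer."""
    result = subprocess.run(
        ["vcgencmd", "get_throttled"], capture_output=True, text=True, check=True
    )
    return decode_throttled(result.stdout)
```

An empty list means no flag is set. The "has occurred" bits stay set until the next reboot, so they also reveal brownouts that happened while nobody was watching.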

                  www.duet3d.com

                    jbjhjm @T3P3Tony

                    @t3p3tony thank you, will do!

                    I applied a number of changes today, let's see how they work out:

                    • using 3.4.0b5 now
                    • disabled logging as suggested by @chrishamm - yesterday's print logged a whopping 1.5 GB. 😵
                    • reduced webcam resolution
                    • installed htop to track CPU/Mem performance
                    • [external] modified my wifi setup; a wifi repeater was causing network performance issues. It would not surprise me if it also affected DWC / pi performance

                    htop stats say that 70-90 % of CPU load is caused by a chromium process. Unfortunately chromium always runs many parallel processes so it is difficult to investigate what this is actually doing.
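To see the load per program rather than per individual entry, the output of `ps` can be grouped by command name. This is a minimal sketch that assumes the procps `ps` found on Raspberry Pi OS; the function name is illustrative. Note that `ps` reports CPU usage averaged over each process's lifetime, so the numbers will not match htop's live view.

```python
from collections import defaultdict


def cpu_by_command(ps_output):
    """Sum %CPU and count processes per command name.

    Expects the output of `ps -e -o comm= -o pcpu=`, which lists one line
    per process (not per thread): command name, then CPU percentage.
    """
    totals = defaultdict(lambda: [0, 0.0])   # name -> [process count, %CPU sum]
    for line in ps_output.splitlines():
        parts = line.rsplit(None, 1)         # the name itself may contain spaces
        if len(parts) != 2:
            continue
        name, pcpu = parts
        totals[name.strip()][0] += 1
        totals[name.strip()][1] += float(pcpu)
    return {name: (count, cpu) for name, (count, cpu) in totals.items()}
```

Feeding it the text captured from `ps -e -o comm= -o pcpu=` yields one `(process count, summed %CPU)` pair per command name.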

                      T3P3Tony administrators @jbjhjm

                      @jbjhjm If you are not running DWC local to the Pi, then you don't need to run chromium at all. If you are running DWC on that Pi, then see what it's at with only DWC open.

                      www.duet3d.com

                        jbjhjm @T3P3Tony

                        @t3p3tony no voltage issues reported so far by vcgencmd.
                        I'm not sure I understand what you meant with your last comment.

                        DWC is running on the Pi (at least as far as I understand Duet's SBC mode, all that is handled by the Pi while the mainboard only handles printing and reports back values?).
                        DWC is also the only open chromium tab.
                        Nonetheless, chromium runs a bunch of different processes.
                        In the Time/CPU columns you can see, though, that there is just one chromium process that uses lots of CPU time.

                        [Attached image: htop screenshot]

                          T3P3Tony administrators @jbjhjm

                          @jbjhjm I mean are you running the Pi headless and connecting via a network interface on the Pi to the webserver, or do you have a screen connected to a pi and running DWC in a browser on the Pi?

                          www.duet3d.com

                            jbjhjm @T3P3Tony

                            @t3p3tony ah now I get you. It's both, the pi has a permanently attached screen, and I often access DWC through network too for more complex tasks and if I'm not in the same room.

                              jbjhjm

                              OK, it seems that the Pi's network connection crashed again just a few minutes ago; it's still listed in the router's active devices list and responds to pings, but DWC does not load anymore. The print is still being executed, though.
                              So I checked what happened on the touch panel: Chrome displayed a white page plus a note that it had crashed, asking whether it should reload.
                              Now this is weird: I dismissed that message, exited fullscreen and then saw another instance of Chrome running below the crashed instance!
                              I have no clue why it is there. I did not tweak the startup routine provided by duetPi.
                              After closing the crashed Chrome window, it seems that the network connection was re-established too...
                              Nothing really useful in the journal (I re-enabled logging hoping to hunt down the networking issues), just way too many network connection losses and reconnects. This is related to the bad wifi that I still have to improve. Disallowed auto-switching of frequency bands and 2.4/5 GHz; hopefully that will make the connection a bit more robust.
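A host that answers pings while the web interface does not load can be narrowed down by testing whether the web server still accepts TCP connections. The helper below is a small sketch of that check; the function name is illustrative, and port 80 is an assumption (the default HTTP port) that has to be adjusted if DWC is served elsewhere.

```python
import socket


def port_open(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and name resolution failures
        return False
```

If pings succeed but `port_open()` returns False, the network path is fine and the problem sits with the web server or the Pi itself; if both fail, the network is the more likely culprit.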

                                T3P3Tony administrators @jbjhjm

                                @jbjhjm I hope @chrishamm has some ideas about what to look for in the logs as a cause of this.

                                www.duet3d.com

                                  jbjhjm @T3P3Tony

                                  @t3p3tony when my next print is completed I'll do a full restart and check the Chrome status right after, to see whether two instances are running and such.
                                  If someone can point me in the right direction for finding the duetPi startup script, I'll check if there's anything unusual going on.

                                  Attached bootlog.txt by the way.
                                  Don't know enough about raspi + linux to spot anything useful unfortunately.

                                    jbjhjm @jbjhjm

                                    The duplicate chromium seems to be related to beta5.
                                    Just did a full restart and the screen showed a crashed chromium window right away.
                                    This has never happened before so I'm quite sure it has to do with beta5.
                                    Opened a bug report to discuss this separately.
                                    https://forum.duet3d.com/topic/25542/3-4-b5-bug-chromium-crashes-on-startup-sbc

                                      T3P3Tony administrators @jbjhjm

                                      @jbjhjm It's not really crashed as such, as I outlined in the other thread; rather, it's showing that Chrome was not shut down properly. I will leave discussion of that to the other thread, but the huge number of Chrome tasks is unusual and I am not seeing those.

                                      www.duet3d.com

                                        jbjhjm @T3P3Tony

                                        @t3p3tony the Chrome shutdown/crash is fixed by the solution proposed in the other topic!
                                        About the number of processes, that's my fault. I just noticed that htop was showing not only processes but every thread too. I do still see a dozen processes but that is not unusual for chrome.

                                        [Attached image: htop screenshot]

                                          T3P3Tony administrators @jbjhjm

                                          @jbjhjm ok, so we still need to see if you have SPI disconnect errors now.

                                          www.duet3d.com

                                            jbjhjm @T3P3Tony

                                            @t3p3tony I will let you know if anything new occurs. Maybe b5 and the tweaked raspi settings helped to make it go away. As the error did not occur often in the past, I'll continue and observe for some days.

                                            Unless otherwise noted, all forum content is licensed under CC-BY-SA