Duet3D Logo Duet3D
    • Tags
    • Documentation
    • Order
    • Register
    • Login

    [SBC mode] Random errors in macros with 3.5.2

    Scheduled Pinned Locked Moved
    Gcode meta commands
    3
    15
    649
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • dc42undefined
      dc42 administrators @crpalmer
      last edited by

      @crpalmer are you running in standalone or SBC mode?

      Duet WiFi hardware designer and firmware engineer
      Please do not ask me for Duet support via PM or email, use the forum
      http://www.escher3d.com, https://miscsolutions.wordpress.com

      crpalmerundefined 1 Reply Last reply Reply Quote 0
      • crpalmerundefined
        crpalmer @dc42
        last edited by

        @dc42 sbc for both machines.

        1 Reply Last reply Reply Quote 0
        • chrishammundefined
          chrishamm administrators @crpalmer
          last edited by

          @crpalmer I ran your macro from a print file 10 times in a row with v3.5.2 and it completed every time as expected. Can you please share a print file and files /sys/homexy-if-needed.g and /sys/retractprobe-forced.g as well?

          Duet software engineer

          crpalmerundefined 1 Reply Last reply Reply Quote 0
          • crpalmerundefined
            crpalmer @chrishamm
            last edited by

            @chrishamm / @dc42 The two machines I am running 3.5.2 configs are:

            https://github.com/crpalmer/3d-printing/tree/master/veho
            https://github.com/crpalmer/3d-printing/tree/master/ender5

            For whatever reason, it happens much more frequently on "ender5" (don't be confused by the name, all that is left of the original ender-5 is the frame). It doesn't happen all the time and I'm starting to wonder if it happens more frequently when it has been running for awhile (perhaps some kind of subtle memory corruption?). It did happen again yesterday:

            9/10/2024, 6:27:28 AM	Error: in file prepare-to-print.g line 32: unknown variable 'probing_temp'
            9/10/2024, 6:27:28 AM	Error: in file prepare-to-print.g line 34: unknown variable 'probing_temp'
            9/10/2024, 6:27:12 AM	Warning: Macro file has been started on channel File but none was requested
            

            and like before after the print finished (including M2 in the end.gcode) the machine thought that the print was still running and it was impossible to cancel the print. I end up just doing an Emergency Stop when this happens.

            chrishammundefined 1 Reply Last reply Reply Quote 0
            • chrishammundefined
              chrishamm administrators @crpalmer
              last edited by

              @crpalmer Can you please upload a demo G-code job as well? Then I could prepare a bench setup and attempt to reproduce it.

              Duet software engineer

              crpalmerundefined 2 Replies Last reply Reply Quote 0
              • crpalmerundefined
                crpalmer @chrishamm
                last edited by

                @chrishamm Here's a gcode file that I was running at around the time I last saw the error. I can't be sure it's the one I was running but the exact object probably doesn't matter given that the error happens before the print itself starts.

                YACS - PLA - 0.6mm.gcode

                1 Reply Last reply Reply Quote 0
                • crpalmerundefined
                  crpalmer @chrishamm
                  last edited by

                  @chrishamm I was just browsing the "random restarts on mini" thread and wondered if my problems could be related. However, both of my machines are running Duet 3 Mini 5+ (Ethernet not WiFi).

                  Both machines have CAN boards attached. The one that encounters this error more frequently is using a EXP3HC and the other a TOOL1LC.

                  chrishammundefined 1 Reply Last reply Reply Quote 0
                  • chrishammundefined
                    chrishamm administrators @crpalmer
                    last edited by

                    @crpalmer I think I've got a bug fix ready which will be part of v3.5.3.

                    Duet software engineer

                    crpalmerundefined 1 Reply Last reply Reply Quote 0
                    • crpalmerundefined
                      crpalmer @chrishamm
                      last edited by

                      @chrishamm Thank you but unfortunately I think that's not the (entire?) fix. Was it this commit that you were hoping fixed the problem?

                      I find that the error occurs more frequently on one of my machines so I upgraded it to 3.5.3 to test. I just had the error occur when starting a print. FYI:

                      I have 3 machines that were running 3.5.2 and I'm seeing errors with very different frequencies on these machines. The one I upgraded to 3.5.3 has the error occur 10% of the time. That machine has a EXP3HC board that runs a Y motor an U motor and an extruder. The second machine runs a toolboard and I see the error here less frequently (I don't have a percentage estimate but definitely less frequently). The last machine has yet to report an error and it doesn't use any CAN expansion boards.

                      I don't see anything interesting in the differences in the SBC. All three machines are running Raspberry PI 4 boards (the two that have had the errors are Rev 1.5 and the one that hasn't is 1.4). The most machine that most commonly has errors and the one that has never had an error are running Raspbian 10 and the one other is Raspbian 12.

                      9/21/2024, 4:12:58 PM	Error: in file prepare-to-print.g line 32: unknown variable 'probing_temp'
                      9/21/2024, 4:12:58 PM	Error: in file prepare-to-print.g line 34: unknown variable 'probing_temp'
                      9/21/2024, 4:12:43 PM	Warning: Macro file has been started on channel File but none was requested
                      

                      0bd6c715-d16f-4f9d-b770-31f96b66ac72-image.png

                      chrishammundefined 1 Reply Last reply Reply Quote 0
                      • chrishammundefined
                        chrishamm administrators @crpalmer
                        last edited by

                        @crpalmer That was only one of the commits for the full fix, there was at least one more in DSF, too. So far I've been unable to reproduce that issue again since I made those changes. If you can come up with a simpler and reliable way to reproduce this problem, I can have a look at it again. Unfortunately, I can't reproduce your full setup here and AFAIK we haven't got similar reports from other customers either.

                        Duet software engineer

                        crpalmerundefined 1 Reply Last reply Reply Quote 0
                        • crpalmerundefined
                          crpalmer @chrishamm
                          last edited by

                          @chrishamm This problem is most definitely not fixed. Yesterday I had the error occur twice. FWIW, random and explained behaviour by the software used to control industrial machines is something that I think should still be investigated -- even if it's difficult.

                          Whenever I get the random error message, I click "Emergency Stop" in the interface and restart the job. I do so just to be on the safe side and since I know I'll have to click emergency stop eventually anyway because the print never terminates, and the interface fails to let me cancel it.

                          Yesterday, the error occurred in 2 prints jobs in a row (with an emergency stop in between). Memory corruption, while still a possibility, sounds very unlikely.

                          I did try to find a reproducible example by having macros running macros running macros, but I was not able to reproduce anything that doesn't involve actually running print jobs. One difference in my tests and the actual print jobs is that some macro invocations are generated by M401 / M402 gcodes. I see in the RRF code that when system macros are run, it takes not of that (runningSystemMacro). Is there any difference there that could be part of the problem?

                          chrishammundefined 1 Reply Last reply Reply Quote 0
                          • chrishammundefined
                            chrishamm administrators @crpalmer
                            last edited by

                            @crpalmer I suspect something closes one of your macro files prematurely - that might explain those error messages. I need to have a look at this again provided I can reproduce it somehow.

                            Meanwhile you could enable debug logging by changing the log level in /opt/dsf/conf/config.json from info to debug and share the corresponding journal when it happens again. If it is what I think it is, it might be possible to reduce the frequency of this issue by reducing the number of nested macros.

                            I've also logged this issue here: https://github.com/Duet3D/DuetSoftwareFramework/issues/205 Note that I'm not available next week but I'm happy to have a look at this again when I'm back.

                            chrishamm created this issue in Duet3D/DuetSoftwareFramework

                            closed [Bug]: Macro files on certain systems remain open leading to stack corruption #205

                            Duet software engineer

                            crpalmerundefined 2 Replies Last reply Reply Quote 0
                            • crpalmerundefined
                              crpalmer @chrishamm
                              last edited by

                              @chrishamm Thanks.

                              I changed the log level to debug and reboot the PI to be sure everything was restarted. I see debug messages from the DuetControlServer and DuetWebServer in /var/log/syslog. If the error happens again (I fear the debug logging is going to change the timing that causes this to happen) should I attach the output of running "grep -i duet /var/log/syslog" to the github bug report?

                              (Let me know next week when you're back)

                              1 Reply Last reply Reply Quote 0
                              • crpalmerundefined
                                crpalmer @chrishamm
                                last edited by

                                @chrishamm I have a syslog that has the same gcode run twice in a row, once with no errors and then once with the errors. Hopefully the syslog is the "journal" that you were referring to. Either way, it's good to know that the error still occurs with the debug logging enabled.

                                1 Reply Last reply Reply Quote 0
                                • First post
                                  Last post
                                Unless otherwise noted, all forum content is licensed under CC-BY-SA