Duet3D Logo Duet3D
    • Tags
    • Documentation
    • Order
    • Register
    • Login

    Deadlock reading the object model from a plugin

    Scheduled Pinned Locked Moved
    Plugins for DWC and DSF
    3
    16
    615
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Ant1undefined
      Ant1
      last edited by

      Hello,

      I am developping a interception plugin on my duet, and I am experiencing occasional deadlocks when reading from the object model, which cause some prints to hang forever.

      More details:

      • My plugin intercepts some GCode meta-commands and must retrieve a sensor value using a call to CommandConnection.get_object_model() from the dsf-python package.
      • Most of the time, this just works, but sometimes, it hangs forever, causing the print to never resume because it is stuck in my plugin.
      • When it happened, I checked the logs using sudo journalctl -u duetcontrolserver -r, and I got the following suspicious lines:
      Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
      Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
      Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
      Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
      Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
      Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
      Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
      Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
      Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
      Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
      Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
      

      I suspect that the command to retrieve the object model is stuck in a deadlock, which sounds like a synchronization issue.

      I have tried to call CommandConnection.sync_object_model() before retrieving the object model, but the deadlock also occured once in that case.

      Do you have any ideas on how to solve this issue ?
      Do you think that I am doing something wrong, or could it be an issue in the firmware ?
      In that case, should I update it to the latest version (3.5.3) ?

      Thank you very much for your help!

      Antoine

      droftartsundefined chrishammundefined 2 Replies Last reply Reply Quote 0
      • droftartsundefined
        droftarts administrators @Ant1
        last edited by

        @Ant1 I think you will need @chrishamm to have a look at this issue.

        Ian

        Bed-slinger - Mini5+ WiFi/1LC | RRP Fisher v1 - D2 WiFi | Polargraph - D2 WiFi | TronXY X5S - 6HC/Roto | CNC router - 6HC | Tractus3D T1250 - D2 Eth

        1 Reply Last reply Reply Quote 0
        • chrishammundefined
          chrishamm administrators @Ant1
          last edited by

          @Ant1 It sounds like the object model response is too long so RRF fails to send it over. A plugin can only request the object model from DCS, which keeps a copy of it for all other applications, so I don't think it's the dsf-python call that is responsible for that problem.

          Please do update to the latest version (3.5.3) because I'm quite sure we had to implement a different query method for tools[] and move.axes[] for large configurations.

          Duet software engineer

          Ant1undefined 1 Reply Last reply Reply Quote 0
          • Ant1undefined
            Ant1 @chrishamm
            last edited by

            @chrishamm Okay I will update my machines and see if it solves the issue. Thanks!

            Otherwise, is there a way to only get a specific key of the object model that prevents full updates to be propagated to the raspberry PI ? Maybe this would help reducing too long responses.

            chrishammundefined 1 Reply Last reply Reply Quote 0
            • chrishammundefined
              chrishamm administrators @Ant1
              last edited by

              @Ant1 No, only plugins can subscribe to specific parts of the object model. Please let me know if this is still an issue with 3.5.3 and if it is, please share your configuration as well.

              Duet software engineer

              Ant1undefined 1 Reply Last reply Reply Quote 0
              • Ant1undefined
                Ant1 @chrishamm
                last edited by

                @chrishamm Okay, good to know. I've made the update to 3.5.3 on one of my machines this week, and it seems to be working for now. I will make an update if I still observe issues.

                Thanks for your help!

                Ant1undefined 1 Reply Last reply Reply Quote 0
                • Ant1undefined
                  Ant1 @Ant1
                  last edited by

                  @Ant1 Hey! Unfortunately the issue just happened again today 😕

                  Here is information with regard to my configuration:
                  FirmwareVersion.png
                  config.g

                  And here are some information about the plugin that I am developping. I only included the interception part, but if you want to see the other files I can share.
                  plugin.json
                  intercept.py

                  chrishammundefined 1 Reply Last reply Reply Quote 0
                  • chrishammundefined
                    chrishamm administrators @Ant1
                    last edited by

                    @Ant1 Sorry to hear that. Did you see the same Resending packet #0 (request GetObjectModel) messages this time, too? Your config doesn't look terribly complex, so I find it surprising that you're running out of output buffer space - that's the only reason for that particular log message.

                    You could try to disable PanelDue for testing purposes and check if the problem persists then. PanelDue, networking, and SBC interfaces share the same output buffer pool.

                    Btw, you should cancel and discard the code being intercepted if flush returns false, else the code action may be still executed even though the underlying code or (macro) file is already cancelled.

                    Duet software engineer

                    Ant1undefined 1 Reply Last reply Reply Quote 0
                    • Ant1undefined
                      Ant1 @chrishamm
                      last edited by Ant1

                      @chrishamm
                      Yes, it was the same issue with Resending package #0.
                      Okay I will check if disabling the PanelDue works.
                      And thank you for the tip, I will add that line of code to the plugin.

                      Ant1undefined 1 Reply Last reply Reply Quote 0
                      • Ant1undefined
                        Ant1 @Ant1
                        last edited by

                        @Ant1 Oh and by the way, we also observed a new, more concerning issue with the plugin. When resuming the GCode execution after a custom command has been intercepted by my plugin, the printer shifts everything vertically, which basically makes the print fail.

                        I have checked the GCode, and there is always an instruction G1 Z... after the call to the macro that gets intercepted, so the machine should move to a specific height. But instead, it moves 1 or 2mm higher than that and it prints in the air.

                        We have checked, and this issue only happens when the plugin is activated. Could it also be some kind of synchronization issue between the plugin and the duet ? I am kinda lost on this one...

                        chrishammundefined 1 Reply Last reply Reply Quote 0
                        • chrishammundefined
                          chrishamm administrators @Ant1
                          last edited by

                          @Ant1 I can't see what happens in your control class but maybe you open a second command connection and send instructions to a different channel instead of the channel where the code is intercepted. If you have relative positioning on the channel where the code is intercepted from (e.g. DWC -> HTTP channel), absolute positioning may be still set on the default code channel for commands (SBC channel).

                          I generally recommend you use only the intercept connection to send further commands before resuming regular execution. Btw, there seems to be an issue with blocking message boxes that I still need to fix. I've already logged it here:
                          https://github.com/Duet3D/DuetSoftwareFramework/issues/207

                          chrishamm created this issue in Duet3D/DuetSoftwareFramework

                          closed [Bug]: M291 from intercepted codes can lead to deadlocks #207

                          Duet software engineer

                          Ant1undefined 1 Reply Last reply Reply Quote 1
                          • Ant1undefined
                            Ant1 @chrishamm
                            last edited by

                            @chrishamm Thanks for your issue on GitHub! Is there any way I could help tracking it down ?

                            Regarding the command connection, I am passing the intercept connection to the different parts of my plugin so that all the commands are sent through it, so this should be working, right ?

                            The weird thing is that this issue happens randomly. That is, if I launch the same GCode several times in a row, it might succeed, or fail, but at different places. The only common denominator is that it fails to move at the right height after it resumes execution from my plugin.

                            chrishammundefined 1 Reply Last reply Reply Quote 0
                            • chrishammundefined
                              chrishamm administrators @Ant1
                              last edited by

                              @Ant1 said in Deadlock reading the object model from a plugin:

                              Is there any way I could help tracking it down ?

                              Not yet, I need to have a look at this problem again in detail before I can comment on it further. If you can, offload the M291 logic to a separate macro for now and invoke that using M98 instead. A quick fix for a blocking M291 request from a custom code might be to prioritize M292, however I need evaluate that option again. I'll keep you posted.

                              @Ant1 said in Deadlock reading the object model from a plugin:

                              Regarding the command connection, I am passing the intercept connection to the different parts of my plugin so that all the commands are sent through it, so this should be working, right ?

                              Not necessarily. When you run further codes on the same channel but using a different command connection, those codes are queued behind the code that is being intercepted. By sending codes through the intercept connection instead, you can ensure that those codes are executed before the code being intercepted finishes. Perhaps that side-effect upsets your custom logic.

                              Duet software engineer

                              Ant1undefined 1 Reply Last reply Reply Quote 0
                              • Ant1undefined
                                Ant1 @chrishamm
                                last edited by

                                @chrishamm
                                Yes, that's what I meant, currently my InterceptConnection is shared between all the modules of my code, so that all the commands get executed before the interception finishes.

                                What is weird though is that we used to have a separate CommandConnection in a previous version of the plugin and that bug never happened back then...

                                chrishammundefined 1 Reply Last reply Reply Quote 0
                                • chrishammundefined
                                  chrishamm administrators @Ant1
                                  last edited by

                                  @Ant1 I've got fixes ready for the message box deadlock in the DSF v3.5 and v3.6 branches. Prioritizing M292 does indeed fix it. That lets me run your test.py script without problems.

                                  If you want to diagnose the order of codes being processed, either check out the CodeLogger utility in /opt/dsf/bin or enable debug logging via /opt/dsf/conf/config.json.

                                  Duet software engineer

                                  Ant1undefined 1 Reply Last reply Reply Quote 1
                                  • Ant1undefined
                                    Ant1 @chrishamm
                                    last edited by

                                    @chrishamm Okay thanks for all the pointers ! I switched all my machines to 3.5.3 and made the changes with the connections that you suggested, and it solves most of my problems.

                                    For the problem that I mentionned due to the machine being shifter along the Z axis after executing a macro from the plugin, I found something that might have caused the issue:

                                    When intercepting a command inside the plugin, I was always starting by sending G90 and M83 to the machine to make sure that whatever mode it is in, I can always perform moves and extrusions in absolute and relative mode, respectively. I suspect that those two commands might have caused the machine to shift position along Z due to synchronization issues because I removed them and the issue never happened again...

                                    1 Reply Last reply Reply Quote 1
                                    • First post
                                      Last post
                                    Unless otherwise noted, all forum content is licensed under CC-BY-SA