    [3.4.5] DSF-Python - timeout failures

    • oozeBot

      @chrishamm - hoping to get this in front of you. We are seeing timeouts from DSF once or twice a day. The service does restart correctly (most of the time), but we'd obviously prefer that it never time out, and certainly not in the cases where it then fails to restart itself.

      Is there anything non-blocking that we can do in our Python scripts to keep it from timing out? Or is this an issue within DSF?

      Here is a recent log. Please let us know how we can help identify the issue. Thanks!

      May 17 16:04:14 elevate systemd[1]: Started oozeBot Control Server.
      May 18 08:52:18 elevate OCS.py[5029]: Traceback (most recent call last):
      May 18 08:52:18 elevate OCS.py[5029]:   File "/scripts/OCS.py", line 94, in <module>
      May 18 08:52:18 elevate OCS.py[5029]:     cde = intercept_connection.receive_code()
      May 18 08:52:18 elevate OCS.py[5029]:   File "/usr/local/lib/python3.9/dist-packages/dsf/connections.py", line 458, in receive_code
      May 18 08:52:18 elevate OCS.py[5029]:     return self.receive(code.Code)
      May 18 08:52:18 elevate OCS.py[5029]:   File "/usr/local/lib/python3.9/dist-packages/dsf/connections.py", line 122, in receive
      May 18 08:52:18 elevate OCS.py[5029]:     json_string = self.receive_json()
      May 18 08:52:18 elevate OCS.py[5029]:   File "/usr/local/lib/python3.9/dist-packages/dsf/connections.py", line 162, in receive_json
      May 18 08:52:18 elevate OCS.py[5029]:     raise TimeoutError
      May 18 08:52:18 elevate OCS.py[5029]: TimeoutError
      May 18 08:52:18 elevate systemd[1]: ocs.service: Main process exited, code=exited, status=1/FAILURE
      May 18 08:52:18 elevate systemd[1]: ocs.service: Failed with result 'exit-code'.
      May 18 08:52:21 elevate systemd[1]: ocs.service: Scheduled restart job, restart counter is at 4.
      May 18 08:52:21 elevate systemd[1]: Stopped oozeBot Control Server.
      May 18 08:52:22 elevate systemd[1]: Dependency failed for oozeBot Control Server.
      May 18 08:52:22 elevate systemd[1]: ocs.service: Job ocs.service/start failed with result 'dependency'.
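
As a stop-gap on the script side, the blocking receive can be wrapped so that a TimeoutError leads to a reconnect instead of ending the process. The sketch below is only an illustration and not part of dsf-python: `receive_with_reconnect` and the `make_connection` factory are hypothetical names, and the connection object is assumed to offer `receive_code()` (seen in the traceback above) and a `close()` method.

```python
import time


def receive_with_reconnect(make_connection, max_retries=3, delay=0.0):
    """Return (connection, code) from the first receive_code() call that succeeds.

    make_connection is a hypothetical factory returning an already connected
    object with receive_code() and close() methods (assumed for this sketch).
    """
    last_error = None
    for _attempt in range(max_retries):
        conn = make_connection()
        try:
            return conn, conn.receive_code()
        except TimeoutError as exc:
            # Drop the stale connection and try again with a fresh one.
            last_error = exc
            conn.close()
            time.sleep(delay)
    raise TimeoutError(f"no code received after {max_retries} attempts") from last_error
```

This only hides the symptom: each timeout still costs a reconnect, so it does not replace a fix in the client library.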
      
      • chrishamm (administrator)

        @oozeBot It sounds like the Python client library doesn't set the correct timeout for IPC connections - it should be infinite. @Falcounet can you have a look please?
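
For reference, on a plain Python socket an "infinite" timeout means blocking mode, which `settimeout(None)` selects. The sketch below uses a local socket pair as a stand-in for the DSF IPC socket; the pair is purely illustrative, and how dsf-python configures its own socket is not shown in this thread.

```python
import socket

# A connected pair of local sockets stands in for the client/server IPC link.
client, server = socket.socketpair()

client.settimeout(3.0)        # finite timeout: recv() gives up after 3 s of silence
finite = client.gettimeout()

client.settimeout(None)       # None selects blocking mode: recv() waits indefinitely
blocking = client.gettimeout()

server.sendall(b'{"ok": true}')
payload = client.recv(4096)   # returns as soon as data is available

client.close()
server.close()
```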

        Duet software engineer

        • oozeBot @chrishamm

          Thanks! In the meantime, we added a heartbeat that is run within the daemon to see if that protects against it timing out.

          • Falcounet @chrishamm

            @chrishamm @oozeBot Yes, but I'm abroad at the moment, so I won't be able to look at this before the end of the week.

            • oozeBot @Falcounet

              @Falcounet @chrishamm

              FYI - adding a call to a custom M-code every 10 seconds through daemon.g did not fix the issue. Thanks

              • oozeBot

                Bumping this so it doesn't get lost. We've also noticed that this is not happening on all our machines, even though they all use the same OS image and are all running 3.4.5.

                We've now protected against this with a second "watchdog" service, but we'd obviously like it not to time out in the first place, as it's happening several times a day.

                Please let us know what we can help test/research between our machines to help diagnose the issue. Thanks

                • chrishamm (administrator) @oozeBot

                  @Falcounet Any idea?

                  • Falcounet @chrishamm

                    @chrishamm @oozeBot From what I can see, you are not running the latest version of dsf-python, but that shouldn't affect your issue anyway.

                    I can't easily reproduce your issue, so maybe you can try the following:

                    1. Back up /usr/local/lib/python3.9/dist-packages/dsf/connections.py as connections.py.bak
                    2. Edit /usr/local/lib/python3.9/dist-packages/dsf/connections.py and comment out lines 161 and 162: https://github.com/Duet3D/dsf-python/blob/8fd345ed6455102b4750e1e4470e52028e1b291e/src/dsf/connections.py#L161-L162
                    3. See if the issue still persists
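
Step 1 can also be scripted. This is a minimal sketch, assuming the path from the steps above exists and is writable (on a real install this typically needs root); `backup_file` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def backup_file(path):
    """Copy path to path + '.bak' (metadata preserved) and return the backup path."""
    source = Path(path)
    backup = source.with_name(source.name + ".bak")
    shutil.copy2(source, backup)
    return backup


# Example (path taken from the steps above; adjust it to your install):
# backup_file("/usr/local/lib/python3.9/dist-packages/dsf/connections.py")
```

Restoring is the same copy in the other direction, from connections.py.bak back to connections.py.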
                    • oozeBot

                      Thanks! We’ll update to the latest version, make the change, and then let it bake for a while to see if that fixes it. We’ll report back soon.

                      • Falcounet @oozeBot

                        @oozeBot If you update first, the file will be /usr/local/lib/python3.9/dist-packages/dsf/connections/base_connection.py and the lines to comment out are 115 and 116: https://github.com/Duet3D/dsf-python/blob/main/src/dsf/connections/base_connection.py#L115-L116

                        • oozeBot @Falcounet

                          @Falcounet

                          Maybe we are missing something, but after upgrading to the latest version, it appears something has changed with the imports. The snippet below worked fine in the previous version, but with the latest version it fails to import MessageType and LogLevel from dsf.commands.basecommands and InterceptionMode from dsf.initmessages.clientinitmessages.

                          Any thoughts on why, and how to get past this? Were they renamed? That seems unlikely, but reverting to 3.3.2 resolves the issue.

                          Thanks

                          from dsf.commands.basecommands import MessageType
                          from dsf.commands.basecommands import LogLevel
                          from dsf.commands.code import CodeType
                          from dsf.connections import CommandConnection, InterceptConnection
                          from dsf.initmessages.clientinitmessages import InterceptionMode
                          

                          One of the errors:

                          Jun 05 23:41:47 elevate OCS.py[564]:     from dsf.commands.basecommands import MessageType
                          Jun 05 23:41:47 elevate OCS.py[564]: ModuleNotFoundError: No module named 'dsf.commands.basecommands'
                          
                          • Falcounet @oozeBot

                            @oozeBot They were renamed because dsf-python has been refactored, mainly to follow DuetAPI.

                            Your imports should be changed to:

                            from dsf.commands.code import CodeType
                            from dsf.connections import CommandConnection, InterceptConnection, InterceptionMode
                            from dsf.object_model import LogLevel, MessageType
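
If a script has to run on machines with either layout during the transition, the import can try the new location first and fall back to the old one. The helper below is a sketch and not part of dsf-python; `import_first` is a hypothetical name, and the module paths in the usage comment are the ones quoted in this thread.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module in module_paths that provides it."""
    errors = []
    for module_path in module_paths:
        try:
            module = importlib.import_module(module_path)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            # Module missing, or present but without the wanted name: try the next one.
            errors.append(f"{module_path}: {exc}")
    raise ImportError(f"cannot import {name!r}; tried: " + "; ".join(errors))


# Usage with the paths from this thread (new layout first, then the old one):
# MessageType = import_first(
#     "MessageType", ["dsf.object_model", "dsf.commands.basecommands"])
# InterceptionMode = import_first(
#     "InterceptionMode", ["dsf.connections", "dsf.initmessages.clientinitmessages"])
```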
                            
                            • oozeBot @Falcounet

                              @Falcounet It's been over 24 hours since the change was made and there have been no timeouts with the service. However, I upgraded to the latest version and removed those two lines of code at the same time. I've added the two lines back and will let it run for another 24 hours to see whether something else in the latest version fixed the issue, and then report back.

                              • chrishamm (administrator) @oozeBot

                                @oozeBot @Falcounet DSF may send zero-byte payloads to check if the socket is still open. I don't know whether the Python client can actually detect that; if it does, those two lines should remain removed.
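
To illustrate the idea, here is a sketch of a receive loop that skips empty reads instead of raising on the first one. It is not the dsf-python implementation: `read_json_message`, the `recv` callable and the `max_empty_reads` bound are assumptions made for this example, and each message is assumed to be a single JSON object. The bound exists because on a stream socket an empty read can also mean that the peer has closed the connection, so the loop should not spin forever.

```python
import json


def read_json_message(recv, bufsize=4096, max_empty_reads=100):
    """Accumulate chunks from recv(bufsize) until they form one JSON document."""
    buffer = b""
    empty_reads = 0
    while True:
        chunk = recv(bufsize)
        if not chunk:
            # Empty read: treat it as a keep-alive, but give up eventually.
            empty_reads += 1
            if empty_reads > max_empty_reads:
                raise ConnectionError("too many consecutive empty reads")
            continue
        empty_reads = 0
        buffer += chunk
        try:
            return json.loads(buffer.decode("utf-8"))
        except ValueError:
            # Incomplete JSON (or a split multi-byte character): keep reading.
            continue
```

With a real socket, `recv` would simply be the socket's own `recv` method.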

                                • oozeBot @chrishamm

                                  @chrishamm @Falcounet - just caught an error with the latest version, so it appears Chris is right: DSF is sending zero-byte payloads, which trigger this condition.

                                  I'll remove those lines and let it run to see if that, in fact, fixes it.

                                  Jun 08 13:45:59 elevate OCS.py[566]: Traceback (most recent call last):
                                  Jun 08 13:45:59 elevate OCS.py[566]:   File "/opt/dsf/sd/scripts/OCS.py", line 101, in <module>
                                  Jun 08 13:45:59 elevate OCS.py[566]:     cde = intercept_connection.receive_code()
                                  Jun 08 13:45:59 elevate OCS.py[566]:   File "/usr/local/lib/python3.9/dist-packages/dsf/connections/intercept_connectio>
                                  Jun 08 13:45:59 elevate OCS.py[566]:     return self.receive(commands.code.Code)
                                  Jun 08 13:45:59 elevate OCS.py[566]:   File "/usr/local/lib/python3.9/dist-packages/dsf/connections/base_connection.py">
                                  Jun 08 13:45:59 elevate OCS.py[566]:     json_string = self.receive_json()
                                  Jun 08 13:45:59 elevate OCS.py[566]:   File "/usr/local/lib/python3.9/dist-packages/dsf/connections/base_connection.py">
                                  Jun 08 13:45:59 elevate OCS.py[566]:     raise TimeoutError
                                  Jun 08 13:45:59 elevate OCS.py[566]: TimeoutError
                                  Jun 08 13:46:00 elevate systemd[1]: ocs.service: Main process exited, code=exited, status=1/FAILURE
                                  
                                  • oozeBot

                                    @chrishamm @Falcounet - it has been over 3 days without a service crash since those two lines were removed on the latest version, so it's pretty clear that's the issue.

                                    Please let me know when a new official release is available that removes these two lines so we can update all our machines and test once again. Thanks!

                                    • oozeBot @oozeBot

                                      @chrishamm @Falcounet - it has been over 4 months without a service crash since those two lines were removed on the latest version, so it's pretty clear that's the issue.

                                      When will the codebase be updated to correct this issue in a new release? Thanks

                                      • Falcounet @oozeBot

                                        @oozeBot The codebase was updated some months ago, but I forgot to release the new version, sorry.
                                        dsf-python 3.4.6 has been released today.

                                        • oozeBot @Falcounet

                                          @Falcounet Thanks! The refactoring broke our code which worked in 3.4.5. I'm still learning and without more examples, I'm not certain what needs to change. Can you guide me through the changes to our declarations to get this working for the new version?

                                          Our Code:

                                          from dsf.commands.code import CodeType
                                          from dsf.connections import CommandConnection, InterceptConnection, InterceptionMode, SubscribeConnection, SubscriptionMode
                                          from dsf.object_model import LogLevel, MessageType
                                          

                                          Errors presented in 3.4.6

                                          Nov 01 14:12:45 workbench1 systemd[1]: Started oozeBot Control Server.
                                          Nov 01 14:12:45 workbench1 OCS.py[868]: Traceback (most recent call last):
                                          Nov 01 14:12:45 workbench1 OCS.py[868]:   File "/opt/dsf/sd/scripts/OCS.py", line 17, in <module>
                                          Nov 01 14:12:45 workbench1 OCS.py[868]:     from dsf.commands.code import CodeType
                                          Nov 01 14:12:45 workbench1 OCS.py[868]:   File "/usr/local/lib/python3.7/dist-packages/dsf/__init__.py", line 10, in <module>
                                          Nov 01 14:12:45 workbench1 OCS.py[868]:     from . import commands, connections, http, object_model
                                          Nov 01 14:12:45 workbench1 OCS.py[868]:   File "/usr/local/lib/python3.7/dist-packages/dsf/connections/__init__.py", line 47, in <module>
                                          Nov 01 14:12:45 workbench1 OCS.py[868]:     from .base_command_connection import BaseCommandConnection
                                          Nov 01 14:12:45 workbench1 OCS.py[868]:   File "/usr/local/lib/python3.7/dist-packages/dsf/connections/base_command_connection.py", line 3, in <module>
                                          Nov 01 14:12:45 workbench1 OCS.py[868]:     from .base_connection import BaseConnection
                                          Nov 01 14:12:45 workbench1 OCS.py[868]:   File "/usr/local/lib/python3.7/dist-packages/dsf/connections/base_connection.py", line 6, in <module>
                                          Nov 01 14:12:45 workbench1 OCS.py[868]:     from .init_messages import client_init_messages, server_init_message
                                          Nov 01 14:12:45 workbench1 OCS.py[868]:   File "/usr/local/lib/python3.7/dist-packages/dsf/connections/init_messages/__init__.py", line 1, in <module>
                                          Nov 01 14:12:45 workbench1 OCS.py[868]:     from . import client_init_messages, server_init_message
                                          Nov 01 14:12:45 workbench1 OCS.py[868]:   File "/usr/local/lib/python3.7/dist-packages/dsf/connections/init_messages/client_init_messages.py", line 43, in <module>
                                          Nov 01 14:12:45 workbench1 OCS.py[868]:     auto_flush: bool = True):
                                          Nov 01 14:12:45 workbench1 OCS.py[868]: TypeError: 'type' object is not subscriptable
                                          Nov 01 14:12:45 workbench1 systemd[1]: ocs.service: Main process exited, code=exited, status=1/FAILURE
                                          Nov 01 14:12:45 workbench1 systemd[1]: ocs.service: Failed with result 'exit-code'.
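
One detail in this traceback is worth noting: it comes from Python 3.7, whereas the earlier logs in this thread were from Python 3.9. A `TypeError: 'type' object is not subscriptable` raised on a function signature is what Python versions before 3.9 produce for annotations such as `list[str]`. That this is the cause here is an assumption, since the offending annotation in client_init_messages.py is not visible in the log. The sketch below shows the difference; `supports_builtin_generics` and `join_filters` are hypothetical names.

```python
import sys
from typing import List, Optional


def supports_builtin_generics(version_info=sys.version_info):
    """True if annotations like list[str] can be evaluated (Python 3.9+, PEP 585)."""
    return tuple(version_info[:2]) >= (3, 9)


# typing.List works on Python 3.7 and later; writing list[str] here instead
# would raise "TypeError: 'type' object is not subscriptable" before 3.9.
def join_filters(filters: Optional[List[str]] = None) -> str:
    return ",".join(filters or [])
```

If that assumption holds, running the script with Python 3.9 or later would be one thing to try before changing any imports.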
                                          
                                          • Falcounet @oozeBot

                                            @oozeBot I will need more of the source code to understand what is going on, not only the imports.

                                            Unless otherwise noted, all forum content is licensed under CC-BY-SA