
[3.4.5] DSF-Python - timeout failures

DSF Development
    oozeBot
    last edited by 19 May 2023, 15:15

    @chrishamm - hoping to get this in front of you. We are seeing timeouts from DSF once or twice a day. The service does restart correctly (most of the time), but we'd obviously prefer that it never time out, and certainly not in the cases where it fails to restart itself.

    Is there anything non-blocking that we can do in our python scripts to keep it from timing out? Or is this an issue within DSF?

    Here is a recent log. Please let us know how we can help identify the issue. Thanks!

    May 17 16:04:14 elevate systemd[1]: Started oozeBot Control Server.
    May 18 08:52:18 elevate OCS.py[5029]: Traceback (most recent call last):
    May 18 08:52:18 elevate OCS.py[5029]:   File "/scripts/OCS.py", line 94, in <module>
    May 18 08:52:18 elevate OCS.py[5029]:     cde = intercept_connection.receive_code()
    May 18 08:52:18 elevate OCS.py[5029]:   File "/usr/local/lib/python3.9/dist-packages/dsf/connections.py", line 458, in receive_code
    May 18 08:52:18 elevate OCS.py[5029]:     return self.receive(code.Code)
    May 18 08:52:18 elevate OCS.py[5029]:   File "/usr/local/lib/python3.9/dist-packages/dsf/connections.py", line 122, in receive
    May 18 08:52:18 elevate OCS.py[5029]:     json_string = self.receive_json()
    May 18 08:52:18 elevate OCS.py[5029]:   File "/usr/local/lib/python3.9/dist-packages/dsf/connections.py", line 162, in receive_json
    May 18 08:52:18 elevate OCS.py[5029]:     raise TimeoutError
    May 18 08:52:18 elevate OCS.py[5029]: TimeoutError
    May 18 08:52:18 elevate systemd[1]: ocs.service: Main process exited, code=exited, status=1/FAILURE
    May 18 08:52:18 elevate systemd[1]: ocs.service: Failed with result 'exit-code'.
    May 18 08:52:21 elevate systemd[1]: ocs.service: Scheduled restart job, restart counter is at 4.
    May 18 08:52:21 elevate systemd[1]: Stopped oozeBot Control Server.
    May 18 08:52:22 elevate systemd[1]: Dependency failed for oozeBot Control Server.
    May 18 08:52:22 elevate systemd[1]: ocs.service: Job ocs.service/start failed with result 'dependency'.
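As a stopgap on the script side, one option is to catch the TimeoutError and start the receive loop again instead of letting the process exit. The following is only a minimal sketch of that idea: `run_with_reconnect` and the `session` callable are hypothetical names, and `session` is assumed to be a function that opens the intercept connection and loops on `receive_code()`. It hides the symptom rather than fixing the cause.

```python
import time


def run_with_reconnect(session, max_restarts=None, delay=1.0, sleep=time.sleep):
    """Run session() and start it again whenever it raises TimeoutError.

    session      -- hypothetical callable that opens the connection and
                    processes codes until it returns or raises
    max_restarts -- give up (re-raise) after this many restarts; None = never
    delay        -- seconds to wait before each restart
    sleep        -- injectable sleep function, handy for testing
    """
    restarts = 0
    while True:
        try:
            return session()
        except TimeoutError:
            if max_restarts is not None and restarts >= max_restarts:
                raise
            restarts += 1
            sleep(delay)


# Example wiring (not executed here; handle_codes is a placeholder for
# the body of the existing script):
#
#     run_with_reconnect(handle_codes, delay=2.0)
```

Because the connection is reopened inside `session`, any state held by the old connection is lost on each restart, so this only suits scripts that can safely resume from scratch.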
    
      chrishamm administrators
      last edited by chrishamm 22 May 2023, 07:09

      @oozeBot It sounds like the Python client library doesn't set the correct timeout for IPC connections - it should be infinite. @Falcounet can you have a look please?
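For reference, this is how timeouts behave on a plain Python socket. It is a generic standard-library sketch, not the dsf-python code: `recv_or_none` is a made-up helper, and `socket.socketpair()` stands in for the real IPC socket.

```python
import socket


def recv_or_none(sock, timeout):
    """Try to read from sock; return None if nothing arrives within timeout."""
    sock.settimeout(timeout)  # a float sets a timeout in seconds
    try:
        return sock.recv(1024)
    except socket.timeout:  # raised when the timeout expires
        return None


def make_blocking(sock):
    """Disable the timeout so recv() blocks until data arrives."""
    sock.settimeout(None)  # None means "wait forever"
    return sock


if __name__ == "__main__":
    a, b = socket.socketpair()  # stand-in for the real IPC socket
    print(recv_or_none(a, 0.05))  # no data sent yet, so this times out
    b.sendall(b"ping")
    print(recv_or_none(a, 0.05))  # data is available now
    make_blocking(a)
    print(a.gettimeout())  # None indicates an infinite timeout
    a.close()
    b.close()
```

With `settimeout(None)` an idle connection simply waits in `recv()` and never raises because of elapsed time.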

      Duet software engineer

        oozeBot @chrishamm
        last edited by 22 May 2023, 15:40

        Thanks! In the meantime, we added a heartbeat that is run within the daemon to see if that protects against it timing out.

          Falcounet @chrishamm
          last edited by 22 May 2023, 16:43

          @chrishamm @oozeBot Yes, but I'm abroad for now, so I can't look at this before the end of the week.

            oozeBot @Falcounet
            last edited by 23 May 2023, 15:46

            @Falcounet @chrishamm

            FYI - adding a call to a custom M-code every 10 seconds through daemon.g did not fix the issue. Thanks

              oozeBot
              last edited by 31 May 2023, 12:11

              Bumping this so it doesn't get lost.. plus we've noticed that this is not happening on all our machines, yet they all use the same OS image and are all running 3.4.5.

              We've now protected against this through a second "watchdog" service, but we'd obviously like it to not time out in the first place, as it's happening several times a day.

              Please let us know what we can help test/research between our machines to help diagnose the issue. Thanks

                chrishamm administrators @oozeBot
                last edited by 31 May 2023, 13:34

                @Falcounet Any idea?

                Duet software engineer

                  Falcounet @chrishamm
                  last edited by 3 Jun 2023, 15:41

                  @chrishamm @oozeBot From what I see, you are not running the latest version of dsf-python, but that shouldn't affect your issue anyway.

                  It doesn't seem easy for me to reproduce your issue, so maybe you can try the following:

                  1. Back up /usr/local/lib/python3.9/dist-packages/dsf/connections.py as connections.py.bak
                  2. Edit /usr/local/lib/python3.9/dist-packages/dsf/connections.py and comment out lines 161 and 162: https://github.com/Duet3D/dsf-python/blob/8fd345ed6455102b4750e1e4470e52028e1b291e/src/dsf/connections.py#L161-L162
                  3. See if the issue still persists
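Steps 1 and 2 above could also be scripted. This is just a rough sketch: `comment_out_lines` is a hypothetical helper, not part of dsf-python, and it assumes the line numbers match the installed file exactly, so check the file by hand first.

```python
import shutil
from pathlib import Path


def comment_out_lines(path, first, last):
    """Copy path to <name>.bak, then comment out lines first..last (1-based)."""
    path = Path(path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # step 1: keep an untouched copy

    lines = path.read_text().splitlines(keepends=True)
    for index in range(first - 1, last):  # step 2: prefix each line with "# "
        lines[index] = "# " + lines[index]
    path.write_text("".join(lines))
    return backup


# Example call for the file and lines named above (needs write access
# to dist-packages, so typically run as root):
#
#     comment_out_lines(
#         "/usr/local/lib/python3.9/dist-packages/dsf/connections.py", 161, 162)
```

Restoring the original is then a matter of copying the .bak file back over the edited one.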
                    oozeBot
                    last edited by 3 Jun 2023, 22:44

                    Thanks! We’ll update to the latest version, make the change, and then let it bake for a while to see if that fixes it. We’ll report back soon..

                      Falcounet @oozeBot
                      last edited by Falcounet 6 Apr 2023, 07:55 4 Jun 2023, 07:54

                      @oozeBot If you update first, the file will be /usr/local/lib/python3.9/dist-packages/dsf/connections/base_connection.py and the lines to comment out are 115 and 116: https://github.com/Duet3D/dsf-python/blob/main/src/dsf/connections/base_connection.py#L115-L116

                        oozeBot @Falcounet
                        last edited by 6 Jun 2023, 04:29

                        @Falcounet

                        Maybe we are missing something, but after upgrading to the latest version, it appears something has changed with the imports. The snippet below worked fine in the previous version, but with the latest version, it fails to import MessageType and LogLevel from dsf.commands.basecommands and InterceptionMode from dsf.initmessages.clientinitmessages.

                        Any thoughts on why and how to get past this? Were they renamed? Seems unlikely but reverting to 3.3.2 resolves the issue.

                        Thanks

                        from dsf.commands.basecommands import MessageType
                        from dsf.commands.basecommands import LogLevel
                        from dsf.commands.code import CodeType
                        from dsf.connections import CommandConnection, InterceptConnection
                        from dsf.initmessages.clientinitmessages import InterceptionMode
                        

                        One of the errors..

                        Jun 05 23:41:47 elevate OCS.py[564]:     from dsf.commands.basecommands import MessageType
                        Jun 05 23:41:47 elevate OCS.py[564]: ModuleNotFoundError: No module named 'dsf.commands.basecommands'
                        
                          Falcounet @oozeBot
                          last edited by 6 Jun 2023, 10:10

                          @oozeBot They were renamed because dsf-python has been refactored, mainly to follow DuetAPI.

                          Your imports should be changed to:

                          from dsf.commands.code import CodeType
                          from dsf.connections import CommandConnection, InterceptConnection, InterceptionMode
                          from dsf.object_model import LogLevel, MessageType
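If the same script has to run on machines with either the old or the refactored package, the imports can be resolved at runtime. The sketch below is one way to do that: `import_first` and `load_dsf_names` are hypothetical helpers, and the module paths are the ones quoted in this thread.

```python
import importlib


def import_first(name, module_names):
    """Return attribute `name` from the first module in the list that has it."""
    last_error = None
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            last_error = exc  # remember why this candidate failed, try the next
    raise ImportError(
        f"{name} not found in any of {list(module_names)}") from last_error


def load_dsf_names():
    """Look up the renamed classes, trying the new layout before the old one."""
    return {
        "MessageType": import_first(
            "MessageType", ["dsf.object_model", "dsf.commands.basecommands"]),
        "LogLevel": import_first(
            "LogLevel", ["dsf.object_model", "dsf.commands.basecommands"]),
        "InterceptionMode": import_first(
            "InterceptionMode",
            ["dsf.connections", "dsf.initmessages.clientinitmessages"]),
    }
```

Calling `load_dsf_names()` once at startup keeps the version handling in a single place. It only covers the renamed imports, not any behavioural differences between the versions.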
                          
                            oozeBot @Falcounet
                            last edited by 7 Jun 2023, 17:26

                            @Falcounet It's been over 24 hours since the change was made and there have been no timeouts with the service.. however, I did upgrade to the latest version and remove those two lines of code at the same time. I've added the two lines back and will let it run for another 24 hours to see if something else in the latest version fixed the issue and then report back.

                              chrishamm administrators @oozeBot
                              last edited by 8 Jun 2023, 08:01

                              @oozeBot @Falcounet DSF may send zero-byte payloads to check if the socket is still open. I don't know if the Python client can actually detect that; if it does, those two lines should remain removed.

                              Duet software engineer

                                oozeBot @chrishamm
                                last edited by 8 Jun 2023, 17:55

                                @chrishamm @Falcounet - just caught an error with the latest version.. so it appears Chris is right - DSF is sending zero-byte payloads, which trigger this condition.

                                I'll remove those lines and let it run to see if that, in fact, fixes it.

                                Jun 08 13:45:59 elevate OCS.py[566]: Traceback (most recent call last):
                                Jun 08 13:45:59 elevate OCS.py[566]:   File "/opt/dsf/sd/scripts/OCS.py", line 101, in <module>
                                Jun 08 13:45:59 elevate OCS.py[566]:     cde = intercept_connection.receive_code()
                                Jun 08 13:45:59 elevate OCS.py[566]:   File "/usr/local/lib/python3.9/dist-packages/dsf/connections/intercept_connectio>
                                Jun 08 13:45:59 elevate OCS.py[566]:     return self.receive(commands.code.Code)
                                Jun 08 13:45:59 elevate OCS.py[566]:   File "/usr/local/lib/python3.9/dist-packages/dsf/connections/base_connection.py">
                                Jun 08 13:45:59 elevate OCS.py[566]:     json_string = self.receive_json()
                                Jun 08 13:45:59 elevate OCS.py[566]:   File "/usr/local/lib/python3.9/dist-packages/dsf/connections/base_connection.py">
                                Jun 08 13:45:59 elevate OCS.py[566]:     raise TimeoutError
                                Jun 08 13:45:59 elevate OCS.py[566]: TimeoutError
                                Jun 08 13:46:00 elevate systemd[1]: ocs.service: Main process exited, code=exited, status=1/FAILURE
                                
                                  oozeBot
                                  last edited by 11 Jun 2023, 21:01

                                  @chrishamm @Falcounet - it has been over 3 days without a service crash since those two lines were removed on the latest version.. so it's pretty clear that's the issue.

                                  Please let me know when a new official release is available that removes these two lines so we can update all our machines and test once again. Thanks!

                                    oozeBot @oozeBot
                                    last edited by 30 Oct 2023, 14:18

                                    @chrishamm @Falcounet - it has been over 4 months without a service crash since those two lines were removed on the latest version.. so it's pretty clear that's the issue.

                                    When will the codebase be updated to correct this issue in a new release? Thanks

                                      Falcounet @oozeBot
                                      last edited by 1 Nov 2023, 17:06

                                      @oozeBot The codebase was updated some months ago, but I forgot to release the new version, sorry.
                                      dsf-python 3.4.6 has been released today.

                                        oozeBot @Falcounet
                                        last edited by 1 Nov 2023, 18:19

                                        @Falcounet Thanks! The refactoring broke our code which worked in 3.4.5. I'm still learning and without more examples, I'm not certain what needs to change. Can you guide me through the changes to our declarations to get this working for the new version?

                                        Our Code:

                                        from dsf.commands.code import CodeType
                                        from dsf.connections import CommandConnection, InterceptConnection, InterceptionMode, SubscribeConnection, SubscriptionMode
                                        from dsf.object_model import LogLevel, MessageType
                                        

                                        Errors presented in 3.4.6

                                        Nov 01 14:12:45 workbench1 systemd[1]: Started oozeBot Control Server.
                                        Nov 01 14:12:45 workbench1 OCS.py[868]: Traceback (most recent call last):
                                        Nov 01 14:12:45 workbench1 OCS.py[868]:   File "/opt/dsf/sd/scripts/OCS.py", line 17, in <module>
                                        Nov 01 14:12:45 workbench1 OCS.py[868]:     from dsf.commands.code import CodeType
                                        Nov 01 14:12:45 workbench1 OCS.py[868]:   File "/usr/local/lib/python3.7/dist-packages/dsf/__init__.py", line 10, in <module>
                                        Nov 01 14:12:45 workbench1 OCS.py[868]:     from . import commands, connections, http, object_model
                                        Nov 01 14:12:45 workbench1 OCS.py[868]:   File "/usr/local/lib/python3.7/dist-packages/dsf/connections/__init__.py", line 47, in <module>
                                        Nov 01 14:12:45 workbench1 OCS.py[868]:     from .base_command_connection import BaseCommandConnection
                                        Nov 01 14:12:45 workbench1 OCS.py[868]:   File "/usr/local/lib/python3.7/dist-packages/dsf/connections/base_command_connection.py", line 3, in <module>
                                        Nov 01 14:12:45 workbench1 OCS.py[868]:     from .base_connection import BaseConnection
                                        Nov 01 14:12:45 workbench1 OCS.py[868]:   File "/usr/local/lib/python3.7/dist-packages/dsf/connections/base_connection.py", line 6, in <module>
                                        Nov 01 14:12:45 workbench1 OCS.py[868]:     from .init_messages import client_init_messages, server_init_message
                                        Nov 01 14:12:45 workbench1 OCS.py[868]:   File "/usr/local/lib/python3.7/dist-packages/dsf/connections/init_messages/__init__.py", line 1, in <module>
                                        Nov 01 14:12:45 workbench1 OCS.py[868]:     from . import client_init_messages, server_init_message
                                        Nov 01 14:12:45 workbench1 OCS.py[868]:   File "/usr/local/lib/python3.7/dist-packages/dsf/connections/init_messages/client_init_messages.py", line 43, in <module>
                                        Nov 01 14:12:45 workbench1 OCS.py[868]:     auto_flush: bool = True):
                                        Nov 01 14:12:45 workbench1 OCS.py[868]: TypeError: 'type' object is not subscriptable
                                        Nov 01 14:12:45 workbench1 systemd[1]: ocs.service: Main process exited, code=exited, status=1/FAILURE
                                        Nov 01 14:12:45 workbench1 systemd[1]: ocs.service: Failed with result 'exit-code'.
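One thing that stands out in this traceback is the python3.7 path, whereas the earlier logs in this thread show python3.9. "TypeError: 'type' object is not subscriptable" is the error older interpreters raise for annotations such as `list[str]`, which are only valid from Python 3.9 onward. Line 43 of client_init_messages.py is not shown here, so treating that as the cause is an assumption. A quick check to run on the affected machine could look like this (`supports_builtin_generics` is a made-up name):

```python
import sys


def supports_builtin_generics():
    """Return True if built-in types such as list can be subscripted."""
    try:
        list[int]  # raises TypeError on Python older than 3.9
        return True
    except TypeError:
        return False


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("built-in generics supported:", supports_builtin_generics())
```

If this reports that built-in generics are not supported, the interpreter rather than the script's own imports would be the first thing to look at.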
                                        
                                          Falcounet @oozeBot
                                          last edited by 1 Nov 2023, 18:38

                                          @oozeBot I will need more of the source code to understand what is going on, not only the imports.

                                          Unless otherwise noted, all forum content is licensed under CC-BY-SA