[3.4.5] DSF-Python - timeout failures
-
Thanks! In the meantime, we added a heartbeat that is run within the daemon to see if that protects against it timing out.
-
@chrishamm @oozeBot Yes, but I'm abroad for now, so I won't be able to look at this before the end of the week.
-
FYI - adding a call to a custom mCode every 10 seconds through daemon.g did not fix the issue.. Thanks
-
Bumping this so it doesn't get lost.. plus we've noticed that this is not happening on all our machines, yet they all use the same OS image and are all running 3.4.5.
We've now protected against this through a second "watchdog" service, but we'd obviously like it not to time out in the first place, as it's happening several times a day.
Please let us know what we can help test/research between our machines to help diagnose the issue. Thanks
-
@Falcounet Any idea?
-
@chrishamm @oozeBot From what I see, you are not running the latest version of dsfPython, but that shouldn't change your issue anyway.
It doesn't seem easy for me to reproduce your issue, so maybe you can try the following:
- Back up /usr/local/lib/python3.9/dist-packages/dsf/connections.py as connections.py.bak
- Edit /usr/local/lib/python3.9/dist-packages/dsf/connections.py and comment out lines 161 and 162: https://github.com/Duet3D/dsf-python/blob/8fd345ed6455102b4750e1e4470e52028e1b291e/src/dsf/connections.py#L161-L162
- See if the issue still persists
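The backup step above can be sketched in Python as follows. This is a minimal sketch, not part of dsf-python: `backup_file` is a hypothetical helper name, and the refusal to overwrite an existing backup is an assumption about what is wanted here.

```python
import shutil
from pathlib import Path


def backup_file(path, suffix=".bak"):
    """Copy `path` to a sibling file with `suffix` appended; return the backup path.

    Hypothetical helper for illustration only.
    """
    source = Path(path)
    backup = source.with_name(source.name + suffix)
    if backup.exists():
        # Keep the first backup so a second run cannot overwrite the pristine copy.
        raise FileExistsError(f"{backup} already exists")
    shutil.copy2(source, backup)  # copy2 also preserves timestamps and permissions
    return backup
```

Called as `backup_file("/usr/local/lib/python3.9/dist-packages/dsf/connections.py")`, this would produce `connections.py.bak` next to the original; writing into dist-packages will usually need root.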
-
Thanks! We’ll update to the latest version, make the change, and then let it bake for a while to see if that fixes it. We’ll report back soon.
-
@oozeBot If you update first, the file will be /usr/local/lib/python3.9/dist-packages/dsf/connections/base_connection.py at lines 115 & 116: https://github.com/Duet3D/dsf-python/blob/main/src/dsf/connections/base_connection.py#L115-L116
-
Maybe we are missing something, but after upgrading to the latest version, it appears something has changed with the imports. The snippet below worked fine in the previous version, but with the latest version it fails to import MessageType and LogLevel from dsf.commands.basecommands and InterceptionMode from dsf.initmessages.clientinitmessages.
Any thoughts on why and how to get past this? Were they renamed? Seems unlikely but reverting to 3.3.2 resolves the issue.
Thanks
from dsf.commands.basecommands import MessageType
from dsf.commands.basecommands import LogLevel
from dsf.commands.code import CodeType
from dsf.connections import CommandConnection, InterceptConnection
from dsf.initmessages.clientinitmessages import InterceptionMode
One of the errors..
Jun 05 23:41:47 elevate OCS.py[564]: from dsf.commands.basecommands import MessageType
Jun 05 23:41:47 elevate OCS.py[564]: ModuleNotFoundError: No module named 'dsf.commands.basecommands'
-
@oozeBot They are renamed because dsf-python has been refactored mainly to follow DuetAPI
Your imports should be changed to:
from dsf.commands.code import CodeType
from dsf.connections import CommandConnection, InterceptConnection, InterceptionMode
from dsf.object_model import LogLevel, MessageType
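For scripts that have to run on machines with either dsf-python layout, one option is to try the new module path first and fall back to the old one. The sketch below is only an illustration: `import_first` is a hypothetical helper, not part of dsf-python, and the module paths in the commented usage are the ones quoted in this thread.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that provides it.

    Hypothetical helper for illustration only.
    """
    errors = []
    for module_path in module_paths:
        try:
            module = importlib.import_module(module_path)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            # ModuleNotFoundError is a subclass of ImportError, so it is caught here too.
            errors.append(f"{module_path}: {exc!r}")
    raise ImportError(f"could not import {name!r}; tried " + "; ".join(errors))


# Possible usage with the paths from this thread (new layout first, old as fallback):
# MessageType = import_first("MessageType", ["dsf.object_model", "dsf.commands.basecommands"])
# LogLevel = import_first("LogLevel", ["dsf.object_model", "dsf.commands.basecommands"])
# InterceptionMode = import_first(
#     "InterceptionMode", ["dsf.connections", "dsf.initmessages.clientinitmessages"])
```

Once every machine is on the refactored release, the plain imports above are simpler and should be preferred.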
-
@Falcounet It's been over 24 hours since the change was made and there have been no timeouts with the service.. however, I did upgrade to the latest version and remove those two lines of code at the same time. I've added the two lines back and will let it run for another 24 hours to see if something else in the latest version fixed the issue and then report back.
-
@oozeBot @Falcounet DSF may send zero-byte payloads to check if the socket is still open. I don't know if the Python client can actually detect that; if it does, those two lines should remain removed.
-
@chrishamm @Falcounet - just caught an error with the latest version.. so it appears Chris is right - DSF is sending zero-byte payloads which triggers this condition.
I'll remove those lines and let it run to see if that, in fact, fixes it.
Jun 08 13:45:59 elevate OCS.py[566]: Traceback (most recent call last):
Jun 08 13:45:59 elevate OCS.py[566]: File "/opt/dsf/sd/scripts/OCS.py", line 101, in <module>
Jun 08 13:45:59 elevate OCS.py[566]: cde = intercept_connection.receive_code()
Jun 08 13:45:59 elevate OCS.py[566]: File "/usr/local/lib/python3.9/dist-packages/dsf/connections/intercept_connectio>
Jun 08 13:45:59 elevate OCS.py[566]: return self.receive(commands.code.Code)
Jun 08 13:45:59 elevate OCS.py[566]: File "/usr/local/lib/python3.9/dist-packages/dsf/connections/base_connection.py">
Jun 08 13:45:59 elevate OCS.py[566]: json_string = self.receive_json()
Jun 08 13:45:59 elevate OCS.py[566]: File "/usr/local/lib/python3.9/dist-packages/dsf/connections/base_connection.py">
Jun 08 13:45:59 elevate OCS.py[566]: raise TimeoutError
Jun 08 13:45:59 elevate OCS.py[566]: TimeoutError
Jun 08 13:46:00 elevate systemd[1]: ocs.service: Main process exited, code=exited, status=1/FAILURE
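Until a fixed release is installed, a caller-side workaround is to catch the `TimeoutError` instead of letting it end the service. The sketch below is a stopgap, not the library fix discussed in this thread. `receive_with_retry` is a hypothetical helper, and it assumes the connection is still usable after the exception, which the thread does not confirm.

```python
import time


def receive_with_retry(receive, max_attempts=3, delay=0.0):
    """Call `receive()` and retry when it raises TimeoutError.

    Hypothetical helper; assumes the underlying connection stays usable
    after a TimeoutError.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return receive()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up and let the caller (or systemd) handle it
            time.sleep(delay)


# Possible usage with the call from the traceback above:
# cde = receive_with_retry(intercept_connection.receive_code)
```

Re-raising after the last attempt keeps a genuinely dead connection visible to systemd or a watchdog instead of looping forever.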
-
@chrishamm @Falcounet - it has been over 3 days without a service crash since those two lines were removed on the latest version.. so it's pretty clear that's the issue.
Please let me know when a new official release is available that removes these two lines so we can update all our machines and test once again. Thanks!
-
@chrishamm @Falcounet - it has been over 4 months without a service crash since those two lines were removed on the latest version.. so it's pretty clear that's the issue.
When will the codebase be updated to correct this issue in a new release? Thanks
-
@oozeBot The codebase was updated some months ago but I forgot to release the new version, sorry.
dsf-python 3.4.6 is released today
-
@Falcounet Thanks! The refactoring broke our code which worked in 3.4.5. I'm still learning and without more examples, I'm not certain what needs to change. Can you guide me through the changes to our declarations to get this working for the new version?
Our Code:
from dsf.commands.code import CodeType
from dsf.connections import CommandConnection, InterceptConnection, InterceptionMode, SubscribeConnection, SubscriptionMode
from dsf.object_model import LogLevel, MessageType
Errors presented in 3.4.6
Nov 01 14:12:45 workbench1 systemd[1]: Started oozeBot Control Server.
Nov 01 14:12:45 workbench1 OCS.py[868]: Traceback (most recent call last):
Nov 01 14:12:45 workbench1 OCS.py[868]: File "/opt/dsf/sd/scripts/OCS.py", line 17, in <module>
Nov 01 14:12:45 workbench1 OCS.py[868]: from dsf.commands.code import CodeType
Nov 01 14:12:45 workbench1 OCS.py[868]: File "/usr/local/lib/python3.7/dist-packages/dsf/__init__.py", line 10, in <module>
Nov 01 14:12:45 workbench1 OCS.py[868]: from . import commands, connections, http, object_model
Nov 01 14:12:45 workbench1 OCS.py[868]: File "/usr/local/lib/python3.7/dist-packages/dsf/connections/__init__.py", line 47, in <module>
Nov 01 14:12:45 workbench1 OCS.py[868]: from .base_command_connection import BaseCommandConnection
Nov 01 14:12:45 workbench1 OCS.py[868]: File "/usr/local/lib/python3.7/dist-packages/dsf/connections/base_command_connection.py", line 3, in <module>
Nov 01 14:12:45 workbench1 OCS.py[868]: from .base_connection import BaseConnection
Nov 01 14:12:45 workbench1 OCS.py[868]: File "/usr/local/lib/python3.7/dist-packages/dsf/connections/base_connection.py", line 6, in <module>
Nov 01 14:12:45 workbench1 OCS.py[868]: from .init_messages import client_init_messages, server_init_message
Nov 01 14:12:45 workbench1 OCS.py[868]: File "/usr/local/lib/python3.7/dist-packages/dsf/connections/init_messages/__init__.py", line 1, in <module>
Nov 01 14:12:45 workbench1 OCS.py[868]: from . import client_init_messages, server_init_message
Nov 01 14:12:45 workbench1 OCS.py[868]: File "/usr/local/lib/python3.7/dist-packages/dsf/connections/init_messages/client_init_messages.py", line 43, in <module>
Nov 01 14:12:45 workbench1 OCS.py[868]: auto_flush: bool = True):
Nov 01 14:12:45 workbench1 OCS.py[868]: TypeError: 'type' object is not subscriptable
Nov 01 14:12:45 workbench1 systemd[1]: ocs.service: Main process exited, code=exited, status=1/FAILURE
Nov 01 14:12:45 workbench1 systemd[1]: ocs.service: Failed with result 'exit-code'.
-
@oozeBot I will need more of the source code to understand what is going on, not only the imports.
-
For posterity - this must have been an issue with Python as upgrading to 3.12 resolved the issue I just reported.
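The traceback above ran under Python 3.7. `TypeError: 'type' object is not subscriptable` at import time is what Python versions before 3.9 raise for annotations such as `list[str]`, which only became valid with PEP 585. The thread does not show the offending annotation, so treating that as the cause is an assumption. A minimal sketch of a startup check, with `supports_builtin_generics` as a hypothetical helper name:

```python
import sys

# PEP 585: built-in types such as list and dict became subscriptable in Python 3.9.
MIN_BUILTIN_GENERICS = (3, 9)


def supports_builtin_generics(version_info=None):
    """Return True if annotations like list[str] can be evaluated on this interpreter.

    Hypothetical helper for illustration only.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_BUILTIN_GENERICS


if not supports_builtin_generics():
    print("Python %d.%d is older than 3.9; list[str]-style annotations raise TypeError"
          % sys.version_info[:2])
```

Running a check like this before `import dsf` would turn the long traceback into a one-line hint about the interpreter version.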