Deadlock reading the object model from a plugin
-
Hello,
I am developing an interception plugin on my Duet, and I am experiencing occasional deadlocks when reading from the object model, which cause some prints to hang forever.
More details:
- My plugin intercepts some GCode meta-commands and must retrieve a sensor value using a call to CommandConnection.get_object_model() from the dsf-python package.
- Most of the time, this just works, but sometimes it hangs forever, causing the print to never resume because it is stuck in my plugin.
- When it happened, I checked the logs using sudo journalctl -u duetcontrolserver -r, and I got the following suspicious lines:
Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
Oct 04 09:02:35 alpha DuetControlServer[29146]: [warn] Resending packet #0 (request GetObjectModel)
I suspect that the command to retrieve the object model is stuck in a deadlock, which sounds like a synchronization issue.
I have tried to call CommandConnection.sync_object_model() before retrieving the object model, but the deadlock also occurred once in that case.
Do you have any ideas on how to solve this issue?
Do you think that I am doing something wrong, or could it be an issue in the firmware?
In that case, should I update it to the latest version (3.5.3)?
Thank you very much for your help!
Antoine
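While the cause is being investigated, one way to keep a print from hanging forever is to bound the blocking call with a timeout. The sketch below is a generic Python pattern and not part of dsf-python: call_with_timeout is a hypothetical helper, and it assumes the blocking call can safely run in a worker thread. It does not address why the request stalls, and it is an open question whether a connection remains usable after a timed-out request, so closing and reopening it may be the safer choice.

```python
import threading


def call_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn(*args, **kwargs) in a daemon thread and wait at most timeout_s seconds.

    Raises TimeoutError if the call has not finished in time. The worker thread
    is not killed; as a daemon thread it just does not block interpreter exit.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn(*args, **kwargs)
        except Exception as exc:  # hand the exception over to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive():
        raise TimeoutError(f"call did not return within {timeout_s} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

In the plugin this might be used as call_with_timeout(command_connection.get_object_model, 5.0), where command_connection stands for the plugin's existing CommandConnection and the 5 second limit is an arbitrary choice. On TimeoutError the plugin could log the failure and cancel the intercepted code instead of blocking the print.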
-
@Ant1 I think you will need @chrishamm to have a look at this issue.
Ian
-
@Ant1 It sounds like the object model response is too long so RRF fails to send it over. A plugin can only request the object model from DCS, which keeps a copy of it for all other applications, so I don't think it's the dsf-python call that is responsible for that problem.
Please do update to the latest version (3.5.3) because I'm quite sure we had to implement a different query method for tools[] and move.axes[] for large configurations.
-
@chrishamm Okay I will update my machines and see if it solves the issue. Thanks!
Otherwise, is there a way to get only a specific key of the object model, so that full updates are not propagated to the Raspberry Pi? Maybe this would help avoid overly long responses.
-
@Ant1 No, only plugins can subscribe to specific parts of the object model. Please let me know if this is still an issue with 3.5.3 and if it is, please share your configuration as well.
-
@chrishamm Okay, good to know. I've made the update to 3.5.3 on one of my machines this week, and it seems to be working for now. I will make an update if I still observe issues.
Thanks for your help!
-
@Ant1 Hey! Unfortunately, the issue just happened again today.
Here is some information regarding my configuration:
config.g
And here is some information about the plugin that I am developing. I only included the interception part, but I can share the other files if you want to see them.
plugin.json
intercept.py
-
@Ant1 Sorry to hear that. Did you see the same Resending packet #0 (request GetObjectModel) messages this time, too? Your config doesn't look terribly complex, so I find it surprising that you're running out of output buffer space - that's the only reason for that particular log message.
You could try to disable PanelDue for testing purposes and check if the problem persists then. PanelDue, networking, and SBC interfaces share the same output buffer pool.
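To check whether the same resend warnings recur, for example before and after disabling PanelDue, the journal output can be summarized with a small script. This is only a sketch: count_resends is a hypothetical helper, and the regular expression is based solely on the log lines quoted earlier in this thread, so other DCS versions may format the message differently.

```python
import re
from collections import Counter

# Matches e.g. "[warn] Resending packet #0 (request GetObjectModel)"
RESEND_RE = re.compile(r"\[warn\] Resending packet #(\d+) \(request (\w+)\)")


def count_resends(lines):
    """Count resend warnings per request type in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = RESEND_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts
```

The function accepts any iterable of lines, so it could be fed sys.stdin when the output of sudo journalctl -u duetcontrolserver -r is piped into the script.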
Btw, you should cancel and discard the code being intercepted if flush returns false, else the code action may still be executed even though the underlying code or (macro) file is already cancelled.
-
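The flush advice above can be sketched as follows. The method names flush, cancel_code and resolve_code are assumed to match dsf-python's InterceptConnection; handle_intercepted_code and FakeInterceptConnection are hypothetical names introduced here so that the control flow can be exercised without a running DCS.

```python
def handle_intercepted_code(conn, code, action):
    """Run action(code) only if the code's channel could be flushed.

    conn is assumed to offer flush(channel), cancel_code() and resolve_code(),
    as an InterceptConnection from dsf-python is expected to.
    Returns True if the action ran, False if the code was cancelled.
    """
    if not conn.flush(code.channel):
        # Flush failed: the code or its file was cancelled, so discard it.
        conn.cancel_code()
        return False
    action(code)
    conn.resolve_code()
    return True


class FakeInterceptConnection:
    """Hypothetical stand-in that records which methods were called."""

    def __init__(self, flush_result):
        self.flush_result = flush_result
        self.calls = []

    def flush(self, channel):
        self.calls.append("flush")
        return self.flush_result

    def cancel_code(self):
        self.calls.append("cancel_code")

    def resolve_code(self):
        self.calls.append("resolve_code")
```

With a fake connection whose flush returns False, the action is never invoked and only flush and cancel_code are recorded, which is the behaviour recommended above.
-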
@chrishamm
Yes, it was the same issue with Resending packet #0.
Okay, I will check if disabling the PanelDue works.
And thank you for the tip, I will add that line of code to the plugin.
-
@Ant1 Oh and by the way, we also observed a new, more concerning issue with the plugin. When resuming the GCode execution after a custom command has been intercepted by my plugin, the printer shifts everything vertically, which basically makes the print fail.
I have checked the GCode, and there is always a G1 Z... instruction after the call to the macro that gets intercepted, so the machine should move to a specific height. But instead, it moves 1 or 2 mm higher than that and prints in the air.
We have checked, and this issue only happens when the plugin is activated. Could it also be some kind of synchronization issue between the plugin and the Duet? I am kind of lost on this one...