DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1
-
the only thing I can think of with respect to DCS complaining about the RDY pin is basically an interrupt storm, which can grind the Pi to a halt. not sure if it's relevant though.
-
Well this sucks - it's just done the same thing with RC9 ..... I had screen running at the time and it reported nothing ....
[debug] Assigning filament Prusament PETG to extruder drive 0
[debug] Requesting update of key boards, seq 0 -> 0
[debug] Requesting update of key directories, seq 0 -> 0
[debug] Requesting update of key fans, seq 0 -> 7
[debug] Requesting update of key heat, seq 0 -> 7
[debug] Requesting update of key inputs, seq 0 -> 0
[debug] Requesting update of key job, seq 0 -> 2
[debug] Requesting update of key move, seq 0 -> 30
[debug] Requesting update of key network, seq 0 -> 3
[debug] Requesting update of key sensors, seq 0 -> 4
[debug] Requesting update of key spindles, seq 0 -> 0
[debug] Requesting update of key state, seq 0 -> 1
[debug] Requesting update of key tools, seq 0 -> 5
[debug] Requesting update of key volumes, seq 0 -> 0
[debug] IPC#2: Got new UNIX connection, checking mode...
[debug] IPC#2: Subscription processor registered in Patch mode
[debug] IPC#3: Got new UNIX connection, checking mode...
[debug] Updated key boards
[debug] IPC#3: Command processor added
[debug] IPC#3: Received command AddUserSession
[debug] Updated key directories
[debug] Updated key fans
[debug] Updated key heat
[debug] Updated key inputs
[debug] Updated key job
[debug] Updated key move
[debug] Updated key network
[debug] IPC#4: Got new UNIX connection, checking mode...
[debug] IPC#4: Command processor added
[debug] IPC#4: Received command ResolvePath
[debug] IPC#5: Got new UNIX connection, checking mode...
[debug] IPC#5: Command processor added
[debug] IPC#4: Connection closed
[debug] IPC#5: Received command ResolvePath
[debug] IPC#6: Got new UNIX connection, checking mode...
[debug] IPC#5: Connection closed
[debug] IPC#7: Got new UNIX connection, checking mode...
[debug] IPC#6: Command processor added
[debug] IPC#6: Received command ResolvePath
[debug] IPC#7: Command processor added
[debug] IPC#7: Received command ResolvePath
[debug] IPC#6: Connection closed
[debug] IPC#7: Connection closed
[debug] Updated key sensors
[debug] IPC#8: Got new UNIX connection, checking mode...
[debug] IPC#8: Command processor added
[debug] IPC#8: Received command ResolvePath
[debug] IPC#8: Connection closed
[debug] Updated key state
[debug] Updated key tools
[debug] Updated key volumes
[debug] IPC#9: Got new UNIX connection, checking mode...
[debug] IPC#10: Got new UNIX connection, checking mode...
[debug] IPC#11: Got new UNIX connection, checking mode...
[debug] IPC#10: Command processor added
[debug] IPC#9: Command processor added
[debug] IPC#11: Command processor added
[debug] IPC#9: Received command ResolvePath
[debug] IPC#10: Received command ResolvePath
[debug] IPC#11: Received command ResolvePath
[debug] Requesting update of key job, seq 2 -> 3
[debug] Requesting update of key move, seq 30 -> 31
[debug] IPC#9: Connection closed
[debug] IPC#11: Connection closed
[debug] IPC#10: Connection closed
[debug] Updated key job
[debug] Updated key move
[debug] IPC#12: Got new UNIX connection, checking mode...
[debug] IPC#12: Subscription processor registered in Patch mode
[debug] IPC#13: Got new UNIX connection, checking mode...
[debug] IPC#13: Command processor added
[debug] IPC#13: Received command ResolvePath
I feel the need for a compatibility matrix for the 3 main components - which versions of RRF work with which versions of DWC.
I'd like to help the team figure this out but I haven't a clue where to start and I need the printer at least 'functional' even if there are known issues. Right now it isn't even functional.
I'm using a 4 Gig RPi 4 if it is worth anything (yes I know it's OTT) ...
-
@Garfield said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
Well this sucks - it's just done the same thing with RC9 ..... I had screen running at the time and it reported nothing ....
unfortunate, but probably still helpful for chrishamm. what happened to the ssh session: did it time out after a while, or was it terminated immediately?
-
Terminated immediately ...
-
@bearer said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
@Garfield said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
I will try the screen though - what does that offer?
it's a terminal multiplexer / window manager or something like that. it ensures that dcs keeps running if you have a network glitch. if you run dcs in the foreground and ssh stops, all the processes in that shell are terminated; with screen they can keep running.
It's not a network glitch. When it happens it tends to take out the entire Pi. Completely. So using screen isn't going to help. I've managed to get the network to stay up about 3 or 4 times out of 40 or so crashes, in which case screen isn't needed as you can still issue commands.... everything just takes an age to respond. But yeh, as soon as the SSH connection goes a power cycle is the only fix.
@Garfield said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
Well this sucks - it's just done the same thing with RC9 ..... I had screen running at the time and it reported nothing ....
That sucks. It's only been RC10 that I've had this issue on. To the extent that the first time it happened I went and checked to see if my router had died.
-
@Garfield said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
Terminated immediately ...
that's interesting; it means the cpu is able to relatively cleanly terminate the session as opposed to just freezing, although it doesn't help you. see below for correction.
-
I wonder if something didn't uninstall or get overwritten in the 'downgrade' process. I never used RC9, I came straight from RC6
-
@ChrisP said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
So using screen isn't going to help.
mostly a precaution to avoid terminating the process if the session is interrupted for other reasons.
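For anyone wanting to try the screen precaution, here is a minimal sketch. It assumes DCS lives at /opt/dsf/bin/DuetControlServer (a guess at the install path, adjust as needed); the session name `dcs` and the helper names `run` and `start_in_screen` are made up for illustration. If DCS is normally started by systemd, the service would presumably need to be stopped first. As noted above, this only protects against a dropped session, not against the whole Pi hanging.

```shell
#!/bin/sh
# Sketch only: keep a foreground command alive across SSH drops using screen.
# ASSUMPTIONS: the DCS_BIN path and the helper names are illustrative guesses.
DCS_BIN="${DCS_BIN:-/opt/dsf/bin/DuetControlServer}"

# run: execute a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# start_in_screen NAME CMD...: start CMD in a detached, named screen session.
start_in_screen() {
    name="$1"
    shift
    run screen -dmS "$name" "$@"
}

# Example (not executed here; run with sufficient privileges):
#   start_in_screen dcs "$DCS_BIN"
#   screen -ls        # list sessions
#   screen -r dcs     # reattach; detach again with Ctrl-a d
```

Setting DRY_RUN=1 prints the screen command instead of running it, which is handy for checking what would be started.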
-
@bearer I should say that there is no disconnect - just zero response - no messages, it just stops ... I use a commercial tool (Secure CRT 8.7) and it still thinks it is connected but hitting enter just causes an on screen line feed.
-
@Garfield said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
@bearer I should say that there is no disconnect - just zero response - no messages, it just stops ... I use a commercial tool (Secure CRT 8.7) and it still thinks it is connected but hitting enter just causes an on screen line feed.
ah, that is more what i was expecting. it would terminate after 30-60 seconds or so as a timeout, in turn meaning the pi froze or was too busy to close the connection. still good info one way or the other.
-
@bearer said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
it would terminate after 30-60 seconds or so as a timeout
Yup it does ...
-
(i wonder if setting process affinity could isolate the hang and leave a core free to run ssh etc, if that's possible in raspbian - anyway, that's it for me today)
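A rough sketch of how the affinity idea could be tried: `taskset` (from util-linux) can restrict an already-running process to a subset of cores. The process name `DuetControlServer`, the core list `0-2` (leaving core 3 of a 4-core Pi for everything else), and the helper names `run` and `pin_pid` are all assumptions for illustration. This is untested against the hang described in this thread and may well not help.

```shell
#!/bin/sh
# Sketch only: restrict a running process to a subset of CPU cores.
# ASSUMPTIONS: run and pin_pid are hypothetical helper names.

# run: execute a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pin_pid CPULIST PID: apply CPULIST to all threads (-a) of the existing PID (-p).
pin_pid() {
    run taskset -a -c -p "$1" "$2"
}

# Example (not executed here; needs root for a process owned by root):
#   pin_pid 0-2 "$(pidof -s DuetControlServer)"
```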
-
I found this in the duet web server log
Apr 26 19:19:11 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker[3]
Apr 26 19:19:11 duet3 DuetWebServer[1106]: Route matched with {action = "Get", controller = "WebSocket"}. Executing controller action with signature System.Threading.Tasks.Task Get() on controller DuetWebServer.Controllers.WebSocketController (DuetWe
Apr 26 19:19:11 duet3 DuetWebServer[1106]: fail: DuetWebServer.Controllers.WebSocketController[0]
Apr 26 19:19:11 duet3 DuetWebServer[1106]: [WebSocketController] DCS is not started
Apr 26 19:19:11 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker[2]
Apr 26 19:19:11 duet3 DuetWebServer[1106]: Executed action DuetWebServer.Controllers.WebSocketController.Get (DuetWebServer) in 339302.5643ms
Apr 26 19:19:11 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Routing.EndpointMiddleware[1]
Apr 26 19:19:11 duet3 DuetWebServer[1106]: Executed endpoint 'DuetWebServer.Controllers.WebSocketController.Get (DuetWebServer)'
Apr 26 19:19:11 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Hosting.Diagnostics[2]
Apr 26 19:19:11 duet3 DuetWebServer[1106]: Request finished in 339446.6255ms 101
Apr 26 19:19:11 duet3 DuetWebServer[1106]: warn: DuetWebServer.Services.ModelObserver[0]
Apr 26 19:19:11 duet3 DuetWebServer[1106]: Failed to synchronize machine model
Apr 26 19:19:11 duet3 DuetWebServer[1106]: System.Net.Sockets.SocketException (107): Transport endpoint is not connected
Apr 26 19:19:11 duet3 DuetWebServer[1106]: at DuetAPI.Utility.JsonHelper.ReceiveUtf8Json(Socket socket, CancellationToken cancellationToken) in /home/christian/duet/DuetSoftwareFramework/src/DuetAPI/Utility/JsonHelper.cs:line 154
Apr 26 19:19:11 duet3 DuetWebServer[1106]: at DuetAPIClient.BaseConnection.ReceiveJson(CancellationToken cancellationToken) in /home/christian/duet/DuetSoftwareFramework/src/DuetAPIClient/BaseConnection.cs:line 294
Apr 26 19:19:11 duet3 DuetWebServer[1106]: at DuetAPIClient.SubscribeConnection.GetMachineModelPatch(CancellationToken cancellationToken) in /home/christian/duet/DuetSoftwareFramework/src/DuetAPIClient/SubscribeConnection.cs:line 100
Apr 26 19:19:11 duet3 DuetWebServer[1106]: at DuetWebServer.Services.ModelObserver.Execute() in /home/christian/duet/DuetSoftwareFramework/src/DuetWebServer/Services/ModelObserver.cs:line 156
Apr 26 19:19:11 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker[2]
Apr 26 19:59:29 duet3 DuetWebServer[1106]: warn: DuetWebServer.Services.ModelObserver[0]
Apr 26 19:59:29 duet3 DuetWebServer[1106]: Failed to synchronize machine model
Apr 26 19:59:29 duet3 DuetWebServer[1106]: System.Net.Internals.SocketExceptionFactory+ExtendedSocketException (99): Cannot assign requested address /var/run/dsf/dcs.sock
Apr 26 19:59:29 duet3 DuetWebServer[1106]: at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
Apr 26 19:59:29 duet3 DuetWebServer[1106]: at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
Apr 26 19:59:29 duet3 DuetWebServer[1106]: at DuetAPIClient.BaseConnection.Connect(ClientInitMessage initMessage, String socketPath, CancellationToken cancellationToken) in /home/christian/duet/DuetSoftwareFramework/src/DuetAPIClient/BaseConnection.cs:l
Apr 26 19:59:29 duet3 DuetWebServer[1106]: at DuetWebServer.Services.ModelObserver.Execute() in /home/christian/duet/DuetSoftwareFramework/src/DuetWebServer/Services/ModelObserver.cs:line 131
Apr 26 19:59:30 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
Apr 26 19:59:30 duet3 DuetWebServer[1106]: Request starting HTTP/1.1 GET http://10.100.2.225/machine
Apr 26 19:59:30 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Routing.EndpointMiddleware[0]
Apr 26 19:59:30 duet3 DuetWebServer[1106]: Executing endpoint 'DuetWebServer.Controllers.WebSocketController.Get (DuetWebServer)'
Apr 26 19:59:30 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker[3]
Apr 26 19:59:30 duet3 DuetWebServer[1106]: Route matched with {action = "Get", controller = "WebSocket"}. Executing controller action with signature System.Threading.Tasks.Task Get() on controller DuetWebServer.Controllers.WebSocketController (DuetWe
Apr 26 19:59:30 duet3 DuetWebServer[1106]: fail: DuetWebServer.Controllers.WebSocketController[0]
Apr 26 19:59:30 duet3 DuetWebServer[1106]: [WebSocketController] DCS is not started
Apr 26 19:59:30 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker[2]
Apr 26 19:59:30 duet3 DuetWebServer[1106]: Executed action DuetWebServer.Controllers.WebSocketController.Get (DuetWebServer) in 6.6056ms
Apr 26 19:59:30 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Routing.EndpointMiddleware[1]
Apr 26 19:59:30 duet3 DuetWebServer[1106]: Executed endpoint 'DuetWebServer.Controllers.WebSocketController.Get (DuetWebServer)'
Apr 26 19:59:30 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Hosting.Diagnostics[2]
Apr 26 19:59:30 duet3 DuetWebServer[1106]: Request finished in 7.2117ms 101
Apr 26 19:59:32 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
Apr 26 19:59:32 duet3 DuetWebServer[1106]: Request starting HTTP/1.1 GET http://10.100.2.225/machine
Apr 26 19:59:32 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Routing.EndpointMiddleware[0]
Apr 26 19:59:32 duet3 DuetWebServer[1106]: Executing endpoint 'DuetWebServer.Controllers.WebSocketController.Get (DuetWebServer)'
Apr 26 19:59:32 duet3 DuetWebServer[1106]: info: Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker[3]
Apr 26 19:59:32 duet3 DuetWebServer[1106]: Route matched with {action = "Get", controller = "WebSocket"}. Executing controller action with signature System.Threading.Tasks.Task Get() on controller DuetWebServer.Controllers.WebSocketController (DuetWe
Apr 26 19:59:32 duet3 DuetWebServer[1106]: fail: DuetWebServer.Controllers.WebSocketController[0]
Apr 26 19:59:32 duet3 DuetWebServer[1106]: [WebSocketController] DCS is not started
Same error even in RC9
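In case it helps others compare logs, here is a small sketch for pulling just the DCS-related failures out of the web server log. The patterns are taken from the excerpt above; the helper name `dcs_errors` is made up, and the unit name `duetwebserver` in the example is an assumption.

```shell
#!/bin/sh
# Sketch only: pick the DCS-related failures out of a DuetWebServer log.
# ASSUMPTION: dcs_errors is a hypothetical helper name.

# dcs_errors: read log lines on stdin, print only those that point at DCS
# being unreachable (patterns taken from the log excerpt in this thread).
dcs_errors() {
    grep -E 'DCS is not started|Failed to synchronize machine model|SocketException'
}

# Example (not executed here; assumes the unit is named duetwebserver):
#   journalctl -u duetwebserver --no-pager | dcs_errors
```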
-
@Garfield said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
I feel the need for a compatibility matrix for the 3 main components - which versions of RRF work with which versions of DWC.
Interesting. Duet3 + Pi 4B, 4 gig. I've been having random hangs that take a power cycle to clear. I am also on RC10, as of mid evening yesterday. I was not certain this was happening, nor certain that it started at RC whatever, so I have not reported anything, yet.
Now that I think about it, it came on hard when I switched to RC10. I had to power cycle at least eight or ten times last night.
I typically have a DWC, a VNC and a SSH running. They all just hang. Attempting to start a new SSH also hangs (note, not refused, connects and never gets a password prompt).
I will see what data I can gather.
-
Can I ask all you guys with issues, if the DWC is NOT connected, does it still lock up?
-
I've never run without it, only ever connected via WiFi, would take me a while to set up if the way to test is to put the SD card into the duet itself.
-
I believe he's saying, "start a job, and then close DWC".
Yes, I tried that. SSH only, no VNC, no DWC. Still locked within a few minutes.
-
@Danal said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
I believe he's saying, "start a job, and then close DWC".
Yes, I tried that. SSH only, no VNC, no DWC. Still locked within a few minutes.
Yeah that was it. I wanted to make sure it wasn't DWC related. I've been running RC10 + 2.1.1 and printing fine but I don't use the DWC.
Something to try... The systemd service file for the DCS was changed to set
CPUSchedulingPolicy=fifo
CPUSchedulingPriority=20
which may be contributing to the problem.
Edit /lib/systemd/system/duetcontrolserver.service and remove those 2 lines, then reboot and see if that helps.
-
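A minimal sketch of the unit-file edit suggested above, for anyone who prefers not to do it by hand. The helper name `strip_rt_sched` is made up for illustration; it deletes only the two scheduling lines and keeps a `.bak` copy of the original next to the file. Note that a later package update may well restore the original file.

```shell
#!/bin/sh
# Sketch only: drop the two real-time scheduling lines from a unit file.
# ASSUMPTION: strip_rt_sched is a hypothetical helper name.

# strip_rt_sched FILE: delete the CPUSchedulingPolicy= and
# CPUSchedulingPriority= lines from FILE in place, saving the original as FILE.bak.
strip_rt_sched() {
    sed -i.bak -E '/^CPUScheduling(Policy|Priority)=/d' "$1"
}

# Example (not executed here; run as root):
#   strip_rt_sched /lib/systemd/system/duetcontrolserver.service
#   systemctl daemon-reload
#   reboot
```

To undo the change, copy the `.bak` file back over the unit file and reboot again.
-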
@gtj0 said in DCS Crash with 3.01-R10 / DWC 2.1.5 / DSF 2.1.1:
Edit /lib/systemd/system/duetcontrolserver.service and remove those 2 lines, then reboot and see if that helps.
Will do. I've made a bunch of other changes, so let me re-verify the hangs are real, then I will try that. THANKS!
-
I'm a little late to the reporting; to help confirm, I too have seen full system lock-ups and print connection losses with RRF 3.01-R10, DWC 2.1.5 and DSF 2.1.1.
I was able to overcome and get working with:
sudo systemctl restart duetcontrolserver
No system hardware changes, ribbons, or otherwise, just the new Beta install.
The duetcontrolserver would go to 400% CPU usage. After painstakingly getting to the terminal (I have a screen attached directly to my Pi), I was able to get a terminal open and issue the fix. SSH & web access were dead - couldn't remote in.
After the duetcontrolserver restart, the system works but still gets random disconnects. I will try the modification above and report back too.