Dangerous CAN bus failure
-
@gnydick said in Dangerous CAN bus failure:
@Phaedrux Fail-safe. There needs to be a heartbeat. Without the heartbeat, all nodes should shoot themselves in the head.
I agree that a heartbeat sounds appropriate. The tool board should shut down all outputs if it's lost communication.
As a workaround, could you not monitor the communication and temperatures in daemon.g?
I have my bed and heater power supplies controlled by SSRs.
In daemon.g I monitor heater state, temperatures and other things.
If a heater is "off" but its temperature is not falling, the SSR is shut off.
-
- Log in to your Single Board Computer (SBC).
- Run:
sudo journalctl -x -u duetcontrolserver
-
@gnydick I agree that the main board should have failed loudly and safely. You'd probably need a relay on the power line to the toolboard so that it can be shut down when it can no longer shut itself down, and then shut down the main control board as well.
-
@gnydick Was the board still reporting the hotend temperatures? (You can usually see small fluctuations.) Was there any error message reported in the DWC console? If it happens again, try sending M122 to the toolboards to get more information and to check whether the mainboard is still able to talk to them. The rapid flashing light does not necessarily mean that the toolboard has lost all contact with the mainboard; it may just have lost time sync (which can be caused by a total loss of communication, but also by other things).
-
@OwenD If there is a communication loss, you can't tell the toolboard to turn off the heater. That's why it has to be self-governing.
-
@gloomyandy It was still reporting the temperature. I'm going to grab the logs to see what's there.
-
@gnydick
I didn't mean telling the tool board to turn off.
I meant that if the main board can't communicate, or senses a heater anomaly, then you turn off power to the heaters (via SSRs).
If it can't communicate, that should be reflected in the object model (maybe under the CAN address?).
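OwenD's daemon.g check can be modelled in a few lines. The sketch below is Python, not RepRapFirmware meta-command syntax, and every name and threshold in it (`SsrGuard`, `HeaterSample`, the 2 °C minimum drop per pass, the 50 °C floor, the `board_responding` flag) is a hypothetical stand-in. How the main board would actually detect a silent toolboard from the object model is exactly the open question above. The idea is to call `check()` once per daemon.g pass and drive the SSR from its return value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeaterSample:
    """One reading as the main board sees it (hypothetical fields)."""
    state: str               # heater state, e.g. "off" or "active"
    temp_c: float            # reported temperature in degrees C
    board_responding: bool   # assumed flag: did the toolboard answer recently?


@dataclass
class SsrGuard:
    """Models a daemon.g-style check that cuts heater power via an SSR."""
    min_drop_c: float = 2.0        # assumed: an "off" heater must cool this much per pass
    floor_c: float = 50.0          # assumed: below this, a flat temperature is normal
    ssr_on: bool = True
    last_off_temp_c: Optional[float] = None

    def check(self, s: HeaterSample) -> bool:
        """Run once per daemon.g pass; returns whether the SSR should stay on."""
        if not s.board_responding:
            # Can't confirm what the heater is doing: remove power at the supply.
            self.ssr_on = False
        elif s.state == "off" and s.temp_c > self.floor_c:
            prev = self.last_off_temp_c
            if prev is not None and s.temp_c > prev - self.min_drop_c:
                # Heater reported "off" but the temperature is not falling.
                self.ssr_on = False
            self.last_off_temp_c = s.temp_c
        else:
            # Heater active, or already near ambient: nothing to compare against.
            self.last_off_temp_c = None
        # Once tripped, the SSR stays off until someone resets it by hand.
        return self.ssr_on
```

Because the guard removes power at the supply instead of asking the toolboard to switch its heater off, it does not depend on the toolboard receiving anything, which is the objection raised about communication loss. It latches off, so the printer has to be reset by hand after a trip.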
You said DWC showed the heater turned off but a stable high temperature, so presumably the thermistor is on the main board?
-
@OwenD No, the heater and thermistor are both on the toolboard.
-
These warnings were repeating and stopped at 00:09:
Dec 18 00:07:24 Duet3 DuetControlServer[26090]: [warn] Resending packet #0 (request GetObjectModel)
Dec 18 00:07:24 Duet3 DuetControlServer[26090]: [warn] Resending packet #0 (request GetObjectModel)
Dec 18 00:07:24 Duet3 DuetControlServer[26090]: [warn] Resending packet #0 (request GetObjectModel)
Dec 18 00:07:24 Duet3 DuetControlServer[26090]: [warn] Resending packet #0 (request GetObjectModel)
Dec 18 00:07:24 Duet3 DuetControlServer[26090]: [warn] Resending packet #0 (request GetObjectModel)
Dec 18 00:07:24 Duet3 DuetControlServer[26090]: [warn] Resending packet #0 (request GetObjectModel)
Dec 18 00:07:25 Duet3 DuetControlServer[26090]: [warn] Resending packet #0 (request GetObjectModel)
Dec 18 00:07:25 Duet3 DuetControlServer[26090]: [warn] Resending packet #0 (request GetObjectModel)
Dec 18 00:07:25 Duet3 DuetControlServer[26090]: [warn] Resending packet #0 (request GetObjectModel)
Dec 18 00:07:25 Duet3 DuetControlServer[26090]: [warn] Resending packet #0 (request GetObjectModel)
Dec 18 00:07:25 Duet3 DuetControlServer[26090]: [warn] Resending packet #0 (request GetObjectModel)
Dec 18 00:07:25 Duet3 DuetControlServer[26090]: [warn] Resending packet #0 (request GetObjectModel)
Dec 18 00:07:25 Duet3 DuetControlServer[26090]: [warn] Resending packet #0 (request GetObjectModel)
Dec 18 00:07:25 Duet3 DuetControlServer[26090]: [warn] Resending packet #0 (request GetObjectModel)
Dec 18 00:07:25 Duet3 DuetControlServer[26090]: [warn] Resending packet #0 (request GetObjectModel)
Dec 18 00:07:25 Duet3 DuetControlServer[26090]: [warn] Resending packet #0 (request GetObjectModel)
The print then finished at 00:19.
These are all of the log entries around that time:
Dec 18 00:19:12 Duet3 DuetControlServer[26090]: [info] Finished macro file daemon.g
Dec 18 00:19:22 Duet3 DuetControlServer[26090]: [info] Starting macro file daemon.g on channel Daemon
Dec 18 00:19:22 Duet3 DuetControlServer[26090]: [info] Finished macro file daemon.g
Dec 18 00:19:31 Duet3 DuetControlServer[26090]: [error] Response timeout: CAN addr 121, req type 6013, RID=595
Dec 18 00:19:32 Duet3 DuetControlServer[26090]: [error] Response timeout: CAN addr 122, req type 6013, RID=596
Dec 18 00:19:32 Duet3 DuetControlServer[26090]: [info] Starting macro file tfree0.g on channel File
Dec 18 00:19:32 Duet3 DuetControlServer[26090]: [info] Starting macro file /macros/tool_unlock on channel File
Dec 18 00:19:32 Duet3 DuetControlServer[26090]: [info] Starting macro file daemon.g on channel Daemon
Dec 18 00:19:32 Duet3 DuetControlServer[26090]: [info] Finished macro file daemon.g
Dec 18 00:19:35 Duet3 DuetControlServer[26090]: [info] Finished macro file /macros/tool_unlock
Dec 18 00:19:36 Duet3 DuetControlServer[26090]: [info] Finished macro file tfree0.g
Dec 18 00:19:38 Duet3 DuetControlServer[26090]: [error] Response timeout: CAN addr 121, req type 6013, RID=603
Dec 18 00:19:42 Duet3 DuetControlServer[26090]: [info] Starting macro file daemon.g on channel Daemon
Dec 18 00:19:42 Duet3 DuetControlServer[26090]: [info] Finished macro file daemon.g
Dec 18 00:19:48 Duet3 DuetControlServer[26090]: [info] Finished job file
Dec 18 00:19:48 Duet3 DuetControlServer[26090]: [info] Finished printing file 0:/gcodes/bowden anchor x 4.gcode, print time was 1h 10m
Dec 18 00:19:52 Duet3 DuetControlServer[26090]: [info] Starting macro file daemon.g on channel Daemon
Dec 18 00:19:52 Duet3 DuetControlServer[26090]: [info] Finished macro file daemon.g
Dec 18 00:20:02 Duet3 DuetControlServer[26090]: [info] Starting macro file daemon.g on channel Daemon
Dec 18 00:20:02 Duet3 DuetControlServer[26090]: [info] Finished macro file daemon.g
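To pull the CAN-related entries out of a long journal like this one, a small filter helps. This is a hypothetical helper script, not part of DSF; it only assumes the two message formats visible in the log above (`[error] Response timeout: CAN addr N, ...` and `[warn] Resending packet ...`), with one journal entry per line.

```python
import re
import sys
from collections import Counter

# Message formats copied from the DuetControlServer log in this thread.
TIMEOUT_RE = re.compile(r"\[error\] Response timeout: CAN addr (\d+)")
RESEND_RE = re.compile(r"\[warn\] Resending packet")


def summarize(lines):
    """Count CAN response timeouts per board address, plus packet resends."""
    timeouts = Counter()
    resends = 0
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[int(match.group(1))] += 1
        elif RESEND_RE.search(line):
            resends += 1
    return timeouts, resends


def main(path):
    """Print a per-address summary for a saved journal file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        timeouts, resends = summarize(f)
    for addr, count in sorted(timeouts.items()):
        print(f"CAN addr {addr}: {count} response timeout(s)")
    print(f"Resent packets: {resends}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Save the journal to a file first, e.g. `sudo journalctl -u duetcontrolserver > dsf.log`, then run `python3 can_log_summary.py dsf.log` (both file names are only examples). On the entries around 00:19 it would report two timeouts for address 121 and one for 122.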
-
@gnydick said in Dangerous CAN bus failure:
@Phaedrux Fail-safe. There needs to be a heartbeat. Without the heartbeat, all nodes should shoot themselves in the head.
This is not what every user wants. Some users want to be able to swap out an inactive tool, for example, so having the CAN bus drop out for that tool may be acceptable to them.
I do agree that turning off all heaters and motors on CAN bus disconnect is probably a good default.
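The behaviour being argued for here (loss of the heartbeat shuts a board down by default, with an opt-out for users who deliberately drop a tool off the bus) can be sketched as a small watchdog. This is a Python model, not RRF or toolboard firmware; `HeartbeatWatchdog`, `LossPolicy`, `on_message`, `tick` and the 2 s timeout are all assumptions. It keys on messages received from the main board, so a board that is still sending temperatures but no longer receiving would still trip it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LossPolicy(Enum):
    SHUTDOWN = "shutdown"  # turn outputs off when the main board goes quiet
    HOLD = "hold"          # keep regulating at the last setpoint (e.g. a parked tool)


@dataclass
class HeartbeatWatchdog:
    """Hypothetical toolboard-side watchdog; the names and timeout are illustrative."""
    timeout_s: float = 2.0                      # assumed maximum silence
    policy: LossPolicy = LossPolicy.SHUTDOWN    # fail-safe by default
    last_seen_s: float = 0.0
    setpoint_c: float = 0.0
    outputs_enabled: bool = True

    def on_message(self, now_s: float, setpoint_c: float) -> None:
        """Call for every message received from the main board."""
        self.last_seen_s = now_s
        self.setpoint_c = setpoint_c

    def tick(self, now_s: float) -> Optional[float]:
        """Call from the local control loop; returns the setpoint, or None for heater off."""
        silent_for = now_s - self.last_seen_s
        if silent_for > self.timeout_s and self.policy is LossPolicy.SHUTDOWN:
            # Latch off: later messages do not re-enable the outputs.
            self.outputs_enabled = False
        return self.setpoint_c if self.outputs_enabled else None
```

With `LossPolicy.HOLD` the board keeps regulating at the last setpoint it saw, which is what a tool-swapping setup wants. `SHUTDOWN` latches the outputs off until the board is restarted, so a brief dropout followed by recovery cannot silently re-enable a heater.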
-
@T3P3Tony That's true, but that's the exception that should be coded for. One observation: the tool boards don't seem to like being hot-swapped; they pop when you plug power in while the machine is on.
I don't know how aware everyone is, but a few years ago, when 3D printers proliferated and cheap clones with no thermal runaway safeguards started popping up, all of the influencers started shredding them: ripping their reputations apart, telling followers not to buy brand X, Y or Z, and maybe not even after the fix, but to wait a couple of generations.
I don't think it would be very good PR for Duet if that started happening again.
-
@gnydick for your information, the reason that RRF requires the sensor that controls a heater to be on the same board as the heater is so that in the event of any CAN issues, temperature control is never lost. So loss of CAN communication does not have the same consequences as "thermal runaway".
What firmware versions were you running on the main board and tool boards when this issue occurred?
-
@dc42 That's what I would expect, but it's not happening. For all intents and purposes, temperature control is lost because the firmware in the toolboard doesn't handle it. Also, you're right, it isn't exactly the same thing as thermal runaway, but it presents the same risk. No control over heater == END GAME.
-
@gnydick I'm confused. In your original post you said that the toolboard was reporting the temperature to be the print temperature, which would imply that it was continuing to control the temperature at the last set value it saw (which I think is what DC42 said it should do). Is that not happening?
-
@gloomyandy For all intents and purposes, the toolboard was sending messages but not receiving them.
-
@gnydick Yes, but was it maintaining the last set temperature?
-
@gnydick Just to be clear I'm not trying to debate what should happen if a message is lost or if communication with a toolboard is interrupted. I'm trying to establish if the toolboard did what it is supposed to do at the moment and maintained the last set temperature.
-
@gloomyandy I realize that, no worries. It's a good question. I don't know how to work out exactly what was happening, just that it wasn't responsive.