Heater Fault handling improvements

T3P3Tony · 14 Oct 2021, 13:35

@jbjhjm the reason there is no pause on heater faults is that it's still a firmware limitation: heater faults and motor stalls on CAN-connected boards are not handled in the same way as they are on directly connected heaters:

https://duet3d.dozuki.com/Wiki/Duet_3_firmware_configuration_limitations:

The main board does not react to heater faults on expansion boards by pausing the print.

We will remove the limitation ASAP but it depends on underlying work that needs to be done to the CAN protocol.

jbjhjm · 14 Oct 2021, 16:08

thanks for explaining @t3p3tony ! Looking forward to seeing that limitation being blown away!

Until then, the best bet for everyone may be to try out the daemon.g checker.
https://forum.duet3d.com/topic/20722/heater-fault-checking-routine-to-be-run-in-daemon-g

wdenker · 16 Oct 2021, 01:54

@t3p3tony can we get it resolved on Duet WiFi? No CAN connection there. I run into this about once every two weeks with how many printers I'm running, which gets super annoying because I leave them for hours and always find them too late.

OwenD · 16 Oct 2021, 06:22

@jbjhjm said in Heater Fault handling improvements:

Had some heater faults due to layer fan cooling down hotend too much.
Have to figure out why PID is off/doesn't compensate, but that's another topic.

<<SNIP>>

OwenD created a daemon.g which can be used.
But I consider it a workaround - complicated solution, and it will not react immediately due to ~1Hz frequency.
Pausing the print there might already be too late for flawless continuation.
https://forum.duet3d.com/topic/20722/heater-fault-checking-routine-to-be-run-in-daemon-g

A macro would be best; I'd use it to do various things:
pause the job, display an error, save the resurrect state and generate an audible sound to let me know something's off there.

My 2 cents....
The daemon code I shared may seem complicated, but in reality it is probably less so than the attendant changes to the source code needed to implement such things. It's just shifting the workload onto those seeking the change.
I think that's great because you can do things that have low value to the greater majority, or that fly against the tide of what the developers must do for safety and liability reasons.
The first rule must always be that the system fail to safe.

What you're asking for is pretty much for RRF to hand over all control of heater safety to the end users. That's a recipe for disaster IMO.
And yes I note you stated that this would be dependent on the existence of a macro.

There are currently settings for heater deviation time.
Perhaps you can set these a bit higher and use a macro to check for deviation before the hard limit.
This would allow you to pause, but honestly if your bed heater has failed, what chance of a successful re-start?

With regards to the requirement to be able to get flawless continuation, I would say that's very low on the list of requirements if a heater has faulted (which is a potential safety risk).

My motivation for the code I shared was to cater for cases where something happens when the printer is idle.
From what I understand, some heater checks don't occur in this circumstance.
I'm not sure if that's still the case, but it also allows me to monitor coolant flow etc. (plus I enjoy tinkering)

T3P3Tony · 16 Oct 2021, 07:18

@wdenker this is a separate issue AFAIK if it's on Duet 2. Are you saying you get a heater fault and the print is not paused? As you say, there won't be a CAN expansion board in the system if it's a Duet 2, so the limitation does not apply.

T3P3Tony · 16 Oct 2021, 07:22

@owend said in Heater Fault handling improvements:

What you're asking for is pretty much for RRF to hand over all control of heater safety to the end users. That's a recipe for disaster IMO.
And yes I note you stated that this would be dependent on the existence of a macro.

At this stage the plan is to extend the current heater fault behaviour to also work when the heater that has the fault is on a CAN expansion board.

jbjhjm · 16 Oct 2021, 08:55

@owend I agree with your thoughts, however I think my idea is a bit different to what you described. Maybe my initial post was too imprecise.

I am not talking about a macro REPLACING the automated fault reaction.
I am talking about a macro called AFTER reacting to a fault.

As far as I can see this does not violate any safety behavior or risk the safety routines to be broken.
However it allows for additional, individual measures. Let yourself be notified by a beeper that there's some trouble, for example. That would even increase safety because one will take notice sooner and can react faster.

In other cases, like those that have just happened to me, there was no real fault, just the layer fan at 100% cooling the hotend so much that a fault was detected.
In this case, if the printer stops immediately (Toolboard CAN fix) and notifies me immediately (beep, show a big message), I am able to react fast enough to check what has happened and allow the print to continue. (Of course only if I'm 100% sure no real fault has happened. And yes, I already did this successfully, but only could do so cause I knew a certain print will very likely fail due to "cooldown fault" and was sitting next to the printer with a finger on the pause key.)
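The macro called after a fault is a feature request in this thread, so the nearest approximation is polling from daemon.g. A minimal sketch of the "beep and show a big message" idea, under some assumptions: local variables (`var`) are available in the installed firmware, a display that can play M300 beeps is attached, and `FaultNotified` is a hypothetical global used only to avoid repeating the alert on every pass.

```gcode
; daemon.g fragment (sketch): notify once when any heater enters the fault state
; Assumes every entry in heat.heaters is a configured heater
if !exists(global.FaultNotified)
	global FaultNotified = false

var faulted = false
while iterations < #heat.heaters
	if heat.heaters[iterations].state == "fault"
		set var.faulted = true

if var.faulted && !global.FaultNotified
	M300 S2000 P1000                                    ; beep: 2000 Hz for 1000 ms
	M291 P"Heater fault detected" R"Heater fault" S1 T0 ; message stays until closed
	set global.FaultNotified = true
elif !var.faulted
	set global.FaultNotified = false                    ; re-arm once the fault is cleared
```

This only adds a notification on top of the firmware's own fault reaction; it does not replace or delay it.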

wdenker · 16 Oct 2021, 14:05

@t3p3tony correct, it paused for a limited time and then the print is gone. It just needs to stay paused indefinitely. I think the time limit is like 5-10 minutes but I'm only checking the printers once every 5-6 hours.

OwenD · 16 Oct 2021, 22:16

@wdenker said in Heater Fault handling improvements:

@t3p3tony correct, it paused for a limited time and then the print is gone. It just needs to stay paused indefinitely. I think the time limit is like 5-10 minutes but I'm only checking the printers once every 5-6 hours.

An indefinite pause is, in my mind, a dangerous situation when you have a heater fault.
If you're getting heater faults then essentially you have a tuning or hardware problem (if caused by a fan).
However, it's not hard to implement some code to notify you of a fault and allow you to act before RRF shuts off the printer.
Something like this in daemon.g should work.
If you want it to run at intervals of less than 10 seconds, put the whole thing in a while true loop.

; create a global to store how many heaters are in error state
if !exists(global.HeaterErrorCount)
	global HeaterErrorCount=0
else
	set global.HeaterErrorCount=0 ; assignment uses a single "="

; loop through all heaters and check state for fault
while iterations < #heat.heaters
	if heat.heaters[iterations].state=="fault"
		set global.HeaterErrorCount=global.HeaterErrorCount+1

; if any heater is in error state, sound an alarm
if global.HeaterErrorCount>0
	M42 P6 S1 ; turn on GPIO connected to alarm relay
	; do some more code to send a text, flash a light or whatever
else
	M42 P6 S0 ; turn off GPIO connected to alarm relay
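
For the "while true" variant mentioned before the code, something like the following might work. It is only a sketch: the one-second delay is an assumed value, and the GPIO port number is carried over from the code above.

```gcode
; daemon.g sketch: poll heater states about once per second instead of every 10 s
; Assumes every entry in heat.heaters is a configured heater and that GPIO
; port 6 drives the alarm relay, as in the code above
if !exists(global.HeaterErrorCount)
	global HeaterErrorCount = 0

while true
	set global.HeaterErrorCount = 0

	; inside this inner loop, "iterations" counts the inner loop only
	while iterations < #heat.heaters
		if heat.heaters[iterations].state == "fault"
			set global.HeaterErrorCount = global.HeaterErrorCount + 1

	if global.HeaterErrorCount > 0
		M42 P6 S1 ; turn on GPIO connected to alarm relay
	else
		M42 P6 S0 ; turn off GPIO connected to alarm relay

	G4 S1 ; wait 1 second before checking again (interval is an assumption)
```

The G4 delay matters: without it the loop would run continuously and never yield between checks.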

You could also investigate BtnCmd and the other plugins which have the ability to use HTTP Post, or MQTT to send you a notification.

wdenker · 16 Oct 2021, 22:37

@owend I don't see how an indefinite pause would be dangerous: the heater that has faulted is turned off. That gives me time to fix the issue at hand and continue where it left off.

OwenD · 16 Oct 2021, 22:58

@wdenker
We'll have to agree to disagree...
The reason I see it as a dangerous move is that you have no way of knowing what caused the fault code, or how serious it is. Especially in the five hour timeframe you quote.
For example, if it's a heater runaway, then it's possible the heater can't be shut down.
That's why RRF also calls M81.
Now if you have a thermal fuse and other hardware safety features, perhaps you can take the risk.
In any case, if the above code doesn't allow you to react in the timeout allowed before forced shutdown then you'll need to continue arguing the case for a configurable option to ignore the fault.

wdenker · 16 Oct 2021, 23:08

@owend if it was stuck on because of a hardware fault, shutting down the board wouldn't resolve it either.

OwenD · 17 Oct 2021, 00:00

@wdenker said in Heater Fault handling improvements:

@owend if it was stuck on because of a hardware fault, shutting down the board wouldn't resolve it either.

M81 is called to allow the power supply to the heater(s) to be turned off.
So if your system is configured with proper safety in mind, then yes, it will resolve the problem in most cases.
I guess a frozen Duet would be the exception, but then you wouldn't have a heater fault problem either.

wdenker · 17 Oct 2021, 00:08

@owend so when I say pause I mean after the heater is turned off. So that I can repair the issue and then bring temp back and continue. The only thing I don't like about the current setup is that it times out after so long to where you can't recover.

T3P3Tony · 17 Oct 2021, 21:38

@wdenker what is your M570 command set to?
https://duet3d.dozuki.com/Wiki/M570
You can use the S parameter to set the number of minutes between a heater fault and the subsequent cancellation (and power off).

Snnn Integer timeout in minutes (can be set to 0) for print to be cancelled after heater fault (Firmware 1.20 and later). If the S parameter timeout occurs (which only happens if a SD print is in progress), RRF will also try to turn off power via the PS_ON pin.
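
As a rough illustration of the S parameter quoted above: the value here is an assumption, chosen only to cover the 5-6 hour check interval mentioned earlier in the thread, and is not a recommendation.

```gcode
; config.g fragment (sketch): allow up to 360 minutes (6 hours) between a
; heater fault and the print being cancelled and power turned off via PS_ON
M570 S360
```

A long timeout keeps the job recoverable for longer but also delays the automatic power-off, which is the trade-off debated earlier in this thread.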

wdenker · 18 Oct 2021, 18:42

@t3p3tony I don't have one; I didn't even realize I could configure that. How new is this?

T3P3Tony · 18 Oct 2021, 18:54

@wdenker it's been around in some form since RRF 1.14

zapta · 18 Oct 2021, 22:26

@t3p3tony said in Heater Fault handling improvements:

@wdenker it's been around in some form since RRF 1.14

That's only 2.16 versions ago.
