Heater Fault handling improvements

jbjhjm

Had some heater faults due to layer fan cooling down hotend too much.
Have to figure out why PID is off/doesn't compensate, but that's another topic.

This is about when heater fault state actually kicks in.
The annoying experience I just had is that you will not notice until it is too late to check or fix anything.
The printer will not pause, there will be no error message.
If you're lucky you'll stumble upon the falling temperature graph or notice the small "fault" label in DWC.

My proposals for improvement are:

RRF should log an error message on heater fault to make it more visible
RRF should call system macro heaterfault.g if it exists to allow for individual handling

A fault should attract attention immediately. It may be a threat to the printer, which could even catch fire. In the best case, the print is unusable.
In any of these cases I would want to be notified immediately.

Old threads discussing this without visible results:

OwenD created a daemon.g which can be used.
But I consider it a workaround - complicated solution, and it will not react immediately due to ~1Hz frequency.
Pausing the print there might already be too late for flawless continuation.
https://forum.duet3d.com/topic/20722/heater-fault-checking-routine-to-be-run-in-daemon-g

A macro would be best; I'd use it to do various things:
pause the job, display an error, save the resurrect state and generate an audible sound to let me know something's off.
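A minimal sketch of what such a macro might contain. It assumes RRF would call a file named heaterfault.g after its own fault handling, which is only the proposal made in this post, not an existing feature. The message text and beep values are illustrative.

```gcode
; heaterfault.g (HYPOTHETICAL file name, proposed in this thread)
; Assumed to run after RRF's own fault handling, not instead of it
if state.status == "processing"
	M25                                               ; pause the running job
M291 P"Heater fault detected" R"Heater fault" S1 T0   ; message box that stays until closed
M300 S2000 P1000                                      ; 2 kHz beep for one second on an attached display
```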

engikeneer

@jbjhjm another thread I raised something similar in:
https://forum.duet3d.com/topic/24596/rrf3-3-m143-sxx-a3-behaviour-with-external-5v-supply
Would definitely vote for this!

jbjhjm

@engikeneer ah yes this is very similar. Didn't have any bed issues recently so I had no head scratching about PS_ON - but the problem is the same and the ideas would help in both cases.

T3P3Tony

@jbjhjm are you using any CAN connected expansion boards?

jbjhjm

@t3p3tony yes, using a Toolboard v1.1!

T3P3Tony

@jbjhjm the reason there is no pause on heater faults is that it's still a firmware limitation: heater faults and motor stalls are not handled on CAN connected boards in the same way as they are on directly connected heaters:

https://duet3d.dozuki.com/Wiki/Duet_3_firmware_configuration_limitations:

The main board does not react to heater faults on expansion boards by pausing the print.

We will remove the limitation ASAP but it depends on underlying work that needs to be done to the CAN protocol.

jbjhjm

thanks for explaining @t3p3tony ! Looking forward to seeing that limitation being blown away!

Until then, the best bet for everyone may be to try out the daemon.g checker.
https://forum.duet3d.com/topic/20722/heater-fault-checking-routine-to-be-run-in-daemon-g

wdenker

@t3p3tony can we get it resolved on Duet WiFi? No CAN connection there. I run into this about once every two weeks with the number of printers I'm running, which gets super annoying because I leave them for hours and always find them too late.

OwenD

@jbjhjm said in Heater Fault handling improvements:

Had some heater faults due to layer fan cooling down hotend too much.
Have to figure out why PID is off/doesn't compensate, but that's another topic.

<<SNIP>>

OwenD created a daemon.g which can be used.
But I consider it a workaround - complicated solution, and it will not react immediately due to ~1Hz frequency.
Pausing the print there might already be too late for flawless continuation.
https://forum.duet3d.com/topic/20722/heater-fault-checking-routine-to-be-run-in-daemon-g

A macro would be best; I'd use it to do various things:
pause the job, display an error, save the resurrect state and generate an audible sound to let me know something's off.

My 2 cents....
The daemon code I shared may seem complicated, but in reality it is probably less so than the attendant changes to the source code needed to implement such things. It's just shifting the workload onto those seeking the change.
I think that's great because you can do things that have low value to the majority, or that go against what the developers must do for safety and liability reasons.
The first rule must always be that the system fails safe.

What you're asking for is pretty much for RRF to hand over all control of heater safety to the end users. That's a recipe for disaster IMO.
And yes I note you stated that this would be dependent on the existence of a macro.

There are currently settings for heater deviation time.
Perhaps you can set these a bit higher and use a macro to check for deviation before the hard limit.
This would allow you to pause, but honestly, if your bed heater has failed, what chance is there of a successful restart?
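A rough sketch of that idea for daemon.g, assuming the deviation settings meant here are the M570 P and T parameters. Heater 1 as the hotend and the 10 C warning band are assumptions for illustration; the band has to be tighter than the excursion configured with M570 for the pause to come before the fault.

```gcode
; daemon.g sketch: pause when the hotend drifts from its setpoint before RRF raises a fault
; ASSUMPTIONS: heater 1 is the hotend, 10 C is an illustrative warning band
var warnBand = 10
if state.status == "processing"
	if heat.heaters[1].state == "active"
		if abs(heat.heaters[1].current - heat.heaters[1].active) > var.warnBand
			echo "Heater 1 at " ^ heat.heaters[1].current ^ "C, setpoint " ^ heat.heaters[1].active ^ "C"
			M25 ; pause the print so the cause can be checked
```

Note that a deliberate temperature change during a print would also trip this check until the heater catches up, so the band and the heater number need adapting to the machine.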

With regards to the requirement to be able to get flawless continuation, I would say that's very low on the list of requirements if a heater has faulted (which is a potential safety risk).

My motivation for the code I shared was to cater for cases where something happens when the printer is idle.
From what I understand, some heater checks don't occur in this circumstance.
I'm not sure if that's still the case, but it also allows me to monitor coolant flow etc. (plus I enjoy tinkering)

T3P3Tony

@wdenker this is a separate issue AFAIK if it's on duet2. Are you saying you get a heater fault and the print is not paused? As you say there won't be a CAN expansion board in the system if it's a Duet 2 so the limitation does not apply.

T3P3Tony

@owend said in Heater Fault handling improvements:

What you're asking for is pretty much for RRF to hand over all control of heater safety to the end users. That's a recipe for disaster IMO.
And yes I note you stated that this would be dependent on the existence of a macro.

At this stage the plan is to extend the current heater fault behaviour to also work when the heater that has the fault is on a CAN expansion board.

jbjhjm

@owend I agree with your thoughts, however I think my idea is a bit different from what you described. Maybe my initial post was too imprecise.

I am not talking about a macro REPLACING the automated fault reaction.
I am talking about a macro called AFTER reacting to a fault.

As far as I can see this does not violate any safety behavior or risk breaking the safety routines.
However, it allows for additional, individual measures. For example, let yourself be notified by a beeper that there's some trouble. That would even increase safety because you will notice sooner and can react faster.

In other cases, like those that have just happened to me, there was no real fault, just the layer fan at 100% cooling the hotend so much that a fault was detected.
In this case, if the printer stops immediately (Toolboard CAN fix) and notifies me immediately (beep, show a big message), I am able to react fast enough to check what has happened and allow the print to continue. (Of course only if I'm 100% sure no real fault has happened. And yes, I already did this successfully, but only could do so cause I knew a certain print will very likely fail due to "cooldown fault" and was sitting next to the printer with a finger on the pause key.)

wdenker

@t3p3tony correct, it pauses for a limited time and then the print is gone. It just needs to stay paused indefinitely. I think the time limit is something like 5-10 minutes, but I'm only checking the printers once every 5-6 hours.

OwenD

@wdenker said in Heater Fault handling improvements:

@t3p3tony correct, it pauses for a limited time and then the print is gone. It just needs to stay paused indefinitely. I think the time limit is something like 5-10 minutes, but I'm only checking the printers once every 5-6 hours.

An indefinite pause is, in my mind, a dangerous situation when you have a heater fault.
If you're getting heater faults then essentially you have a tuning or hardware problem (if caused by a fan).
However, it's not hard to implement some code to notify you of a fault and allow you to act before RRF shuts off the printer.
Something like this in daemon.g should work.
If you want it to run at intervals of less than 10 seconds, put the whole thing in a while true loop.

; create a global to store how many heaters are in an error state
if !exists(global.HeaterErrorCount)
	global HeaterErrorCount=0
else
	set global.HeaterErrorCount=0

; loop through all heaters and check state for fault
while iterations < #heat.heaters
	if heat.heaters[iterations].state=="fault"
		set global.HeaterErrorCount=global.HeaterErrorCount+1

; if any heater is in an error state, sound an alarm
if global.HeaterErrorCount>0
	M42 P6 S1 ; turn on GPIO connected to alarm relay
	; do some more code to send a text, flash a light or whatever
else
	M42 P6 S0 ; turn off GPIO connected to alarm relay
You could also investigate BtnCmd and the other plugins which have the ability to use HTTP Post, or MQTT to send you a notification.

wdenker

@owend I don't see how an indefinite pause would be dangerous: the heater that has faulted is turned off. That gives me time to fix the issue at hand and continue where it left off.

OwenD

@wdenker
We'll have to agree to disagree...
The reason I see it as a dangerous move is that you have no way of knowing what caused the fault code, or how serious it is. Especially in the five hour timeframe you quote.
For example, if it's a heater runaway, then it's possible the heater can't be shut down.
That's why RRF also calls M81.
Now if you have a thermal fuse and other hardware safety features, perhaps you can take the risk.
In any case, if the above code doesn't allow you to react in the timeout allowed before forced shutdown then you'll need to continue arguing the case for a configurable option to ignore the fault.

wdenker

@owend if it was stuck on because of a hardware fault, shutting down the board wouldn't resolve it either.

OwenD

@wdenker said in Heater Fault handling improvements:

@owend if it was stuck on because of a hardware fault, shutting down the board wouldn't resolve it either.

M81 is called to allow the power supply to the heater(s) to be turned off.
So if your system is configured with proper safety in mind, then yes, it will resolve the problem in most cases.
I guess a frozen Duet would be the exception, but then you wouldn't have a heater fault problem either.

wdenker

@owend so when I say pause I mean after the heater is turned off. So that I can repair the issue and then bring temp back and continue. The only thing I don't like about the current setup is that it times out after so long to where you can't recover.

T3P3Tony

@wdenker what is your M570 command set to?
https://duet3d.dozuki.com/Wiki/M570
You can use the S parameter to set a number of minutes between a heater fault and the subsequent cancellation (and power off)

Snnn Integer timeout in minutes (can be set to 0) for print to be cancelled after heater fault (Firmware 1.20 and later). If the S parameter timeout occurs (which only happens if a SD print is in progress), RRF will also try to turn off power via the PS_ON pin.
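As a sketch, a config.g line that lengthens that timeout might look like the following. The heater number and all values are illustrative assumptions, and parameter support varies between firmware versions, so check the M570 page for the version you run.

```gcode
; config.g sketch, illustrative values only
; H1   = heater 1 (assumed to be the hotend)
; P10  = anomaly must persist for 10 seconds before a fault is raised
; T20  = permitted excursion of 20 C from the setpoint
; S360 = cancel the print (and try PS_ON power off) 360 minutes after a fault
M570 H1 P10 T20 S360
```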