Unsolved Temperature warnings are so frequent that DWC control is lost
-
The machine I am working on needs improved cooling for the MCU/stepper drivers. In the meantime, the volume of over-temperature warnings being written to the console seems to flood DWC to the point that you cannot connect to or control the machine until things cool down. Is there a way to limit these messages?
Board: Duet 2 WiFi (2WiFi)
Firmware: RepRapFirmware for Duet 2 WiFi/Ethernet 3.4.0 (2022-03-15)
Duet WiFi Server Version: 1.26
-
@brendon said in Temperature warnings are so frequent that DWC control is lost:
Is there a way to limit these messages?
Cool the board?
-
@phaedrux This is an inane, embarrassing response to receive from a "moderator". I clearly mentioned that I was intending to better cool the board. Thank you for wasting both of our time.
In the meanwhile, there should not exist a thermal scenario where the board is warm enough to throw warnings but able to continue moving/operating if it cannot safely be controlled via DWC. Once the board enters this state, the single web browser connected to DWC disconnects, meaning there is no way to pause/stop stepper motion/spindle/etc without killing the power to the board. Perhaps it's possible that the wireless portion of the Duet2 becomes too warm to operate, but the plethora of warnings in the console immediately before it disconnects seems to imply the volume of messages is the issue.
-
In version 3.4 you control what happens in the event of a driver error or warning in driver-error.g or driver-warning.g.
You need to create the file and take appropriate action. In your case this may include disabling the driver to stop the errors.
Driver stalls, errors and warnings that occur on expansion boards are now reported to the main board and treated in the same way as local driver errors and warnings. System macro file driver-stall.g, driver-error.g or driver-warning.g (as appropriate) is run if it exists. The local driver number is passed as param.D, the CAN board address as param.B, the encoded driver status as param.P and the error or warning message as param.S.
EDIT:
Also see here which gives more info including default actions.
https://docs.duet3d.com/en/User_manual/RepRapFirmware/Events#processing-events
-
@owend Thanks for the tip! I'll look into implementing driver-error.g and driver-warning.g on my machine.
I'll have to run a test with driver-warning.g implemented to see if it suppresses the flood of messages to the console. It still seems awkward that, by default, control of the machine is lost when the MCU was only reporting ~60C.
-
RRF 3.3 reports driver warnings (e.g. over temperature, or phase disconnected) to the user. RRF 3.4 attempts to run driver-warning.g. If that file is not found then it just notifies the user via the console.
-
@brendon the driver warnings come on if the stepper driver chips detect over 100C (I think) internally, and error at over 150C. They don't report an actual temperature number, just a state. The MCU temp you are reading is over an inch away so won't be at the same temperature, but I'd still say 60C is reasonably toasty (mine is usually hovering about the 30-35C mark, though the board is actively cooled and not pushing high currents...)
It does seem that flooding DWC with repeated messages at a high rate is unnecessary though. It's a warning, so you may intend to act on it (e.g. pause the print, change a setting etc), rather than just kill power....
-
@brendon Just curious: have you cooled down in the meantime … - I mean, your board? Instead of abusing the moderator, better waste some thoughts on potential reasons of the warnings:
How many amps do you pull from your drivers? What’s in your config.g? Does one of the driver chips (or even more than one) feel hot? There are just two possibilities: either the firmware is going wild, or the driver’s reports tell the bitter truth.
To suppress the warnings is no option if you don’t want to risk your board. First things first: Have you tried to use a hair blower (cold air) or something similar? Does that help to get rid of the warnings?
the plethora of warnings in the console immediately before it disconnects seems to imply the volume of messages is the issue.
That’s wishful thinking: find the issue for the warnings, don’t bitch about how many of them you get.
-
@engikeneer I agree that it's logical that the driver reports the warnings to the console. My issue is that when the board is in that state, you cannot connect to DWC which means you cannot pause or stop the machine.
-
@infiniteloop I know that the board requires active cooling in the position that it's mounted on this machine. I have already mounted a fan and that has resolved the warnings.
My issue is that the board can get into a state where the machine is still moving/operating, but cannot be paused/stopped since DWC is unable to connect, possibly due to the high rate of errors that are being sent to the console.
-
@brendon said in Temperature warnings are so frequent that DWC control is lost:
but the plethora of warnings in the console immediately before it disconnects seems to imply the volume of messages is the issue.
On this point.
A quick look (by an absolute novice) at the source code shows that events like this are queued.
They are not repeated if a similar event is already in the queue.
However if you have no driver-warning.g file, the default action is just to echo a warning to the console. This would take milliseconds so the event would be cleared from the queue very quickly.
Adding a G4 into the driver-warning.g and driver-error.g seems like it would limit the display of the error messages as the event hasn't finished and should allow interaction via DWC as that would be on a different input.
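To make the reasoning above concrete, here is a toy model in Python. It is not firmware code and is not taken from the RepRapFirmware source. It assumes a driver raises the same warning every `poll_ms` milliseconds and that a new warning is dropped while an identical one is still being handled; the function name and all of its parameters are made up for illustration.

```python
def count_handler_runs(handler_ms, total_ms=1000, poll_ms=10):
    """Count how often the warning handler runs within total_ms.

    Assumed behaviour (hypothetical, for illustration only): the same
    warning is raised every poll_ms, and it is dropped while a previous
    identical warning is still being handled.
    """
    busy_until = 0  # time at which the current handler run finishes
    runs = 0
    for now in range(0, total_ms, poll_ms):
        if now >= busy_until:
            # Nothing similar is pending, so this warning is handled.
            runs += 1
            busy_until = now + handler_ms
        # Otherwise the warning is treated as a duplicate and dropped.
    return runs


if __name__ == "__main__":
    print(count_handler_runs(handler_ms=1))     # quick echo only
    print(count_handler_runs(handler_ms=5000))  # handler that waits 5 s
```

Under these assumptions a handler that finishes within a millisecond runs on every poll (100 times in one second), while a handler that waits five seconds runs once. Whether the real firmware behaves this way would need to be confirmed on hardware.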
I can only test by simulation with M957 though and that doesn't cause the error to persist, however sending multiple M957 in quick succession doesn't cause the macro to run multiple times which suggests it's still queued until the macro exits.
Scratch that.
It did run more than once.
-
@brendon said in Temperature warnings are so frequent that DWC control is lost:
@phaedrux This is an inane, embarrassing response to receive from a "moderator". I clearly mentioned that I was intending to better cool the board. Thank you for wasting both of our time.
In the meanwhile, there should not exist a thermal scenario where the board is warm enough to throw warnings but able to continue moving/operating if it cannot safely be controlled via DWC. Once the board enters this state, the single web browser connected to DWC disconnects, meaning there is no way to pause/stop stepper motion/spindle/etc without killing the power to the board. Perhaps it's possible that the wireless portion of the Duet2 becomes too warm to operate, but the plethora of warnings in the console immediately before it disconnects seems to imply the volume of messages is the issue.
I'm sorry for the levity, but it really is your immediate solution. If the drivers are warning you of overtemp, they are over 100C. Cool them down.
The rate of warnings you are reporting may be an issue, and I'll definitely be passing that along to have it looked at, but that's a code change potentially for another day and won't help you right now.
-
@phaedrux I appreciate you checking back in. I apologize for being short with you; there's a place for levity when not counter-productive, but it's not helpful when the problem being reported could be construed as a possible safety issue. I was probably too brief in acknowledging that I understood active cooling was required; I was in the process of mounting a fan when I made the post. Since the fan has been mounted, the board is behaving as reliably as expected.
My concern is that there may be no rate-limiting on the warnings being issued to the console. From what I saw, this only affected DWC, and the machine seemed to be continuing its expected operations. As a software engineer, it's not a huge leap to imagine that if more drivers started emitting incessant warnings, the actual machine performance could be affected. It's also possible that the loss of DWC control was not due to the volume of warnings, but some other gray area WRT thermal accumulation. I can only speculate, as the last thing I saw in DWC before it disconnected was a flood of warnings.
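For illustration, this is roughly what per-driver rate-limiting of console warnings could look like. It is a minimal sketch in Python, not a description of how RepRapFirmware works or will work; the class name, the key strings and the five-second interval are all hypothetical.

```python
class WarningThrottle:
    """Pass on at most one message per key every min_interval seconds."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.last_sent = {}   # key -> time the last message was passed on
        self.suppressed = {}  # key -> messages dropped since that time

    def should_send(self, key, now):
        """Return True if a message for key may be shown at time now."""
        last = self.last_sent.get(key)
        if last is not None and now - last < self.min_interval:
            # Too soon after the previous message: count it and drop it.
            self.suppressed[key] = self.suppressed.get(key, 0) + 1
            return False
        self.last_sent[key] = now
        self.suppressed[key] = 0
        return True


if __name__ == "__main__":
    throttle = WarningThrottle(min_interval=5.0)
    for t in (0.0, 0.1, 4.9, 5.0, 5.1):
        print(t, throttle.should_send("driver 0 over-temperature", t))
```

Each driver keeps its own timer, so a warning from a second driver is not hidden by a flood from the first, and the suppressed count could be reported alongside the next message that does get through.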
Obviously if there was an actual risk to myself or the machine the power could be disconnected, I just didn't want to scrap hours of experiments if it could safely be avoided.
-
@brendon said in Temperature warnings are so frequent that DWC control is lost:
it's not a huge leap to imagine that if more drivers started emitting incessant warnings that the actual machine performance could be affected. It's also possible that the loss of DWC control was not due to the volume of warnings, but some other gray area WRT thermal accumulation. I can only speculate as the last thing I saw in DWC before it disconnected was a flood of warnings.
Yes this will be investigated. Thank you for reporting it.