Intermittent communication disruption between 6HC and 3HC
-
Will do.
Quick question - do I need to redo PID for 3.2? I thought I saw that there is a new algorithm, but it's not necessary to redo PID.
I am seeing some very weird behavior on 3.2. Regardless of the temperature I set on the bed, it drops temps without the firmware screaming bloody murder and thermal runaway.
Attaching screenshot. Please note that the temp is 100C, while it's set at 105C.
Hotend temp dropping was my doing.
Just in case, log from the main board below.
m122 b0
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.2 running on Duet 3 MB6HC v0.6 or 1.0 (standalone mode)
Board ID: 08DJM-956L2-G43S4-6JKDA-3SJ6T-1B6GH
Used output buffers: 1 of 40 (21 max)
=== RTOS ===
Static ram: 149788
Dynamic ram: 94408 of which 76 recycled
Never used RAM 114560, free system stack 129 words
Tasks: NETWORK(ready,189) ETHERNET(blocked,109) SENSORS(blocked,19) HEAT(blocked,289) CanReceiv(blocked,848) CanSender(blocked,341) CanClock(blocked,352) TMC(blocked,19) MAIN(running,717) IDLE(ready,19)
Owned mutexes: HTTP(MAIN)
=== Platform ===
Last reset 00:29:42 ago, cause: power up
Last software reset at 2021-02-03 19:30, reason: User, GCodes spinning, available RAM 114560, slot 2
Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00400000 BFAR 0x00000000 SP 0x00000000 Task MAIN Freestk 0 n/a
Error status: 0x00
Aux0 errors 0,0,0
Aux1 errors 0,0,0
MCU temperature: min 37.9, current 44.5, max 46.5
Supply voltage: min 23.9, current 24.0, max 24.1, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 12.1, current 12.1, max 12.2, under voltage events: 0
Driver 0: position 23945, ok, reads 31637, writes 21 timeouts 0, SG min/max 0/1023
Driver 1: position 3358, ok, reads 31637, writes 21 timeouts 0, SG min/max 0/1023
Driver 2: position 8994, ok, reads 31637, writes 21 timeouts 0, SG min/max 0/1023
Driver 3: position 0, ok, reads 31638, writes 21 timeouts 0, SG min/max 0/1023
Driver 4: position 0, ok, reads 31638, writes 21 timeouts 0, SG min/max 0/1023
Driver 5: position 0, ok, reads 31638, writes 21 timeouts 0, SG min/max 0/1023
Date/time: 2021-02-03 20:01:50
Slowest loop: 44.47ms; fastest: 0.05ms
=== Storage ===
Free file entries: 9
SD card 0 detected, interface speed: 25.0MBytes/sec
SD card longest read time 3.4ms, write time 0.0ms, max retries 0
=== Move ===
DMs created 125, maxWait 89884ms, bed compensation in use: mesh, comp offset 0.000
=== MainDDARing ===
Scheduled moves 70080, completed moves 70020, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state 3
=== AuxDDARing ===
Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== Heat ===
Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
Heater 0 is on, I-accum = 0.2
Heater 1 is on, I-accum = 0.3
=== GCodes ===
Segments left: 1
Movement lock held by null
HTTP is ready with "m122 b0" in state(s) 0
Telnet is idle in state(s) 0
File is doing "G1 X178.308 Y129.689 E0.02381" in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
Slowest loop: 79.63ms; fastest: 0.02ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0), 0 sessions Telnet(0), 0 sessions
HTTP sessions: 1 of 8
- Ethernet -
State: active
Error counts: 0 0 1 0 0
Socket states: 2 5 2 2 2 0 0 0
=== CAN ===
Messages queued 73379, send timeouts 0, received 7629, lost 0, longest wait 5ms for reply type 6029, free buffers 48
-
@pkos I would say it's worth trying the new PID tuning. The old values should be usable though.
-
I'll give another PID tune a try then just in case.
Still... weird that the firmware didn't alert me at once that the temperature was not rising fast enough - especially since it was below the set temp and actually very slowly dropping - you don't see it on the screenshot above, but by the end of that print, temp was sub 100C.
-
You don't have to re-run PID tuning for 3.2 if you are happy with the PID performance under 3.1.1. You do need to re-tune if you want the benefit of hot end heater feedforward to compensate for changes in the speed of the print cooling fan.
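For reference, a re-tune along those lines might look like the sketch below. The heater number, tool number, and target temperatures are assumptions for illustration only, and tuning by tool number (the `T` parameter) assumes firmware 3.2 or later; check the M303 documentation for your firmware version before relying on it.

```
; Hypothetical values: adjust heater/tool numbers and temperatures to your setup.
; Run one tuning command at a time and wait for it to finish before the next.
M303 H0 S105   ; auto-tune the bed heater (heater 0) at a 105C target
M303 T0 S240   ; auto-tune the heater of tool 0 at 240C, so the print cooling fan's effect can be measured
M500           ; once tuning has completed, save the results to config-override.g
```

M500 only takes effect at the next start-up if config.g loads config-override.g with M501.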
-
Update from today.
I ran two tests today on 3.2 (6HC) and 3.2.1 (3HC).
One was a print and generic sync test, the other was a sync test after a long power down.
Quick sync, but print failed on 3.2 and 3.2.1
There was no problem with sync. That caught on pretty much immediately.
Print failed very quickly - after about 5 layers, I heard lost steps and the print was shifted. I stopped the print and turned the device off for the second part of the test.
Instant sync after a long power down.
I left the printer off for a good 6 hours. Now I came back and turned it on - it seemed almost as if both boards came up at the same time and synced up even before the rest of the system launched.
So I guess the sync problem can be considered solved.
A new symptom that I never had before is the behavior of the hotbed. Historically, this printer has been running just fine since May - often printing long prints, and I never had issues with temperature. I guess it's just a coincidence that I now keep seeing so many issues.
The problem is that after about an hour of the hotbed being heated up to ABS temps (about 105C) - I notice that the thermistor starts reporting dropping values, but the firmware doesn't complain about it. One screenshot was posted above. Today I had this happen a couple times. I now need to figure out what is going on there. I'm guessing a hardware issue, but I am curious why the firmware doesn't complain.
My hotbed is an 8mm alu plate with an AC Keenovo mat on it (it has the thermistor embedded in it). That mat is connected to the Duet 6HC via a Crydom SSR. There is a thermal fuse between the Crydom and the Keenovo mat (it triggers at 125C). The PSU is a proper Mean Well (bought from TME) not a noname PSU.
If you want me to shift to a new topic to track this, I'll be happy to, but for now - I'll ask here.
Is there a way to see what is being sent to the hotbed output from the 6HC? Forgive the comparison, but there is one nice feature in Klipper and Mainsail that would help debugging here. They show that data in their GUI (screenshot below). I'm wondering if there's a way to see that on a Duet with RRF - this would help debugging a lot since I could see what/if anything is sent to the board when the temps start dropping and how the numbers compare to the time, when temp is held steady at the preset temperature.
Is this kind of behavior normal? I set the temp of the hotbed to 105C, it gets there fine and stays there for a while, but - I see temps start dropping to 100C, but the firmware doesn't complain? Even if I set the temperature to 120C - temps do not rise, but keep falling. It's very slow (1C over 10 or so minutes), but still - it is falling.
I did run a new PID tune on the bed for the temps I usually print at (ABS, so ~105C). But that didn't eliminate the problem.
-
OK. So it seems the Keenovo mat is broken - or at least everything points to it being at fault regarding the temps.
What bugs me though is how come the firmware didn't object when instead of temp being on point, it started dropping...
-
@pkos said in Intermittent communication disruption between 6HC and 3HC:
What bugs me though is how come the firmware didn't object when instead of temp being on point, it started dropping...
The allowed temperature drop before a fault is registered is set by the M570 command. The default is 15C because when we tried setting it lower, too many users reported heater faults, mostly caused by print cooling fans blowing too much cold air at the heater block. I suggest you configure a lower maximum temperature drop, at least for the bed heater.
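As a rough illustration of that suggestion, a tighter limit for the bed could be set in config.g with something like the line below. The values are assumptions picked for the example, not recommendations: `H0` assumes the bed is heater 0 (as in the M122 output above), `T5` permits a 5C excursion from the setpoint instead of the default 15C, and `P10` requires the anomaly to persist for 10 seconds before a fault is raised.

```
; Hypothetical values: pick limits that suit your own bed and environment
M570 H0 P10 T5   ; bed heater: fault if more than 5C away from the setpoint for 10 seconds
```

Setting the limit too tight risks spurious heater faults, so it is worth checking how far the bed temperature normally wanders during a print before settling on a value.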
-
Understood. Thank you.
I have a new mat on the way (lucky to find someone wanting to get rid of theirs locally), but until I get it - I won't be able to do any more testing for the next couple of days.
-
@pkos said in Intermittent communication disruption between 6HC and 3HC:
Understood. Thank you.
I have a new mat on the way (lucky to find someone wanting to get rid of theirs locally), but until I get it - I won't be able to do any more testing for the next couple of days.
No problem.
I think the missed steps you saw with firmware 3.2 were caused by a problem that I had suspected to exist in 3.2 on the MB6HC (a side effect of the code speed increase in 3.2 on that board), which we fixed in 3.3. When we discovered this issue, I was surprised that we had no reports of missed steps with 3.2 on MB6HC. I guess you are the first user to use the high step rates on the MB6HC that are needed to trigger this problem.
I have back-ported the fix for this to release 3.2.1 which is currently undergoing testing.
-
Ah, sweet that you found it too!
Granted, I am printing at pretty fast speeds - infill at 150 mm/s, moves at 300 mm/s - and this was usually visible around layer 3 or 4.
I can't wait to see how 3.2.1 will do once I get my printer back online, and I see you are looking at a solution for DHT22 (even if it means dropping DHT11).
You guys are amazing! Thanks for the support and patience. I am a big fan of the Duet ecosystem.
-
The candidate 3.2.1 firmware already includes the fix for DHT sensors on MB6HC. It's already available on Dropbox but needs further testing before we release it.
-
The 3.2.1 candidate firmware is now available for community testing, see https://forum.duet3d.com/topic/21446/candidate-3-2-1-firmware-binaries-available.
-
Just a short update.
My printer is back online, I've been printing happily for the last two days on 3.2.2.
DHT works perfectly fine, no interruptions in printing, no skipped layers, printer comes on immediately.
I'd say all problems are solved!
Thank you so very much for your help @dc42.
I'll keep monitoring and tinkering with the firmware more, but I expect it to work fine.