Toolboard keeps losing heater 1
-
I keep getting random failures on my toolboard where it loses heater 1. We thought it was heat-related and were able to get that under control, but now we don't know where to look. Here are the M122 B121 and M122 reports as well as the config.g. I am at a loss. We also know our CAN communication is good at this point. config (12).g
M122 B121
Diagnostics for board 121:
Duet TOOL1LC rev 1.1 or later firmware version 3.4.4 (2022-10-14 11:46:33)
Bootloader ID: SAMC21 bootloader version 2.4 (2021-12-10)
All averaging filters OK
Never used RAM 3080, free system stack 45 words
Tasks: Move(notifyWait,1.7%,91) HEAT(notifyWait,0.1%,115) CanAsync(notifyWait,0.0%,65) CanRecv(notifyWait,0.3%,74) CanClock(notifyWait,0.0%,65) ACCEL(notifyWait,0.0%,61) TMC(delaying,3.1%,57) MAIN(running,89.8%,441) IDLE(ready,0.0%,26) AIN(delaying,5.1%,142), total 100.0%
Last reset 00:01:42 ago, cause: power up
Last software reset data not available
Driver 0: pos 183615, 80.0 steps/mm,ok, SG min 0, read errors 0, write errors 0, ifcnt 9, reads 50700, writes 9, timeouts 0, DMA errors 0, CC errors 0, steps req 230109 done 229541
Moves scheduled 4302, completed 4296, in progress 1, hiccups 0, step errors 0, maxPrep 595, maxOverdue 59723241, maxInc 59723061, mcErrs 0, gcmErrs 0
Peak sync jitter 0/4, peak Rx sync delay 216, resyncs 0/0, next step interrupt due in 43 ticks, enabled
VIN voltage: min 24.3, current 24.4, max 24.6
MCU temperature: min 59.7C, current 60.3C, max 61.4C
Last sensors broadcast 0x00000000 found 0 173 ticks ago, 0 ordering errs, loop time 0
CAN messages queued 842, send timeouts 0, received 5235, lost 0, free buffers 37, min 36, error reg 0
dup 0, oos 0/0/0/0, bm 0, wbm 0, rxMotionDelay 496, adv -59723030/74654
Accelerometer: LIS3DH, status: 00
I2C bus errors 1, naks 3, other errors 0
M122
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.4.5 (2022-11-30 19:35:23) running on Duet 3 MB6HC v1.02 or later (standalone mode)
Board ID: 08DJM-9P63L-DJ3S0-7J9D4-3SN6J-9UMZA
Used output buffers: 3 of 40 (40 max)
=== RTOS ===
Static ram: 152760
Dynamic ram: 99104 of which 0 recycled
Never used RAM 97704, free system stack 114 words
Tasks: NETWORK(ready,65.1%,220) ETHERNET(notifyWait,1.9%,401) HEAT(notifyWait,0.4%,322) Move(notifyWait,40.1%,245) CanReceiv(notifyWait,0.7%,772) CanSender(notifyWait,0.8%,326) CanClock(delaying,0.1%,339) TMC(notifyWait,61.7%,57) MAIN(running,56.1%,925) IDLE(ready,0.0%,30), total 227.1%
Owned mutexes:
=== Platform ===
Last reset 27:05:39 ago, cause: power up
Last software reset at 2023-06-12 23:10, reason: User, GCodes spinning, available RAM 95568, slot 0
Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00400000 BFAR 0x00000000 SP 0x00000000 Task MAIN Freestk 0 n/a
Error status: 0x04
Aux0 errors 0,3,0
Step timer max interval 277
MCU temperature: min 31.8, current 37.6, max 41.8
Supply voltage: min 23.8, current 24.1, max 24.3, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 11.9, current 12.2, max 12.5, under voltage events: 0
Heap OK, handles allocated/used 0/0, heap memory allocated/used/recyclable 0/0/0, gc cycles 0
Events: 0 queued, 0 completed
Driver 0: ok, SG min 0, mspos 234, reads 26227, writes 68 timeouts 0
Driver 1: standstill, SG min n/a, mspos 8, reads 26295, writes 0 timeouts 0
Driver 2: ok, SG min 0, mspos 579, reads 26226, writes 68 timeouts 0
Driver 3: ok, SG min 0, mspos 549, reads 26245, writes 50 timeouts 0
Driver 4: ok, SG min 0, mspos 709, reads 26245, writes 50 timeouts 0
Driver 5: ok, SG min 0, mspos 234, reads 26227, writes 68 timeouts 0
Date/time: 2023-06-15 21:27:48
Slowest loop: 789.25ms; fastest: 0.04ms
=== Storage ===
Free file entries: 9
SD card 0 detected, interface speed: 25.0MBytes/sec
SD card longest read time 4.0ms, write time 43.5ms, max retries 0
=== Move ===
DMs created 125, segments created 46, maxWait 9424672ms, bed compensation in use: mesh, comp offset 0.000
=== MainDDARing ===
Scheduled moves 191582, completed 191522, hiccups 0, stepErrors 0, LaErrors 0, Underruns [32, 0, 10], CDDA state 3
=== AuxDDARing ===
Scheduled moves 0, completed 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== Heat ===
Bed heaters 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters 2 -1 -1 -1, ordering errs 0
Heater 0 is on, I-accum = 0.0
Heater 2 is on, I-accum = 0.0
=== GCodes ===
Segments left: 1
Movement lock held by null
HTTP is idle in state(s) 0
Telnet is idle in state(s) 0
File is doing "G1 F5065.426" in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty
=== CAN ===
Messages queued 3879546, received 2688079, lost 0, boc 0
Longest wait 7ms for reply type 6024, peak Tx sync delay 571, free buffers 50 (min 36), ts 480652/480652/0
Tx timeouts 0,0,0,0,0,0
=== Network ===
Slowest loop: 2628.63ms; fastest: 0.03ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
HTTP sessions: 1 of 8
= Ethernet =
State: active
Error counts: 0 0 13 0 0 0
Socket states: 5 2 2 2 2 2 0 2
= WiFi =
Network state is disabled
WiFi module is disabled
Failed messages: pending 2779096485, notready 2779096485, noresp 2779096485
Socket states: 0 0 0 0 0 0 0 0
=== Multicast handler ===
Responder is inactive, messages received 0, responses 0
-
@wdenker It looks like you have a power issue, as the toolboard reset 1 min 42 s before the M122 was taken.
In RRF 3.4, the firmware doesn't know if a toolboard/expansion board has been lost. This functionality was added in 3.5b4. You could update to that and keep an eye on things (3.5b4 prints OK for me).
-
@jay_s_uk Sadly, the customer likes to pull the power to reset, so that is not the issue.
-
@wdenker They pull the power on the toolboard but not the mainboard?
Anyway, if they do that, then my comment about the mainboard not knowing still stands.
-
@jay_s_uk The heater loss doesn't persist after the power is pulled. The customer knows that if he pulls the power, the heater comes back and he can start over. He then grabs the two M122 reports separately, one before the power loss and one after.
-
@wdenker so why does the M122 for mainboard show an uptime of 27 hours and the M122 for the toolboard shows 1 minute?
-
@jay_s_uk Because one M122 was taken before the power was pulled and the other after.
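
The uptime mismatch discussed above can be checked mechanically from saved reports. The following is a minimal sketch, assuming the "Last reset HH:MM:SS ago" wording shown in the M122 output in this thread; the function name `uptime_seconds` is hypothetical and not part of any Duet tooling.

```python
import re
from typing import Optional


def uptime_seconds(report: str) -> Optional[int]:
    """Return the uptime from a 'Last reset HH:MM:SS ago' line, in seconds.

    Returns None if the report does not contain such a line.
    """
    match = re.search(r"Last reset (\d+):(\d{2}):(\d{2}) ago", report)
    if match is None:
        return None
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


# Lines copied from the two reports in the first post.
toolboard_line = "Last reset 00:01:42 ago, cause: power up"
mainboard_line = "Last reset 27:05:39 ago, cause: power up"

difference = uptime_seconds(mainboard_line) - uptime_seconds(toolboard_line)
print(f"Mainboard has been up {difference} s longer than the toolboard")
```

A large difference like this only shows that the two reports were not taken in the same power cycle; it does not by itself say why the tool board restarted.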
-
@wdenker It's probably a communication/cable issue anyway, but you won't know with 3.4.5 as it doesn't tell you.
-
@jay_s_uk The CAN communication light never blinks erratically. We originally had an issue with the CAN connection, which we have since resolved. Now we are getting this error, rather than things just stopping as if the connection was briefly lost, which is how a bad CAN cable presents itself.
-
@wdenker update to 3.5b4 so you can see if there are any reports of a CAN board not communicating.
-
@jay_s_uk Found a good indicator: M122 B121 shows "I2C bus errors 1" when the board is having the issue and "errors 0" when it is not. So am I safe to assume it is the CAN connection again, even though nothing else indicates CAN connection issues? I also updated to 3.5.0-beta.4 and am not getting any notifications about CAN timeouts or connection issues, which I would have expected.
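
To see whether that counter really tracks the fault, the I2C figures can be pulled out of each saved M122 B121 report and compared side by side. This is only a sketch, assuming the "I2C bus errors N, naks N, other errors N" wording from the reports in this thread; `i2c_counters` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional


def i2c_counters(report: str) -> Optional[Dict[str, int]]:
    """Extract the I2C counters from M122 B<addr> text, or None if absent."""
    match = re.search(
        r"I2C bus errors (\d+), naks (\d+), other errors (\d+)", report
    )
    if match is None:
        return None
    bus_errors, naks, other_errors = (int(part) for part in match.groups())
    return {"bus_errors": bus_errors, "naks": naks, "other_errors": other_errors}


# Line copied from the tool board report in the first post.
print(i2c_counters("I2C bus errors 1, naks 3, other errors 0"))
```

A changing counter across reports is only a correlation to follow up on, not proof of a CAN problem.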
-
-
@wdenker Do you have the opportunity to swap the toolboard in case it's a hardware issue? The I2C bus error is not directly related to the heater (I2C is used for the LIS3DH only); however, it's interesting that they happen at the same time.
How hot are you running the toolboard?
-
@T3P3Tony We swapped to a new board and are still getting the same issue, just not as frequently. So we then decided to swap the thermistor ports to a Phoenix connector instead of the JST. This seems to have resolved the issue. What else does that error indicate or relate to, other than the accelerometer?
-
@wdenker the message "Board 121 does not have heater 1" implies that the tool board has reset. The M122 B121 report indicates that it was not a firmware crash or other software reset, because the Last Software Reset Data is shown as "not available". So it must have been a watchdog reset, hardware reset, or power failure. Therefore, we need to see M122 reports for both boards before pulling the power, to determine the reason for the tool board reset. Please ask your customer to provide this next time the fault occurs.
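
The check described above (is software reset data present, and if so, what was the reason?) can be sketched in a few lines. This assumes the "Last software reset ..." wording seen in the M122 reports in this thread; `software_reset_reason` is a hypothetical helper, not a Duet API.

```python
import re
from typing import Optional


def software_reset_reason(report: str) -> Optional[str]:
    """Return the recorded software reset reason, or None if no data is stored.

    None corresponds to 'Last software reset data not available', which
    points away from a firmware crash or other software reset.
    """
    match = re.search(
        r"Last software reset at [\d-]+ [\d:]+, reason: ([^,]+),", report
    )
    return match.group(1).strip() if match else None


# Sample lines in the format printed by M122 in this thread.
samples = [
    "Last software reset data not available",
    "Last software reset at 2023-06-21 10:39, reason: OutOfMemory, available RAM 12, slot 0",
]
for line in samples:
    print(software_reset_reason(line))
```

The report must be captured before power is removed, since the uptime and task state in it describe the current power cycle only.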
-
@dc42 or @T3P3Tony Here are the M122 reports from each board prior to the power restart. It looks like the toolboard ran out of memory (2023-06-21 10:39, reason: OutOfMemory). How do I resolve that?
M122
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.5.0-beta.4 (2023-06-08 23:41:30) running on Duet 3 MB6HC v1.02 or later (standalone mode)
Board ID: 08DJM-9P63L-DJ3S0-7J9D4-3SN6J-9UMZA
Used output buffers: 9 of 40 (40 max)
=== RTOS ===
Static ram: 155012
Dynamic ram: 121824 of which 52 recycled
Never used RAM 67352, free system stack 122 words
Tasks: NETWORK(1,ready,33.0%,139) ETHERNET(5,nWait,2.1%,317) HEAT(3,nWait,1.3%,323) Move(4,nWait,88.6%,214) CanReceiv(6,nWait,1.7%,642) CanSender(5,nWait,2.0%,326) CanClock(7,delaying,0.3%,349) TMC(4,nWait,131.2%,59) MAIN(1,running,137.7%,137) IDLE(0,ready,0.5%,30), total 398.4%
Owned mutexes:
=== Platform ===
Last reset 50:22:42 ago, cause: power up
Last software reset at 2023-06-17 02:19, reason: User, Gcodes spinning, available RAM 68440, slot 1
Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0043c000 BFAR 0x00000000 SP 0x00000000 Task MAIN Freestk 0 n/a
Error status: 0x14
Aux0 errors 0,6,0
MCU temperature: min 30.3, current 40.2, max 43.8
Supply voltage: min 23.8, current 24.1, max 24.3, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 11.9, current 12.2, max 12.6, under voltage events: 0
Heap OK, handles allocated/used 0/0, heap memory allocated/used/recyclable 0/0/0, gc cycles 0
Events: 13 queued, 13 completed
Driver 0: standstill, SG min 0, mspos 976, reads 2699, writes 208 timeouts 0
Driver 1: standstill, SG min n/a, mspos 8, reads 2896, writes 11 timeouts 0
Driver 2: standstill, SG min 0, mspos 80, reads 2707, writes 200 timeouts 0
Driver 3: standstill, SG min 0, mspos 112, reads 2738, writes 169 timeouts 0
Driver 4: standstill, SG min 0, mspos 880, reads 2738, writes 169 timeouts 0
Driver 5: standstill, SG min 0, mspos 464, reads 2700, writes 208 timeouts 0
Date/time: 2023-06-21 11:11:07
Slowest loop: 1000.15ms; fastest: 0.05ms
=== Storage ===
Free file entries: 18
SD card 0 detected, interface speed: 25.0MBytes/sec
SD card longest read time 4.5ms, write time 380.9ms, max retries 0
=== Move ===
DMs created 125, segments created 73, maxWait 5933869ms, bed compensation in use: mesh, height map offset 0.000, ebfmin 0.00, ebfmax 0.00
no step interrupt scheduled
=== DDARing 0 ===
Scheduled moves 102436, completed 102436, hiccups 0, stepErrors 0, LaErrors 0, Underruns [341, 0, 24], CDDA state -1
=== DDARing 1 ===
Scheduled moves 0, completed 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== Heat ===
Bed heaters 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters 2 -1 -1 -1, ordering errs 0
Heater 0 is on, I-accum = 0.0
Heater 2 is on, I-accum = 0.0
=== GCodes ===
Movement locks held by null, null
HTTP is idle in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
File2 is idle in state(s) 0
Queue2 is idle in state(s) 0
Q0 segments left 0, axes/extruders owned 0x80000007
Code queue 0 is empty
Q1 segments left 0, axes/extruders owned 0x0000000
Code queue 1 is empty
=== CAN ===
Messages queued 7095317, received 5098900, lost 0, boc 56
Longest wait 5ms for reply type 6029, peak Tx sync delay 623, free buffers 50 (min 47), ts 906815/906786/0
Tx timeouts 0,0,0,0,0,0
=== Network ===
Slowest loop: 8427.39ms; fastest: 0.03ms
Responder states: MQTT(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
HTTP sessions: 1 of 8
= Ethernet =
Interface state: active
Error counts: 0 0 0 1 0 0
Socket states: 5 2 2 2 2 2 0 2
= WiFi =
Interface state: disabled
Module is disabled
Failed messages: pending 0, notready 0, noresp 0
Socket states: 0 0 0 0 0 0 0 0
=== Multicast handler ===
Responder is inactive, messages received 0, responses 0
M122 B121
Diagnostics for board 121:
Duet TOOL1LC rev 1.1 or later firmware version 3.5.0-beta.4 (2023-06-08 16:22:30)
Bootloader ID: SAMC21 bootloader version 2.4 (2021-12-10)
All averaging filters OK
Never used RAM 1812, free system stack 88 words
Tasks: Move(3,nWait,0.7%,111) HEAT(2,nWait,0.1%,101) CanAsync(5,nWait,0.0%,54) CanRecv(3,nWait,0.1%,75) CanClock(5,nWait,0.0%,66) ACCEL(3,nWait,0.0%,53) TMC(2,delaying,3.1%,57) MAIN(1,running,90.8%,444) IDLE(0,ready,0.0%,27) AIN(2,delaying,5.1%,142), total 100.0%
Last reset 00:10:46 ago, cause: power up
Last software reset at 2023-06-21 10:39, reason: OutOfMemory, available RAM 12, slot 0
Software reset code 0x01c0 ICSR 0x00000000 SP 0x200054a8 Task Move Freestk 137 ok
Stack: 20005600 000062df 00000000 2000554c a5a5a5a5 00009f99 00000000 00008a67 0001f1c6 00000000 477e8400 477e8400 0000005b 007e8400 a625a5a5 a5a5a5a5 0001ee68 477e8400 477fa700 2e57b417 2e57b417 2000554c 20007100 20005108 3627cde7 00008c77 bdd53594
Driver 0: pos 0, 80.0 steps/mm, standstill, SG min 0, read errors 0, write errors 0, ifcnt 9, reads 59686, writes 9, timeouts 3, DMA errors 0, CC errors 0, failedOp 0x72, steps req 0 done 1157273
Moves scheduled 13624, completed 13624, in progress 0, hiccups 468, step errors 0, maxPrep 602, maxOverdue 2022819261, maxInc 2022780880, mcErrs 0, gcmErrs 0, ebfmin 0.00, ebfmax 1.00
Peak sync jitter 0/5, peak Rx sync delay 261, resyncs 0/0, no timer interrupt scheduled
VIN voltage: min 24.6, current 24.7, max 24.8
MCU temperature: min 58.7C, current 60.5C, max 61.1C
Last sensors broadcast 0x00000000 found 0 30 ticks ago, 0 ordering errs, loop time 0
CAN messages queued 5200, send timeouts 0, received 19467, lost 0, free buffers 18, min 16, error reg 0
dup 0, oos 0/0/0/0, bm 0, wbm 0, rxMotionDelay 837, adv -2022819048/74672
Accelerometer: LIS3DH, status: 00
I2C bus errors 0, naks 3, other errors 0
Diagnostics for board 1:
Duet EXP3HC rev 1.02 or later firmware version 3.5.0-beta.4 (2023-06-08 16:24:05)
Bootloader ID: SAME5x bootloader version 2.3 (2021-01-26b1)
All averaging filters OK
Never used RAM 156016, free system stack 172 words
Tasks: Move(3,nWait,1.1%,104) HEAT(2,nWait,1.2%,82) CanAsync(5,nWait,0.0%,67) CanRecv(3,nWait,1.4%,79) CanClock(5,nWait,0.4%,70) TMC(2,nWait,135.7%,69) MAIN(1,running,46.7%,456) IDLE(0,ready,0.0%,40) AIN(2,delaying,61.7%,265), total 248.2%
Last reset 50:23:10 ago, cause: power up
Last software reset data not available
Driver 0: pos -6382700, 320.0 steps/mm, standstill, SG min 0, mspos 464, reads 3139, writes 165 timeouts 0, steps req 6 done 4042678
Driver 1: pos -6382777, 320.0 steps/mm, standstill, SG min 0, mspos 880, reads 3140, writes 165 timeouts 0, steps req 6 done 4042745
Driver 2: pos 0, 80.0 steps/mm, standstill, SG min 0, mspos 8, reads 3295, writes 11 timeouts 0, steps req 0 done 0
Moves scheduled 1377083, completed 1377083, in progress 0, hiccups 0, step errors 0, maxPrep 52, maxOverdue 41, maxInc 11, mcErrs 0, gcmErrs 0, ebfmin 0.00, ebfmax 0.00
Peak sync jitter -8/9, peak Rx sync delay 188, resyncs 0/0, no timer interrupt scheduled
VIN voltage: min 24.2, current 24.3, max 24.4
V12 voltage: min 12.3, current 12.3, max 12.4
MCU temperature: min 43.1C, current 47.2C, max 51.2C
Last sensors broadcast 0x00000000 found 0 30 ticks ago, 0 ordering errs, loop time 0
CAN messages queued 1451192, send timeouts 0, received 3716073, lost 0, free buffers 38, min 38, error reg ff0000
dup 0, oos 0/0/0/0, bm 0, wbm 0, rxMotionDelay 420, adv 8041/74573
-
@jay_s_uk already have tried that.
-
@wdenker the out of memory issue is new in firmware 3.5.0-beta.4. If you have already tried using a less demanding input shaping method then I suggest you revert to firmware 3.4.5 on the main board (and 3.4.4 on the tool board, which is the version included in the 3.4.5 release). Then when the problem happens again, get a M122 B121 report again before powering down.
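
When comparing tool board reports across firmware versions, one figure that is easy to track is "Never used RAM". The sketch below assumes that wording from the M122 B121 output in this thread and uses the two values reported there; `never_used_ram` is a hypothetical helper name.

```python
import re
from typing import Optional


def never_used_ram(report: str) -> Optional[int]:
    """Return the 'Never used RAM' figure from M122 text, or None if absent."""
    match = re.search(r"Never used RAM (\d+)", report)
    return int(match.group(1)) if match else None


# Lines copied from the two tool board reports in this thread.
ram_344 = never_used_ram("Never used RAM 3080, free system stack 45 words")
ram_350b4 = never_used_ram("Never used RAM 1812, free system stack 88 words")
print(f"3.4.4: {ram_344}, 3.5.0-beta.4: {ram_350b4}, change: {ram_350b4 - ram_344}")
```

The two reports were taken at different uptimes and under different load, so the change is only a rough indication of headroom, not a measurement of what the newer firmware needs.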