1LC Multiple Disconnects
-
Last night during a print, one of the toolboards disconnected multiple times due to multiple CAN Timeouts. I grabbed M122 output for the board that disconnected, as well as the main M122:
M122 B20 Diagnostics for board 20: Duet TOOL1LC rev 1.1 or later firmware version 3.4.6 (2023-07-21 14:17:33) Bootloader ID: SAMC21 bootloader version 2.8 (2023-07-25) All averaging filters OK Never used RAM 2876, free system stack 45 words Tasks: Move(notifyWait,0.0%,91) HEAT(notifyWait,0.1%,115) CanAsync(notifyWait,0.0%,65) CanRecv(notifyWait,0.0%,74) CanClock(notifyWait,0.0%,65) ACCEL(notifyWait,0.0%,61) TMC(delaying,3.0%,57) MAIN(running,92.0%,439) IDLE(ready,0.0%,26) AIN(delaying,4.9%,142), total 100.0% Last reset 01:15:18 ago, cause: software Last software reset at 2024-01-25 17:16, reason: AssertionFailed, available RAM 1984, slot 0 Software reset code 0x0120 ICSR 0x00000000 SP 0x20002f0c Task MAIN Freestk 814 ok Stack: 0000089f 00023b50 0001a475 a5a5a5a5 a5a5a5a5 a5a5a5a5 a5a5a5a5 20000e2c 00000001 20002074 a5a5a5a5 a5a5a5a5 00004ba5 00000001 00012829 a5a5a5a5 a5a5a5a5 20004370 a5a5a5a5 000005ab 00000000 2000147c 000007d0 00000000 a5a5a5a5 a5a5a5a5 a5a5a5a5 Driver 0: pos 3039, 80.0 steps/mm,standstill, SG min 0, read errors 35, write errors 1, ifcnt 77, reads 29676, writes 11, timeouts 228, DMA errors 0, CC errors 0, failedOp 0x71, steps req 3039 done 3039 Moves scheduled 103, completed 103, in progress 0, hiccups 0, step errors 0, maxPrep 470, maxOverdue 1874543609, maxInc 1874543609, mcErrs 0, gcmErrs 0 Peak sync jitter -237/5, peak Rx sync delay 214, resyncs 0/0, no step interrupt scheduled VIN voltage: min 27.5, current 27.7, max 27.7 MCU temperature: min 52.8C, current 52.8C, max 72.7C Last sensors broadcast 0x00000000 found 0 0 ticks ago, 0 ordering errs, loop time 0 CAN messages queued 36178, send timeouts 0, received 47038, lost 40453, free buffers 37, min 0, error reg ff0000 dup 0, oos 0/0/0/0, bm 0, wbm 0, rxMotionDelay 42285, adv -1869053398/1874618269 Accelerometer: LIS3DH, status: 00 I2C bus errors 0, naks 3, other errors 0
M122 === Diagnostics === RepRapFirmware for Duet 3 MB6XD version 3.4.6+ (2023-11-15 08:39:36) running on Duet 3 MB6XD v1.0 (SBC mode) Board ID: 08DLM-956DA-M24S4-7J1F2-3S46L-9VL6S Used output buffers: 8 of 40 (40 max) === RTOS === Static ram: 151692 Dynamic ram: 72024 of which 204 recycled Never used RAM 122888, free system stack 114 words Tasks: SBC(ready,157.7%,452) HEAT(notifyWait,23.9%,321) Move(notifyWait,324.0%,214) CanReceiv(notifyWait,89.6%,771) CanSender(notifyWait,4.9%,325) CanClock(delaying,8.7%,347) MAIN(running,397.3%,1083) IDLE(ready,8.9%,29), total 1014.9% Owned mutexes: HTTP(MAIN) === Platform === Last reset 549:08:56 ago, cause: power up Last software reset at 2023-12-07 16:15, reason: User, GCodes spinning, available RAM 124808, slot 2 Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00400000 BFAR 0x00000000 SP 0x00000000 Task SBC Freestk 0 n/a Error status: 0x04 Aux0 errors 0,14,0 Step timer max interval 119624 MCU temperature: min 18.0, current 33.6, max 36.7 Supply voltage: min 27.0, current 27.2, max 27.3, under voltage events: 0, over voltage events: 0, power good: yes 12V rail voltage: min 12.2, current 12.2, max 12.3, under voltage events: 0 Heap OK, handles allocated/used 99/42, heap memory allocated/used/recyclable 2048/1388/700, gc cycles 140 Events: 604 queued, 604 completed Driver 0: ok Driver 1: ok Driver 2: ok Driver 3: ok Driver 4: ok Driver 5: ok Date/time: 2024-01-26 09:08:03 Slowest loop: 1030.97ms; fastest: 0.03ms === Storage === Free file entries: 10 SD card 0 not detected, interface speed: 37.5MBytes/sec SD card longest read time 0.0ms, write time 0.0ms, max retries 0 === Move === DMs created 125, segments created 42, maxWait 501550757ms, bed compensation in use: none, comp offset 0.000 === MainDDARing === Scheduled moves 331162, completed 331162, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 28], CDDA state -1 === AuxDDARing === Scheduled moves 0, completed 0, hiccups 0, stepErrors 0, LaErrors 0, 
Underruns [0, 0, 0], CDDA state -1 === Heat === Bed heaters 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters -1 -1 -1 -1, ordering errs 0 Heater 0 is on, I-accum = 0.3 === GCodes === Segments left: 0 Movement lock held by null HTTP* is doing "M122" in state(s) 0 Telnet is idle in state(s) 0 File* is idle in state(s) 0 USB is idle in state(s) 0 Aux is idle in state(s) 0 Trigger* is idle in state(s) 0 Queue* is idle in state(s) 0 LCD is idle in state(s) 0 SBC* is idle in state(s) 0 Daemon is idle in state(s) 0 Aux2 is idle in state(s) 0 Autopause* is idle in state(s) 0 Code queue is empty === Filament sensors === Extruder 0: no data received Extruder 1: no data received === CAN === Messages queued 21879272, received 104729317, lost 0, boc 0 Longest wait 4ms for reply type 6013, peak Tx sync delay 774, free buffers 50 (min 47), ts 9884684/9884683/0 Tx timeouts 0,0,0,0,0,0 === SBC interface === Transfer state: 5, failed transfers: 0, checksum errors: 0 RX/TX seq numbers: 54261/54261 SPI underruns 0, overruns 0 State: 5, disconnects: 0, timeouts: 0 total, 0 by SBC, IAP RAM available 0x2b584 Buffer RX/TX: 0/0-0, open files: 0 === Duet Control Server === Duet Control Server v3.4.6 Code buffer space: 4096 Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 1 Full transfers per second: 39.44, max time between full transfers: 97.6ms, max pin wait times: 84.4ms/35.6ms Codes per second: 1.89 Maximum length of RX/TX data transfers: 4548/1696
Setup is two 6XDs and two 1LCs in SBC Mode. This machine is running 3.4.6+ on the primary 6XD. I'm not sure how valid the reset reason is on the 1LC, as it disconnected multiple times. Some of the request types in the disconnect messages are 6033, others are 6013. I can provide more info if required.
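For anyone comparing dumps like the ones above, here is a minimal Python sketch that pulls the CAN counters out of a tool board `M122 B<n>` report. The helper name `can_counters` and the regular expression are my own assumptions, based only on the wording of the tool board line in this post ("CAN messages queued ..., send timeouts ..., received ..., lost ..."). The main board prints its CAN section in a different format, so this will not match it.

```python
import re

# Hypothetical helper, not part of RepRapFirmware or DSF.
# The pattern follows the tool board wording seen in the M122 B20 dump above.
CAN_LINE = re.compile(
    r"CAN messages queued (?P<queued>\d+), "
    r"send timeouts (?P<send_timeouts>\d+), "
    r"received (?P<received>\d+), "
    r"lost (?P<lost>\d+)"
)


def can_counters(m122_text):
    """Return the tool board CAN counters as a dict of ints, or None if absent."""
    match = CAN_LINE.search(m122_text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


# Sample text copied from the board 20 report in this post.
sample = (
    "CAN messages queued 36178, send timeouts 0, received 47038, "
    "lost 40453, free buffers 37, min 0, error reg ff0000"
)
counters = can_counters(sample)
print(counters)
```

A large `lost` count relative to `received`, as in the board 20 report, is the sort of thing this makes easy to spot across several dumps.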
-
-
I had some connection issues with the 1LC in the past.
Adding strain relief to the cables and securing the connector with a dab of hot glue fixed it for me.
The ZH connector used for the CAN connection is quite unreliable.
@Superbrain8 There's strain relief, so I don't think that's the issue. I have a couple things to try out, and this was a one-off issue so far. I figured I'd report it in case there was something I wasn't aware of.
-
@curieos Different machine this time, one of the 1LC tool boards is routinely disconnecting after an hour of printing. I tried replacing the board but the disconnects keep occurring. The board appears to be fully halted, no LED activity (VIN and 5V LEDs are lit, but the status and activity lights are off). I just updated the bootloader to the latest, and it's running the same firmware as the main board (3.4.6). I can't provide the M122 output from the tool board currently, a single tool print is ongoing, but I can get it tomorrow. A full power cycle causes the board to reconnect on boot.
I watched the issue occur once while troubleshooting. The red status light started out blinking brightly, but then gradually faded in intensity. The frequency of blinking appeared to speed up during this process.
M122 === Diagnostics === RepRapFirmware for Duet 3 MB6XD version 3.4.6 (2023-07-21 14:11:58) running on Duet 3 MB6XD v1.01 or later (SBC mode) Board ID: 0JD2M-999AL-D25SW-6JKD0-3S06R-95ZB1 Used output buffers: 1 of 40 (33 max) === RTOS === Static ram: 151692 Dynamic ram: 71048 of which 0 recycled Never used RAM 124260, free system stack 130 words Tasks: SBC(ready,0.9%,452) HEAT(notifyWait,0.0%,346) Move(notifyWait,1.2%,214) CanReceiv(notifyWait,0.1%,771) CanSender(notifyWait,0.0%,327) CanClock(delaying,0.0%,347) MAIN(running,97.7%,1097) IDLE(ready,0.1%,29), total 100.0% Owned mutexes: HTTP(MAIN) === Platform === Last reset 00:54:28 ago, cause: software Last software reset at 2024-01-30 15:55, reason: User, Platform spinning, available RAM 125988, slot 0 Software reset code 0x0000 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00400000 BFAR 0x00000000 SP 0x00000000 Task SBC Freestk 0 n/a Error status: 0x00 Aux0 errors 0,0,0 Step timer max interval 18687 MCU temperature: min 46.7, current 47.7, max 48.3 Supply voltage: min 26.5, current 26.8, max 26.9, under voltage events: 0, over voltage events: 0, power good: yes 12V rail voltage: min 12.1, current 12.1, max 12.2, under voltage events: 0 Heap OK, handles allocated/used 99/26, heap memory allocated/used/recyclable 2048/1454/916, gc cycles 1 Events: 0 queued, 0 completed Driver 0: ok Driver 1: ok Driver 2: ok Driver 3: ok Driver 4: ok Driver 5: ok Date/time: 2024-01-30 16:50:25 Slowest loop: 999.52ms; fastest: 0.03ms === Storage === Free file entries: 10 SD card 0 not detected, interface speed: 37.5MBytes/sec SD card longest read time 0.0ms, write time 0.0ms, max retries 0 === Move === DMs created 125, segments created 34, maxWait 68051ms, bed compensation in use: mesh, comp offset 0.000 === MainDDARing === Scheduled moves 40689, completed 40630, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state 3 === AuxDDARing === Scheduled moves 0, completed 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], 
CDDA state -1 === Heat === Bed heaters 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamber heaters -1 -1 -1 -1, ordering errs 0 Heater 0 is on, I-accum = 0.2 Heater 1 is on, I-accum = 0.0 === GCodes === Segments left: 1 Movement lock held by null HTTP* is doing "M122" in state(s) 0 Telnet is idle in state(s) 0 File* is doing "G1 X354.312988 Y291.496002 F24000" in state(s) 0 USB is idle in state(s) 0 Aux is idle in state(s) 0 Trigger* is idle in state(s) 0 Queue* is idle in state(s) 0 LCD is idle in state(s) 0 SBC* is idle in state(s) 0 Daemon is idle in state(s) 0 Aux2 is idle in state(s) 0 Autopause is idle in state(s) 0 Code queue is empty === Filament sensors === Extruder 0: no data received Extruder 1: no data received === CAN === Messages queued 76309, received 155310, lost 0, boc 0 Longest wait 5ms for reply type 6024, peak Tx sync delay 480, free buffers 50 (min 47), ts 16345/16344/0 Tx timeouts 0,0,0,0,0,0 === SBC interface === Transfer state: 5, failed transfers: 0, checksum errors: 0 RX/TX seq numbers: 5661/5661 SPI underruns 0, overruns 0 State: 5, disconnects: 0, timeouts: 0 total, 0 by SBC, IAP RAM available 0x2b584 Buffer RX/TX: 2240/3384-0, open files: 0 === Duet Control Server === Duet Control Server v3.4.6 File /opt/dsf/sd/gcodes/HSP1/Removable Spool Holder Extrusion Bracket v4_0.6n_NYLON_3h38m.gcode is selected, processing File: Buffered code: G1 F2400 Buffered code: G1 X357.98 Y291.496 E.24909 Buffered code: G1 X358.604 Y292.12 E.05994 Buffered code: G1 X358.604 Y292.525 E.02751 Buffered code: G1 X358.604 Y293.504 E.0665 Buffered code: G1 X354.311 Y293.504 E.29162 Buffered code: G1 X354.313 Y291.586 E.13029 Buffered code: M204 P4000 Buffered code: G1 X353.728 Y290.911 F24000 Buffered code: M204 P1000 Buffered code: G1 F2400 Buffered code: G1 X358.223 Y290.911 E.30534 Buffered code: G1 X359.189 Y291.877 E.0928 Buffered code: G1 X359.189 Y292.525 E.04402 Buffered code: G1 X359.189 Y294.089 E.10624 Buffered code: G1 X353.725 Y294.089 E.37116 Buffered code: 
G1 X353.728 Y291.001 E.20976 Buffered code: M204 P4000 Buffered code: G1 X353.144 Y290.325 F24000 Buffered code: M204 P500 Buffered code: G1 F2400 Buffered code: G1 X358.465 Y290.325 E.36145 Buffered code: G1 X359.775 Y291.635 E.12585 Buffered code: G1 X359.775 Y292.525 E.06046 Buffered code: G1 X359.775 Y294.675 E.14605 Buffered code: G1 X353.138 Y294.675 E.45084 Buffered code: G1 X353.143 Y290.415 E.28938 Buffered code: M204 P4000 ==> 1216 bytes Pending code: ; stop printing object Removable Spool Holder Extrusion Bracket v4.stl id:0 copy 0 Pending code: ; printing object Removable Spool Holder Extrusion Bracket v4.stl id:0 copy 1 Code buffer space: 2304 Configured SPI speed: 8000000Hz, TfrRdy pin glitches: 0 Full transfers per second: 41.84, max time between full transfers: 147.7ms, max pin wait times: 67.9ms/59.0ms Codes per second: 15.36 Maximum length of RX/TX data transfers: 6312/1192
-
@curieos Looks like a "Hard fault"
M122 B21 Diagnostics for board 21: Duet TOOL1LC rev 1.1 or later firmware version 3.4.6 (2023-07-21 14:17:33) Bootloader ID: SAMC21 bootloader version 2.8 (2023-07-25) All averaging filters OK Never used RAM 2016, free system stack 62 words Tasks: Move(notifyWait,0.0%,155) HEAT(notifyWait,0.2%,101) CanAsync(notifyWait,0.0%,57) CanRecv(notifyWait,0.0%,76) CanClock(notifyWait,0.0%,65) ACCEL(notifyWait,0.0%,61) TMC(delaying,3.0%,57) MAIN(running,91.9%,351) IDLE(ready,0.0%,26) AIN(delaying,4.9%,142), total 100.0% Last reset 00:17:42 ago, cause: power up Last software reset at 2024-01-30 16:35, reason: HardFault, available RAM 2016, slot 0 Software reset code 0x0060 ICSR 0x00000003 SP 0x20002ed8 Task MAIN Freestk 801 ok Stack: 00000156 00000001 68104a2f 00000016 00000000 0001caf7 0000ceca 21000000 00000000 00000000 00000000 00000000 43ca6148 00000000 200048c8 00001401 00000000 11a81500 0000002c 030014d4 00000000 0000f01f a5a5a5a5 00248b5a 00248b5a 000001d6 000001f4 Driver 0: pos 0, 404.8 steps/mm,standstill, SG min 0, read errors 0, write errors 0, ifcnt 12, reads 7026, writes 12, timeouts 0, DMA errors 0, CC errors 0, steps req 0 done 0 Moves scheduled 0, completed 0, in progress 0, hiccups 0, step errors 0, maxPrep 0, maxOverdue 0, maxInc 0, mcErrs 0, gcmErrs 0 Peak sync jitter 0/4, peak Rx sync delay 213, resyncs 0/0, no step interrupt scheduled VIN voltage: min 27.3, current 27.5, max 27.6 MCU temperature: min 51.2C, current 53.8C, max 53.9C Last sensors broadcast 0x00018004 found 3 187 ticks ago, 0 ordering errs, loop time 0 CAN messages queued 21514, send timeouts 0, received 17783, lost 0, free buffers 37, min 37, error reg 0 dup 0, oos 0/0/0/0, bm 0, wbm 0, rxMotionDelay 0 Accelerometer: LIS3DH, status: 00 I2C bus errors 0, naks 3, other errors 0 === Filament sensors === Interrupt 4 to 9us, poll 8 to 505us Driver 0: pos 2160.00, errs: frame 0 parity 0 ovrun 0 pol 0 ovdue 0
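The reset details in a dump like the one above can be extracted with a short script. This is only a sketch: the function name `reset_info` and the pattern are assumptions based on the "Last reset ... cause: ... Last software reset at ..., reason: ..." wording in these reports, not an official parser.

```python
import re

# Hypothetical pattern, written against the wording seen in the M122 dumps in this thread.
# \s+ between the two sentences handles both the flattened text and the original line break.
RESET_RE = re.compile(
    r"Last reset (?P<uptime>[\d:]+) ago, cause: (?P<cause>.+?)\s+"
    r"Last software reset at (?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}), "
    r"reason: (?P<reason>[^,]+),"
)


def reset_info(m122_text):
    """Return uptime, reset cause, and last software reset time/reason, or None."""
    match = RESET_RE.search(m122_text)
    return match.groupdict() if match else None


# Sample text copied from the board 21 report above.
board21 = (
    "Last reset 00:17:42 ago, cause: power up Last software reset at "
    "2024-01-30 16:35, reason: HardFault, available RAM 2016, slot 0"
)
info = reset_info(board21)
print(info)
```

Note that the two fields describe different events: here the most recent reset was a power up, while the HardFault is the last recorded software reset from before that power cycle.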
-
@curieos Anyone? Looking at other posts with this issue, this might be a firmware bug?
In case it's not, I tried some additional troubleshooting today. The only things that changed between the toolboard working fine and it having issues were a grounding wire run to the toolhead plate, and a new CAN cable with crimped terminations instead of the soldered pigtails. A genuine JST crimping tool for ZH terminals was used to create the terminations. I don't think it's the cable, as the last thing in the CAN loop is a second 6XD, not the 1LC. I believe that if the cable had issues, the second 6XD would have communication problems too. Also, the toolboard was locked up, not having CAN connection issues.
Something I noticed during the lockup issues was that the last reported MCU temperature was always 54.7C. Today I tried recreating this temperature with both the old CAN cable and the new CAN cable to see if I could recreate the issue with either cable. I thought it was either a temp issue or a time issue. Neither cable had a lockup event occur when the temperature reached/exceeded 54.7C, so I don't think it's a temperature issue.
I can continue testing variables in case it's a hardware issue, but for my sanity's sake I'd like to know if I'm barking up the wrong tree.
-
-
@curieos looking at your M122 reports and other observations, I think the cause is that the tool board is losing power. Possible reasons for this include:
- A bad crimp connection in the JST VH connector that provides power to the tool board
- One of the two power wires has fractured internally, most likely just above the crimp connection to the JST VH connector
- Something that you have connected to the 5V rail of the tool board (most likely to the OUT0 connector) is drawing excessive current and causing the 5V regulator to go into thermal shutdown.
-
@dc42 The VIN and 5V LEDs never cut out or dip in intensity, do you still think it could be a power issue? I can check the connections, though I did already try disconnecting all connectors besides CAN and VIN. I wasn't super thorough about it admittedly, as I was limited on time.
-
@curieos said in 1LC Multiple Disconnects:
The red status light started out blinking brightly, but then gradually faded in intensity
Did the 5V power LED stay lit to the same intensity throughout?
-
@T3P3Tony From what I recall, yes.
-
@T3P3Tony @dc42 Checking now with a volt meter (the board is currently locked up), VIN is reading fine, 27V. 3.3V from the IO 1 connector reads 3.3V, and 5V from the IO 2 connector reads just shy of 5V (around 4.95), which I believe is also fine.
The only way I can get the board to display the symptoms is to start a print. So far any print I start causes the issue. Idling does not cause the issue. I don't have an exact amount of time it takes to lock up, the console logs don't report the error until functions assigned to that toolboard are used, but I'd estimate it takes roughly 15 minutes after a print is started. The board does not return to normal until the whole machine is restarted.
-
@curieos said in 1LC Multiple Disconnects:
Checking now with a volt meter (the board is currently locked up), VIN is reading fine, 27V. 3.3V from the IO 1 connector reads 3.3V, and 5V from the IO 2 connector reads just shy of 5V (around 4.95), which I believe is also fine.
Yes those are fine; but that doesn't prove that they are always fine. If you have a bad crimp or a fractured wire then the power is likely to disconnect occasionally depending on the movement of the power cable, which likely depends on the position of the print head. If the disconnection is short then the capacitors in the power circuit may supply sufficient power for long enough until the power connection is restored.
The fact that you saw the red LED gradually dim is a sure sign that the 5V supply was gradually lost. Once power is lost, when it is restored the configuration parameters will be lost, so you will need to reboot the machine or at least run config.g to restore normal operation.
Have you provided any strain relief on the power cable, after it has exited the JST VH connector?
Are you powering anything from OUT0 ?
-
@dc42 I meant the red status light, not the red 5V light. The red 5V light stays brightly lit the whole time. The board is locked up, and I cannot communicate with it in any form. When I send M122 B21 it does not respond. This isn't a momentary issue.
Yes, there is plenty of strain relief. A zip tie secures the power and CAN cables after they exit the chain for both toolheads. There's also cable management to make sure the cover doesn't squish any wires.
OUT0 is hooked up to two 50 watt Slice heaters. During this last print that caused this issue, those heaters were never activated. That tool was unused and idle the whole time.
-
@dc42 Here are some pictures of the toolboard, right now. Both voltage lights are lit up. The CAN activity and status lights are off. It's difficult to illustrate this without taking a video.
I'm trying to get some thermal shots of it to see if there are any hotspots I can't detect, but our thermal camera is refusing to cooperate. Edit: I got a thermal camera image. There's a bit of an offset due to how close I have the camera, but you can get an idea of where the hot spots are based on the ghost of the wires and mounting holes:
I believe the hotspot is U3, which is the 12V buck converter.
-
@curieos thanks for the photos. If the hotspot is that area indicated by the red circle and cross, that's diode D5 which feeds the output of the 12V regulator to the VOUT pin of the OUT1 and OUT2 connectors. It would be odd if that component was generating a lot of heat. What I suspect is that the surrounding area is at more or less the same temperature and the heat is coming from both the 12V buck regulator and the 5V linear regulator.
Can you confirm that you are not using the IO_0 connector to power anything? It's not connected in your photo.
It's not normal for the Status lights to be off and the power lights to be on. That condition should only occur briefly after power up when the crystal oscillator is starting and the board is booting up. So I suspect that the processor has gone into some sort of locked up state that even the watchdog timer can't recover from, or the oscillator has stopped. The only reasons I can think of for it locking up in this way are static discharge, a loss of power that wasn't quite enough to cause a full shutdown, or a faulty tool board. We know that static discharge is common on hot ends, but we haven't known of it affecting a TOOL1LC in this way. Nevertheless, please ensure that the hot end metalwork is grounded. You can ground it to one of the mounting screws of the tool board.
What was the reason for putting a heatsink on top of the MCU? How did you attach it? The MCU generates very little heat and certainly doesn't need a heatsink. The components that generate heat on the tool board are the 5V linear regulator, the TMC2209 driver (depending on motor current), and the 12V buck regulator (depending on VIN voltage and the current draw from OUT1 and OUT2).
I'd suggest that we replace your tool board, but you said you have already swapped it for a different one.
-
@dc42 Correct, IO_0 is not connected. You said OUT0 earlier, which was confusing.
The hotend metalwork and toolboard are connected via the toolplate. The hotend has continuity with the plate, and the toolboard has a connection to the toolplate via the green wire in the upper right corner.
I put the heatsink on the MCU because I noticed the previous toolboard's MCU was very hot to the touch before I swapped it out. I put it on that toolboard, and then transferred it to the new one when I swapped it out. Originally I thought it was just an overheating issue. The heatsink has preattached thermal adhesive. Not shown is a thin ceramic heatsink on the rear of the toolboard, mounted with some 3M thermal transfer adhesive. That covers the MCU and stepper driver.
-
@dc42 Any other suggestions? I just replaced the power cable for the toolboard with brand new wire and this issue is still occurring.
-
@dc42 I tried updating that machine to 3.5 rc3 and the issue persists. What physically can cause a 1LC to lock up until a full power cycle?
-
Are the mainboard and toolboard powered from a common PSU? If not, are the grounds of the PSU tied together?