Intermittent communication disruption between 6HC and 3HC
-
So I've been having issues with my 6HC + 3HC combo in standalone mode for a while.
It seems that communication between the 3HC and 6HC can drop intermittently and is pretty much never available on boot up.
If the printer sits powered off for a couple of hours, the 3HC never responds on first power-on. Rebooting sometimes fixes the problem, but most often it does not, and I have to repeat the reboot a couple of times (either via emergency stop or power cycle) before the board appears. Once it appears, it will usually stay available for a good long time... unless I let it sit idle for upwards of 10 minutes. Then the board will just disappear, sometimes throwing DWC into a reset loop (or that's what it seems like).
Right now, I turned the printer on after 3 hours being off.
The 3HC doesn't boot up. M122 B2 (that's the address I have the board set to) says:
Response timeout: CAN addr 2, req type 6024, RID=##
I hit emergency stop and waited a bit: same response to M122 B2.
I then power cycled the printer. Initially there was again no response from the 3HC, but after about 30 seconds I got the response that I am pasting below. The first lines seem weird. If I repeat M122 B2, all subsequent attempts succeed and report what is visible in the second excerpt below. And then... the 3HC will be available for a good while. I can power cycle or reboot as many times as I want and it will come up just fine each time, as long as it has been sitting idle for less than roughly 10 minutes. If it's longer, I have to power cycle to get access to the board back.
In terms of connections, I have tried exchanging the CAN cable (now I'm using a 4 wire one, but I also tried a regular 2 wire one - effect is the same).
Powerwise - the 3HC is connected to the 6HC and 6HC directly to the PSU. They are not fed from separate outlets on the PSU (Meanwell that only feeds the board and fans - hotbed is AC powered - Voron 2.4).
I tried looking at what happens on the diag LED and it seems that upon first boot - it takes the 3HC a fair bit of time to start flashing the LED - before this, it's off.
I can't say it got worse with 3.2. I know I had issues with that board before where at one point I had to take it out and use Bossa to reflash it - afterwards, it would work more or less fine, as long as I didn't wait too long to start the print.
I am out of ideas on what else I can check here. Any hints?
Diag logs below:
First successful boot:
m122 b2
Warning: Discarded msg src=2 typ=4510 RID=12 exp 13
Warning: Discarded msg src=2 typ=4510 RID=12 exp 13
Diagnostics for board 2:
Duet EXP3HC firmware version 3.2 (2021-01-05)
Bootloader ID: not available
Never used RAM 154896, free system stack 200 words
HEAT 102 CanAsync 94 CanRecv 87 TMC 64 MAIN 305 AIN 257
Last reset 00:02:13 ago, cause: software
Last software reset time unknown, reason: AssertionFailed, available RAM 173348, slot 0
Software reset code 0x0120 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0445f85f BFAR 0xe000ed38 SP 0x2002ff8c Task MAIN Freestk 4294967295 ok
Stack: 000002f2 0002a9b0 00020f41 00000000 0002256d 20002ec8 20003100 00000031 00000001 20002980 03800209 0001a72d 0001a709 00000000 00000000 ffffffff 0001de4b 200022d8 20002330 00029c48 ffffffed 00000000 00f00000 e000ef34 c0000000 200041dc 00020ea5
Driver 0: position 0, 80.0 steps/mm, standstill, reads 28268, writes 11 timeouts 0, SG min/max 0/0
Driver 1: position 0, 80.0 steps/mm, standstill, reads 28271, writes 11 timeouts 0, SG min/max 0/0
Driver 2: position 0, 80.0 steps/mm, standstill, reads 28274, writes 11 timeouts 0, SG min/max 0/0
Moves scheduled 0, completed 0, in progress 0, hiccups 0
No step interrupt scheduled
VIN: 24.1V, V12: 12.1V
MCU temperature: min 45.7C, current 45.9C, max 46.1C
Ticks since heat task active 68, ADC conversions started 133310, completed 133308, timed out 0
Last sensors broadcast 0x00000000 found 0 71 ticks ago, loop time 0
CAN messages queued 27, send timeouts 0, received 1211, lost 0, free buffers 36
After quick reboot:
m122 b2
Diagnostics for board 2:
Duet EXP3HC firmware version 3.2 (2021-01-05)
Bootloader ID: not available
Never used RAM 154584, free system stack 178 words
HEAT 77 CanAsync 94 CanRecv 84 TMC 64 MAIN 279 AIN 257
Last reset 00:03:14 ago, cause: software
Last software reset time unknown, reason: AssertionFailed, available RAM 173348, slot 0
Software reset code 0x0120 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0445f85f BFAR 0xe000ed38 SP 0x2002ff8c Task MAIN Freestk 4294967295 ok
Stack: 000002f2 0002a9b0 00020f41 00000000 0002256d 20002ec8 20003100 00000031 00000001 20002980 03800209 0001a72d 0001a709 00000000 00000000 ffffffff 0001de4b 200022d8 20002330 00029c48 ffffffed 00000000 00f00000 e000ef34 c0000000 200041dc 00020ea5
Driver 0: position 0, 410.0 steps/mm, standstill, reads 45906, writes 16 timeouts 0, SG min/max 0/0
Driver 1: position 0, 80.0 steps/mm, standstill, reads 45913, writes 11 timeouts 0, SG min/max 0/0
Driver 2: position 0, 80.0 steps/mm, standstill, reads 45916, writes 11 timeouts 0, SG min/max 0/0
Moves scheduled 0, completed 0, in progress 0, hiccups 0
No step interrupt scheduled
VIN: 24.1V, V12: 12.1V
MCU temperature: min 45.7C, current 45.9C, max 45.9C
Ticks since heat task active 21, ADC conversions started 194012, completed 194012, timed out 0
Last sensors broadcast 0x00000000 found 0 24 ticks ago, loop time 0
CAN messages queued 813, send timeouts 0, received 1761, lost 0, free buffers 36
After power cycle:
m122 b2
Diagnostics for board 2:
Duet EXP3HC firmware version 3.2 (2021-01-05)
Bootloader ID: not available
Never used RAM 154584, free system stack 192 words
HEAT 102 CanAsync 94 CanRecv 87 TMC 64 MAIN 317 AIN 257
Last reset 00:00:14 ago, cause: software
Last software reset time unknown, reason: AssertionFailed, available RAM 173348, slot 0
Software reset code 0x0120 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0445f85f BFAR 0xe000ed38 SP 0x2002ff8c Task MAIN Freestk 4294967295 ok
Stack: 000002f2 0002a9b0 00020f41 00000000 0002256d 20002ec8 20003100 00000031 00000001 20002980 03800209 0001a72d 0001a709 00000000 00000000 ffffffff 0001de4b 200022d8 20002330 00029c48 ffffffed 00000000 00f00000 e000ef34 c0000000 200041dc 00020ea5
Driver 0: position 0, 410.0 steps/mm, standstill, reads 54366, writes 11 timeouts 0, SG min/max 0/0
Driver 1: position 0, 80.0 steps/mm, standstill, reads 54369, writes 11 timeouts 0, SG min/max 0/0
Driver 2: position 0, 80.0 steps/mm, standstill, reads 54372, writes 11 timeouts 0, SG min/max 0/0
Moves scheduled 0, completed 0, in progress 0, hiccups 0
No step interrupt scheduled
VIN: 24.1V, V12: 12.1V
MCU temperature: min 45.7C, current 45.9C, max 45.9C
Ticks since heat task active 88, ADC conversions started 14580, completed 14578, timed out 0
Last sensors broadcast 0x00000000 found 0 91 ticks ago, loop time 0
CAN messages queued 95, send timeouts 0, received 153, lost 0, free buffers 36
6HC diag log:
m122 b0
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.2 running on Duet 3 MB6HC v0.6 or 1.0 (standalone mode)
Board ID: 08DJM-956L2-G43S4-6JKDA-3SJ6T-1B6GH
Used output buffers: 1 of 40 (21 max)
=== RTOS ===
Static ram: 149788
Dynamic ram: 92984 of which 72 recycled
Never used RAM 115988, free system stack 182 words
Tasks: NETWORK(ready,159) ETHERNET(blocked,109) HEAT(blocked,296) CanReceiv(blocked,848) CanSender(blocked,371) CanClock(blocked,354) TMC(blocked,50) MAIN(running,1123) IDLE(ready,19)
Owned mutexes: LwipCore(NETWORK) HTTP(MAIN)
=== Platform ===
Last reset 00:09:19 ago, cause: software
Last software reset at 2021-01-22 18:24, reason: User, GCodes spinning, available RAM 116032, slot 1
Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00400000 BFAR 0x00000000 SP 0x00000000 Task MAIN Freestk 0 n/a
Error status: 0x00
Aux0 errors 0,0,0
Aux1 errors 0,0,0
MCU temperature: min 36.9, current 42.2, max 42.4
Supply voltage: min 24.0, current 24.0, max 24.1, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 12.1, current 12.1, max 12.2, under voltage events: 0
Driver 0: position 0, standstill, reads 50958, writes 14 timeouts 0, SG min/max 0/0
Driver 1: position 0, standstill, reads 50958, writes 14 timeouts 0, SG min/max 0/0
Driver 2: position 0, standstill, reads 50958, writes 14 timeouts 0, SG min/max 0/0
Driver 3: position 0, standstill, reads 50958, writes 14 timeouts 0, SG min/max 0/0
Driver 4: position 0, standstill, reads 50958, writes 14 timeouts 0, SG min/max 0/0
Driver 5: position 0, standstill, reads 50959, writes 14 timeouts 0, SG min/max 0/0
Date/time: 2021-01-22 18:33:37
Slowest loop: 1049.71ms; fastest: 0.07ms
=== Storage ===
Free file entries: 10
SD card 0 detected, interface speed: 25.0MBytes/sec
SD card longest read time 3.3ms, write time 0.0ms, max retries 0
=== Move ===
DMs created 125, maxWait 0ms, bed compensation in use: none, comp offset 0.000
=== MainDDARing ===
Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== AuxDDARing ===
Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== Heat ===
Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
=== GCodes ===
Segments left: 0
Movement lock held by null
HTTP is ready with "m122 b0" in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
Slowest loop: 2.66ms; fastest: 0.02ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0), 0 sessions Telnet(0), 0 sessions
HTTP sessions: 2 of 8
- Ethernet -
State: active
Error counts: 0 0 1 0 0
Socket states: 2 2 2 2 2 0 0 0
=== CAN ===
Messages queued 2245, send timeouts 209, received 60, lost 0, longest wait 19ms for reply type 6024, free buffers 48
-
-
Try adding a G4 delay command in config.g before the first command that refers to a device on an expansion board. RRF 3.2 running on Duet 3 MB6HC is so fast that it doesn't always allow enough time for expansion boards to start up.
-
If you find that expansion boards lose CAN sync later on, we are aware of certain conditions of high CAN traffic that can provoke this. We have fixed this in the unofficial 3.3beta.
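As a rough sketch of the G4 suggestion above, a config.g fragment might look like the following. The board address 2 matches the M122 B2 used in this thread; the M584 mapping shown is only a hypothetical placeholder, so substitute whatever your own first expansion-board command is.

```gcode
; config.g fragment (sketch only, not a complete configuration)
; Commands that only touch the main board can stay above the dwell.

G4 S1        ; dwell for 1 second so the expansion board has time to start up

; First command that refers to a device on the expansion board.
; Hypothetical mapping: extruder on driver 0 of the board at CAN address 2.
M584 E2.0
```

The dwell only needs to come before the first line that addresses the expansion board; everything after it can stay in its usual order.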
-
-
Understood.
I'll give the G4 a try then. I guess I should just observe roughly how long the expansion board takes to boot and put that in there, right?
As for high traffic - I am running a number of items off the exp board (extruder, motor, 3 fans, filament sensor) and I am printing quite fast.
I'll check how much the G4 does and then report here. Thanks
-
@pkos For info, I've never had that particular problem (I have plenty of other problems with expansion boards but not that particular one). But for a long while now, I've had G4 S2 in my config.g after the network section and before anything else. Like this
; ------------------- General preferences ------------------------
M111 S0 ; Debugging off
G21 ; Work in millimetres
G90 ; Send absolute coordinates...
M83 ; ...but relative extruder moves
; ------------------- Network -----------------------------------
M550 P"CoreXYUVAB" ; Set machine name
M552 S1 ;Turn networking on
M552 P***.***.*.*** ;IP Address - **** RPi IP is .140*****
M555 P2 ; Set firmware compatibility to look like Marlin
;----------------------Dwell-------------------
G4 S2
....................Drives and axes....................
M584 etc.....
-
Small update. I tried playing with the numbers a bit. G4 S25 is needed to make it work.
Is that number supposed to be this high?
I feel like a delay this long only appeared with 3.2. I'll see about downgrading to 3.1.1 and checking (especially since I am also having issues with PanelDue updating temperatures: if I set it to heat up and wait, it won't update. But that I'll take a look at later).
-
@pkos said in Intermittent communication disruption between 6HC and 3HC:
Small update. I tried playing with the numbers a bit. G4 S25 is needed to make it work.
Is that number supposed to be this high?
I feel like this long of a delay appeared in 3.2. I'll see about downgrading to 3.1.1 and checking (especially since I am having issues with PanelDue updating temperatures - as in, if I set it to heat up and wait - it won't update - but that I'll take a look at later. ).
Nope. If you need to delay things by 25 seconds, something is very wrong indeed.
-
I switched back to 3.1.1 for the 6HC and 3.1.0 for the 3HC (since that's what was included in the single package).
The 3HC boots up instantly together with the 6HC every time - and I no longer need the G4.
It also solved a problem for me with DHT22 readings, which based on reports doesn't work on the 6HC, but does on Duet 2s.
There is a problem with 3.1.0 though: the 3HC stops responding when it heats up, so I am stuck.
-
@pkos said in Intermittent communication disruption between 6HC and 3HC:
Small update. I tried playing with the numbers a bit. G4 S25 is needed to make it work.
No. G4 S1 should be more than enough.
When you first apply power, does the 3HC acquire CAN sync (as shown by the LEDs blinking in sync) within a second or two?
-
@dc42 On 3.1.1 - yes.
On 3.2 - no - it takes about 25-30 seconds for the CAN to wake up and start flashing the LED on the 3HC. On the 6HC the CAN LED flashes almost immediately.
-
Which bootloader version do you have installed in the 3HC? M122 for the 3HC will report it. [Do not try to install a new bootloader.]
There is a problem with the 3.1.0 though, where the 3HC will stop responding if it heats up, so I am stuck
When it stops responding, does the LED on the 3HC continue to flash?
-
@dc42 said in Intermittent communication disruption between 6HC and 3HC:
Which bootloader version do you have installed in the 3HC? M122 for the 3HC will report it. [Do not try to install a new bootloader.]
On 3.2:
Bootloader ID - not available.
3.1.0 doesn't give that info. Just in case, below is the log I took just now on 3.1.0.
There is a problem with the 3.1.0 though, where the 3HC will stop responding if it heats up, so I am stuck
When it stops responding, does the LED on the 3HC continue to flash?
I'll have to double check that, but just in case this helps - here is a thread, where we spoke about this problem and where I reported a link to firmware that solved my problem back then.
https://forum.duet3d.com/post/167660
One more question, since we are talking:
In the logs in the first post here, I see the 3HC reporting an AssertionFailed as reason for last software reset. Might that also point to a problem?
-
I think the slow startup is another instance of this issue: https://forum.duet3d.com/topic/21224/can-connectivity-duet-3-mb6hc-to-exp3hc. Please try the beta firmware at https://www.dropbox.com/sh/wme9k0z86sytg33/AAAT6wrHp2eeJHK-dYoW1Um4a?dl=0.
The assertion failure appears to have been caused by a stack overflow. I will examine the stack trace.
-
@dc42 Thank you. I'll try it out and will report here.
-
Here are the results of what I found after trying the 3.3 beta.
Quick summary first, then a bit more data (and diag logs at the bottom).
- IMPROVEMENT: 3.3 beta fixes the 3HC not starting the CAN sync process on power up (I assume that's why the LED would only start flashing after about 25-30 seconds on 3.2).
- REGRESSION: 3.3 beta does NOT solve sync issues on power up between 3HC and 6HC until a power cycle is performed (in 3.2, emergency stop would be enough to get sync).
- NO CHANGE FROM 3.2: DHT22 still does not work, but this time I observed something weird. In general, on 3.3 and 3.2 DWC still shows 2000C and 2000% (it is fine on 3.1.1).
- NO CHANGE FROM 3.2: Bootloader ID still shows not available.
- NO CHANGE FROM 3.2: Last software reset reason is still AssertionFailed (although I did not expect any changes here).
Here's more info:
I started with uploading the firmware to 3HC.
The CAN LED now starts flashing immediately on power up and flashes quite fast (faster than on 3.1.0).
I then updated the 6HC to 3.3.
The 3HC does NOT get connected to the 6HC on first power on. 6HC flashes slowly, 3HC flashes quickly. They didn't sync after 30 minutes of waiting. I then hit the Emergency Stop button - in 3.2 - this would solve the lack of communication, but on 3.3 it did not. The boards don't sync up and each flashes differently.
However, after a quick power cycle - the boards immediately sync up on boot.
I am attaching logs that show this below.
After that, everything works except for the DHT22 connected to the main board. That still shows 2000C and 2000% (nothing changed with the connections, and these work perfectly fine on 3.1.1).
The very weird thing I observed with the DHT22 on 3.3beta was this: I left the printer on for about 10 minutes without touching it. When I came back to it, I noticed that for the shortest moment DHT22 readings were available, but the moment I started switching between tabs, the readings went back to 2000C/%.
I rolled back to 3.1.1 and immediately the readings came back.
I'm attaching two screenshots of what this looked like (first shows the jump to 2k, then down to normal values, then back up to 2k, where it stayed until I rolled back to 3.1.1 visible on the second screen, where data is updated normally).
Diag logs from cold power up, lack of sync on emergency stop, but immediate sync on power cycle.
1/27/2021, 7:27:16 PM m122 b2
Diagnostics for board 2:
Duet EXP3HC firmware version 3.3beta (2021-01-26 20:03:04)
Bootloader ID: not available
Never used RAM 155180, free system stack 0 words
Move 160 HEAT 103 CanAsync 72 CanRecv 84 CanClock 74 TMC 64 MAIN 263 AIN 260
Last reset 00:00:12 ago, cause: software
Last software reset time unknown, reason: AssertionFailed, available RAM 173348, slot 0
Software reset code 0x0120 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0445f85f BFAR 0xe000ed38 SP 0x2002ff8c Task MAIN Freestk 4294967295 ok
Stack: 000002f2 0002a9b0 00020f41 00000000 0002256d 20002ec8 20003100 00000031 00000001 20002980 03800209 0001a72d 0001a709 00000000 00000000 ffffffff 0001de4b 200022d8 20002330 00029c48 ffffffed 00000000 00f00000 e000ef34 c0000000 200041dc 00020ea5
Driver 0: position 0, 410.0 steps/mm, standstill, reads 28616, writes 16 timeouts 0, SG min/max 0/0, steps req 0 done 0
Driver 1: position 0, 80.0 steps/mm, standstill, reads 28623, writes 11 timeouts 0, SG min/max 0/0, steps req 0 done 0
Driver 2: position 0, 80.0 steps/mm, standstill, reads 28627, writes 11 timeouts 0, SG min/max 0/0, steps req 0 done 0
Moves scheduled 0, completed 0, in progress 0, hiccups 0, step errors 0, maxPrep 0, maxOverdue 0, maxInc 0, mcErrs 0, gcmErrs 0
Peak sync jitter 9, peak Rx sync delay 178, resyncs 0, no step interrupt scheduled
VIN: 24.1V, V12: 12.1V
MCU temperature: min 45.9C, current 45.9C, max 45.9C
Ticks since heat task active 45, ADC conversions started 12536, completed 12536, timed out 0
Last sensors broadcast 0x00000000 found 0 49 ticks ago, loop time 0
CAN messages queued 90, send timeouts 0, received 133, lost 0, free buffers 36, min 36, error reg 100026 dup 0, oos 0, bm 0, wbm 0
1/27/2021, 7:26:58 PM m122 b2
Error: M122: Response timeout: CAN addr 2, req type 6024, RID=11
1/27/2021, 7:26:53 PM Connection established
1/27/2021, 7:26:45 PM Connection interrupted, attempting to reconnect...
1/27/2021, 7:26:36 PM Emergency stop, attemping to reconnect...
1/27/2021, 7:24:16 PM m122 b2
Error: M122: Response timeout: CAN addr 2, req type 6024, RID=11
1/27/2021, 7:23:54 PM Connection established
1/27/2021, 7:23:46 PM Connection interrupted, attempting to reconnect...
1/27/2021, 7:23:37 PM Emergency stop, attemping to reconnect...
1/27/2021, 7:23:35 PM m122 b2
Error: M122: Response timeout: CAN addr 2, req type 6024, RID=12
1/27/2021, 7:23:26 PM m122 b2
Error: M122: Response timeout: CAN addr 2, req type 6024, RID=11
1/27/2021, 7:22:06 PM Connection established
Diag log from the 6HC:
m122 b0
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.3beta running on Duet 3 MB6HC v0.6 or 1.0 (standalone mode)
Board ID: 08DJM-956L2-G43S4-6JKDA-3SJ6T-1B6GH
Used output buffers: 3 of 40 (21 max)
=== RTOS ===
Static ram: 149784
Dynamic ram: 91668 of which 40 recycled
Never used RAM 109180, free system stack 182 words
Tasks: NETWORK(ready,270) ETHERNET(blocked,117) SENSORS(blocked,53) HEAT(blocked,299) CanReceiv(blocked,893) CanSender(blocked,365) CanClock(blocked,328) TMC(blocked,49) MAIN(running,922) IDLE(ready,20)
Owned mutexes: HTTP(MAIN)
=== Platform ===
Last reset 00:01:45 ago, cause: power up
Last software reset at 2021-01-27 19:26, reason: User, GCodes spinning, available RAM 109424, slot 1
Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0044a000 BFAR 0x00000000 SP 0x00000000 Task MAIN Freestk 0 n/a
Error status: 0x00
Aux0 errors 0,0,0
Aux1 errors 0,0,0
MCU temperature: min 36.4, current 41.0, max 41.1
Supply voltage: min 24.0, current 24.0, max 24.1, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 12.1, current 12.1, max 12.2, under voltage events: 0
Driver 0: position 0, standstill, reads 63168, writes 14 timeouts 0, SG min/max 0/0
Driver 1: position 0, standstill, reads 63168, writes 14 timeouts 0, SG min/max 0/0
Driver 2: position 0, standstill, reads 63168, writes 14 timeouts 0, SG min/max 0/0
Driver 3: position 0, standstill, reads 63169, writes 14 timeouts 0, SG min/max 0/0
Driver 4: position 0, standstill, reads 63169, writes 14 timeouts 0, SG min/max 0/0
Driver 5: position 0, standstill, reads 63169, writes 14 timeouts 0, SG min/max 0/0
Date/time: 2021-01-27 19:28:48
Slowest loop: 41.50ms; fastest: 0.07ms
=== Storage ===
Free file entries: 10
SD card 0 detected, interface speed: 25.0MBytes/sec
SD card longest read time 3.3ms, write time 0.0ms, max retries 0
=== Move ===
DMs created 125, maxWait 0ms, bed compensation in use: none, comp offset 0.000
=== MainDDARing ===
Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== AuxDDARing ===
Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== Heat ===
Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
=== GCodes ===
Segments left: 0
Movement lock held by null
HTTP is ready with "m122 b0" in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty.
=== CAN ===
Messages queued 441, send timeouts 0, received 465, lost 0, longest wait 20ms for reply type 6024, peak Tx sync delay 6, free buffers 48 (min 47)
=== Network ===
Slowest loop: 2.70ms; fastest: 0.02ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0), 0 sessions Telnet(0), 0 sessions
HTTP sessions: 2 of 8
- Ethernet -
State: active
Error counts: 0 0 1 0 0
Socket states: 5 2 2 2 2 0 0 0
-
-
Thanks for the update. Your M122 trace for the EXP3HC indicated a stack overflow, so I've increased the stack size and put a new 3.3beta Duet3Firmware-EXP3HC.bin at https://www.dropbox.com/sh/wme9k0z86sytg33/AAAT6wrHp2eeJHK-dYoW1Um4a?dl=0. Please try it.
-
Will do and report later. Thanks for the help!
-
Small update for now, I'll write more details later. I need to get a couple prints out fast, so for now I'll switch to 3.1.1.
On the newest beta - connection is quick, no reboots necessary, seems like the initial bit is solved.
However, with that beta - I am unable to complete any print as at some point, the 3HC (which for me runs the extruder, hotend, hotend cooling and filament monitor) probably stops communicating with the 6HC and I get skipped steps, underextrusion and in general - failed prints.
Switching back to 3.1.1 makes it all run.
As soon as I get the next things printed, I'll start investigating more, including diag logs from immediately after a print fails.
-
@pkos said in Intermittent communication disruption between 6HC and 3HC:
Small update for now, I'll write more details later. I need to get a couple prints out fast, so for now I'll switch to 3.1.1.
On the newest beta - connection is quick, no reboots necessary, seems like the initial bit is solved.
However, with that beta - I am unable to complete any print as at some point, the 3HC (which for me runs the extruder, hotend, hotend cooling and filament monitor) probably stops communicating with the 6HC and I get skipped steps, underextrusion and in general - failed prints.
Switching back to 3.1.1 makes it all run.
As soon as I get the next things printed, I'll start investigating more, including diag logs from immediately after a print fails.
Thanks for the update. I am sorry we haven't managed to resolve this issue completely yet.
When you get a chance, I would appreciate it if you can do the following:
- Install the latest beta firmware from https://www.dropbox.com/sh/qr98k8fbkj5ue0k/AABPawUF99QVzDrheBQBDSxia?dl=0
- Run a print up to the point at which it starts failing (assuming it still fails)
- Pause the print
- Run M122 and M122 B# (where # is the expansion board address) and post the results
- Resume the print, and see whether doing the pause has fixed the issue, at least temporarily.
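Typed into the DWC console, the steps above could look roughly like this. It assumes the print is running from the SD card and that the expansion board is at CAN address 2, as elsewhere in this thread; the pause and resume buttons in DWC would do the same job as M25 and M24.

```gcode
M25        ; pause the running SD card print
M122       ; diagnostics for the main board
M122 B2    ; diagnostics for the expansion board (address 2 assumed)
M24        ; resume the print
```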
Thanks for your patience.
-
I'm almost done printing the last order, so I'll have more time to experiment tomorrow. I'll report afterwards.
Thanks for being awesome and helping out!
-
Quick update.
I'm 5 hours into the print, so far the print has not failed, no skipped steps or underextrusion, etc. The 3HC starts flashing quickly immediately, catches sync within 2-3 seconds.
There is one more test I need to do - if all goes well, I'll have it done tomorrow morning - that's to check how the 3HC behaves after a long power off (at least one hour). For now I've had the printer running pretty much 24/7 since Saturday.
I ran a quick M122 just in case you want to see what's going on right now (pasting below). The assertion failure is still there and DHT22 does not work.
Diagnostics for board 2:
Duet EXP3HC firmware version 3.3beta (2021-02-01 22:29:11)
Bootloader ID: not available
Never used RAM 154972, free system stack 0 words
Move 80 HEAT 78 CanAsync 72 CanRecv 82 CanClock 74 TMC 30 MAIN 263 AIN 260
Last reset 12:15:33 ago, cause: software
Last software reset time unknown, reason: AssertionFailed, available RAM 173348, slot 0
Software reset code 0x0120 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0445f85f BFAR 0xe000ed38 SP 0x2002ff8c Task MAIN Freestk 4294967295 ok
Stack: 000002f2 0002a9b0 00020f41 00000000 0002256d 20002ec8 20003100 00000031 00000001 20002980 03800209 0001a72d 0001a709 00000000 00000000 ffffffff 0001de4b 200022d8 20002330 00029c48 ffffffed 00000000 00f00000 e000ef34 c0000000 200041dc 00020ea5
Driver 0: position 19481636, 410.0 steps/mm, ok, reads 39334, writes 0 timeouts 0, SG min/max 0/83, steps req 354599 done 354486
Driver 1: position 0, 80.0 steps/mm, standstill, reads 39334, writes 0 timeouts 0, SG min/max not available, steps req 0 done 0
Driver 2: position 0, 80.0 steps/mm, standstill, reads 39334, writes 0 timeouts 0, SG min/max not available, steps req 0 done 0
Moves scheduled 966506, completed 966505, in progress 1, hiccups 0, step errors 0, maxPrep 59, maxOverdue 3, maxInc 3, mcErrs 0, gcmErrs 0
Peak sync jitter 10, peak Rx sync delay 177, resyncs 0, next step interrupt due in 1138 ticks, enabled
VIN: 24.1V, V12: 12.1V
MCU temperature: min 45.7C, current 45.9C, max 46.1C
Ticks since heat task active 16, ADC conversions started 44133758, completed 44133757, timed out 0
Last sensors broadcast 0x00000000 found 0 22 ticks ago, loop time 0
CAN messages queued 1648, send timeouts 0, received 11585, lost 0, free buffers 36, min 36, error reg 0 dup 0, oos 0, bm 0, wbm 0
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.3beta running on Duet 3 MB6HC v0.6 or 1.0 (standalone mode)
Board ID: 08DJM-956L2-G43S4-6JKDA-3SJ6T-1B6GH
Used output buffers: 1 of 40 (40 max)
=== RTOS ===
Static ram: 149800
Dynamic ram: 92100 of which 72 recycled
Never used RAM 108700, free system stack 118 words
Tasks: NETWORK(ready,228) ETHERNET(blocked,117) SENSORS(blocked,15) HEAT(blocked,280) CanReceiv(blocked,877) CanSender(blocked,337) CanClock(blocked,326) TMC(blocked,16) MAIN(running,616) IDLE(ready,20)
Owned mutexes: HTTP(MAIN)
=== Platform ===
Last reset 12:15:29 ago, cause: software
Last software reset at 2021-02-02 09:54, reason: User, GCodes spinning, available RAM 108908, slot 0
Software reset code 0x0003 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00400000 BFAR 0x00000000 SP 0x00000000 Task MAIN Freestk 0 n/a
Error status: 0x04
Aux0 errors 331,331,331
Aux1 errors 0,0,0
MCU temperature: min 43.7, current 45.2, max 52.1
Supply voltage: min 23.9, current 24.0, max 24.1, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 12.0, current 12.1, max 12.2, under voltage events: 0
Driver 0: position 27530, ok, reads 12882, writes 38 timeouts 0, SG min/max 0/1023
Driver 1: position 2814, ok, reads 12882, writes 38 timeouts 0, SG min/max 0/1023
Driver 2: position 2725, ok, reads 12882, writes 38 timeouts 0, SG min/max 0/1023
Driver 3: position 0, ok, reads 12882, writes 38 timeouts 0, SG min/max 0/1023
Driver 4: position 0, ok, reads 12883, writes 38 timeouts 0, SG min/max 0/1023
Driver 5: position 0, ok, reads 12883, writes 38 timeouts 0, SG min/max 0/1023
Date/time: 2021-02-02 22:09:51
Slowest loop: 211.65ms; fastest: 0.04ms
=== Storage ===
Free file entries: 9
SD card 0 detected, interface speed: 25.0MBytes/sec
SD card longest read time 4.0ms, write time 152.3ms, max retries 0
=== Move ===
DMs created 125, maxWait 514613ms, bed compensation in use: mesh, comp offset 0.000
=== MainDDARing ===
Scheduled moves 458647, completed moves 458612, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 15], CDDA state 3
=== AuxDDARing ===
Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== Heat ===
Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
Heater 0 is on, I-accum = 0.0
Heater 1 is on, I-accum = 0.4
=== GCodes ===
Segments left: 1
Movement lock held by null
HTTP is ready with "m122 b0" in state(s) 0
Telnet is idle in state(s) 0
File is doing "G1 X190.481 Y213.861 E0.00113" in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty.
=== CAN ===
Messages queued 1143678, send timeouts 0, received 177174, lost 0, longest wait 21ms for reply type 6024, peak Tx sync delay 468, free buffers 48 (min 33)
=== Network ===
Slowest loop: 574.55ms; fastest: 0.02ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0), 0 sessions Telnet(0), 0 sessions
HTTP sessions: 1 of 8
- Ethernet -
State: active
Error counts: 0 0 1 0 0
Socket states: 5 2 2 2 2 0 0 0