3.3b1CAN timeout resets itself?
-
@dc42 said in 3.3b1CAN timeout resets itself?:
@TypQxQ, please provide your sys/custom/tool_0.g and sys/custom/conf_Endstops.g files.
Here is everything in my config and customconfig folder.
https://we.tl/t-xwMjBHKrGdThank you!
-
Thanks, I am running that job now. How long did you leave it running last time, before you ran M122 B1 and got this message?
CAN messages queued 436976, send timeouts 0, received 689661, lost 0, free buffers 36, min 36, error reg 0 dup 0, oos 16, bm 8, wbm 9772795
-
Progress update. Running your job and your configuration, I am seeing a lot of send timeouts (in the M122 report) on the MB6HC. The number of out-of-sequence messages received on the EXP3HC ('oos' in the M122 B1 report) is much lower, suggesting that most of the send timeouts are spurious; but the fact that they occur at all indicates a problem. I will keep you informed.
-
It appears that with a code change I have resolved the send timeouts and associated 'oos' reports. Your print has been running for 76 minutes now with no error reports.
I still haven't seen any of the 'bm' reports on my system. They are likely to have a different cause.
I forgot to ask one thing. As I understand it, your system has just a MB6HC and one EXP3HC connected via CAN, and no other CAN-connected boards. Have you fitted the CAN termination jumpers to the EXP3HC board?
PS - if you want to try the 6HC and 3HC firmware binaries that I am running, you can find them here https://www.dropbox.com/sh/q5uqqkjbmhgvlhq/AACYqG0ynLME9ogoLd1zLB2Xa?dl=0.
-
@dc42 said in 3.3b1CAN timeout resets itself?:
It appears that with a code change I have resolved the send timeouts and associated 'oos' reports. Your print has been running for 76 minutes now with no error reports.
I still haven't seen any of the 'bm' reports on my system. They are likely to have a different cause.
I forgot to ask one thing. As I understand it, your system has just a MB6HC and one EXP3HC connected via CAN, and no other CAN-connected boards. Have you fitted the CAN termination jumpers to the EXP3HC board?
This is great news!
I can confirm my CAN termination jumpers are closed on 3HC.
I plan is to buy another 3HC if I get multiple motors on XY to work@dc42 said in 3.3b1CAN timeout resets itself?:
Thanks, I am running that job now. How long did you leave it running last time, before you ran M122 B1 and got this message?
Not sure what if you mean time since reboot or since stopped printing.
According to the eventlog above, "Last reset 10:14:33 ago" and was idle for almost 8h.When it was printing and had an uptime of 00:07:53 it read:
CAN messages queued 3305, send timeouts 222, received 2369, lost 0, free buffers 36, min 36, error reg 100000 dup 0, oos 0, bm 4, wbm 5559827
-
What I meant was, how long did you have to print for to get a nonzero 'bm' value when you ran M122 B1? Because I haven't seen any yet. Looks like 7 minutes may have been enough for you.
-
@dc42 said in 3.3b1CAN timeout resets itself?:
What I meant was, how long did you have to print for to get a nonzero 'bm' value when you ran M122 B1? Because I haven't seen any yet. Looks like 7 minutes may have been enough for you.
Trying the new firmware right now.
I don't know what BM and WBM stands for but I got them again after 00:13:24 :CAN messages queued 6740, send timeouts 0, received 6156, lost 0, free buffers 36, min 36, error reg 0 dup 0, oos 0, bm 12, wbm 15972458
Also before printing I had some error reg. at uptime of 00:03:24:
CAN messages queued 2083, send timeouts 0, received 1735, lost 0, free buffers 36, min 36, error reg 110000 dup 0, oos 0, bm 0, wbm 0
Now printing without any timeouts so far but only 20min inside print.
-
'bm' is the bad move command count, which I put in temporarily to debug a previous issue. That's the count of move commands that were scheduled to start before the previous move had ended. 'wbm' is a record of the worst bad instance; the figure is how many 750kHz clock ticks the new move was supposed to start before the previous move had finished. So your 'wbm' figure of 5559827 indicates that a move was scheduled to start 7.4 seconds before the previous move ended.
I don't see a nonzero bad move command count on my machine. However, it's occurred to me that you might legitimately have this situation after homing. I don't do homing of my simulation of your machine, I just use G92 to set all the axis positions. But if you are homing an axis connected to the EXP3HC, then that homing move might be scheduled to take e.g. 10 seconds but stop after 3 seconds when it triggers the homing switch. Then the next move will start before the homing move was originally scheduled to end, triggering the bad move command logging.
I will fix the bad move command counter to take account of moves that complete before they were originally scheduled to complete.
PS:
- I've just replaced the EXP3HC binary at https://www.dropbox.com/sh/q5uqqkjbmhgvlhq/AACYqG0ynLME9ogoLd1zLB2Xa?dl=0 to include that change to the bad move count logging.
- My print has been going for almost 5 hours now. During that time it has recorded just one send timeout on the main board while sending a movement message. I think it must be spurious, because there wasn't a corresponding 'oos' (out of sequence) error on the EXP3HC, which I would expect if the 3HC hadn't received the move.
-
@dc42 I have finished printing and after 9h16m the print looks good and I also only received one timeout:
Messages queued 1750358, send timeouts 1, received 362388, lost 0, longest wait 24ms for reply type 6024, peak Tx sync delay 578, free buffers 49 (min 47) CAN messages queued 361984, send timeouts 0, received 1747438, lost 0, free buffers 36, min 36, error reg 0 dup 0, oos 0, bm 0, wbm 0
I loaded the new 3HC firmware with time 22:51 on Dropbox and will run it again. But the time compile time in M122 hasn't changed after updating this morning: "Duet EXP3HC firmware version 3.3beta1+1 (2021-03-02 15:56:53)"
Thank You!!! It this looks sooo good!
2021-03-03 05:32:39 Finished printing file 0:/gcodes/RearLeft v6.2.gcode, print time was 9h 16m 2021-03-03 06:30:00 === Diagnostics === RepRapFirmware for Duet 3 MB6HC version 3.3beta1+1 running on Duet 3 MB6HC v0.6 or 1.0 (SBC mode) Board ID: 08DJM-956L2-G43S4-6JKDD-3SJ6L-TB62G Used output buffers: 1 of 40 (17 max) === RTOS === Static ram: 148212 Dynamic ram: 62952 of which 0 recycled Never used RAM 143028, free system stack 98 words Tasks: Linux(ready,63) HEAT(delaying,284) CanReceiv(notifyWait,774) CanSender(notifyWait,359) CanClock(delaying,342) TMC(notifyWait,18) MAIN(running,649) IDLE(ready,20) Owned mutexes: HTTP(MAIN) === Platform === Last reset 10:23:58 ago, cause: power up Last software reset at 2021-03-02 19:50, reason: User, none spinning, available RAM 139760, slot 2 Software reset code 0x0012 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00400000 BFAR 0x00000000 SP 0x00000000 Task Linu Freestk 0 n/a Error status: 0x00 Aux0 errors 0,0,0 Aux1 errors 0,0,0 MCU temperature: min 43.3, current 43.5, max 44.8 Supply voltage: min 31.8, current 31.9, max 32.0, under voltage events: 0, over voltage events: 0, power good: yes 12V rail voltage: min 12.0, current 12.1, max 12.1, under voltage events: 0 Driver 0: position 160000, standstill, reads 5226, writes 2 timeouts 0, SG min/max 0/1023 Driver 1: position 0, standstill, reads 5227, writes 2 timeouts 0, SG min/max 0/1023 Driver 2: position 311549, standstill, reads 5227, writes 2 timeouts 0, SG min/max 0/158 Driver 3: position 0, standstill, reads 5227, writes 2 timeouts 0, SG min/max 0/1023 Driver 4: position 0, standstill, reads 5227, writes 2 timeouts 0, SG min/max 0/1023 Driver 5: position 0, standstill, reads 5226, writes 2 timeouts 0, SG min/max 0/1023 Date/time: 2021-03-03 06:29:58 Slowest loop: 90.33ms; fastest: 0.05ms === Storage === Free file entries: 10 SD card 0 not detected, interface speed: 37.5MBytes/sec SD card longest read time 0.0ms, write time 0.0ms, max retries 0 === Move === DMs created 125, maxWait 286ms, bed compensation in use: mesh, comp offset 0.000 === MainDDARing === Scheduled moves 1889328, completed moves 1889328, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1 === AuxDDARing === Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1 === Heat === Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1 Heater 0 is on, I-accum = 0.1 === GCodes === Segments left: 0 Movement lock held by null HTTP* is doing "M122 B0" in state(s) 0 Telnet is idle in state(s) 0 File* is idle in state(s) 0 USB is idle in state(s) 0 Aux is idle in state(s) 0 Trigger* is idle in state(s) 0 Queue* is idle in state(s) 0 LCD is idle in state(s) 0 SBC is idle in state(s) 0 Daemon is idle in state(s) 0 Aux2 is idle in state(s) 0 Autopause is idle in state(s) 0 Code queue is empty. === CAN === Messages queued 1750358, send timeouts 1, received 362388, lost 0, longest wait 24ms for reply type 6024, peak Tx sync delay 578, free buffers 49 (min 47) Last cancelled message type 51 dest 1 === SBC interface === State: 4, failed transfers: 0 Last transfer: 1ms ago RX/TX seq numbers: 40775/40775 SPI underruns 0, overruns 0 Number of disconnects: 0, IAP RAM available 0x2d3ac Buffer RX/TX: 0/0-0 2021-03-03 06:30:03 Diagnostics for board 1: Duet EXP3HC firmware version 3.3beta1+1 (2021-03-02 15:56:53) Bootloader ID: SAME5x bootloader version 2.2 (2021-01-16b1) Never used RAM 154884, free system stack 0 words Tasks: Move(notifyWait,80) HEAT(delaying,69) CanAsync(notifyWait,72) CanRecv(notifyWait,82) CanClock(notifyWait,74) TMC(notifyWait,30) MAIN(running,301) AIN(delaying,260) Last reset 10:24:02 ago, cause: power up Last software reset at 2020-11-10 18:16, reason: StackOverflow, available RAM 154112, slot 0 Software reset code 0x0100 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0000080e BFAR 0xe000ed38 SP 0x2002ffb4 Task � Freestk 4294967295 ok Stack: 2000137c 200013b0 0002525d 00000000 0003e418 20001598 20003b28 00000000 00024885 200014b4 20003ab0 00000000 00f00000 e000ef34 c0000000 20003b30 000249c5 0002472c 61000000 ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff Driver 0: position -1440, 160.0 steps/mm, standstill, reads 17400, writes 2 timeouts 0, SG min/max 0/1023, steps req 268204856 done 268206165 Driver 1: position -144480, 160.0 steps/mm, standstill, reads 17399, writes 2 timeouts 0, SG min/max 0/1023, steps req 271278668 done 271279392 Driver 2: position 48970973, 830.0 steps/mm, standstill, reads 17397, writes 2 timeouts 0, SG min/max 0/1023, steps req 47753506 done 47753659 Moves scheduled 1888753, completed 1888753, in progress 0, hiccups 0, step errors 0, maxPrep 191, maxOverdue 5, maxInc 1, mcErrs 0, gcmErrs 0 Peak sync jitter 11, peak Rx sync delay 198, resyncs 30, no step interrupt scheduled VIN: 24.1V, V12: 12.1V MCU temperature: min 39.8C, current 40.0C, max 40.2C Ticks since heat task active 64, ADC conversions started 37442556, completed 37442554, timed out 0 Last sensors broadcast 0x00000002 found 1 68 ticks ago, loop time 0 CAN messages queued 361984, send timeouts 0, received 1747438, lost 0, free buffers 36, min 36, error reg 0 dup 0, oos 0, bm 0, wbm 0 2021-03-03 06:55:17 G10 P0 S220
Before and after the timeout fix:
And final again:
2021-03-03 18:14:05 m122 b0 === Diagnostics === RepRapFirmware for Duet 3 MB6HC version 3.3beta1+1 running on Duet 3 MB6HC v0.6 or 1.0 (SBC mode) Board ID: 08DJM-956L2-G43S4-6JKDD-3SJ6L-TB62G Used output buffers: 1 of 40 (21 max) === RTOS === Static ram: 148212 Dynamic ram: 62904 of which 0 recycled Never used RAM 143076, free system stack 98 words Tasks: Linux(ready,69) HEAT(delaying,284) CanReceiv(notifyWait,774) CanSender(notifyWait,359) CanClock(delaying,342) TMC(notifyWait,18) MAIN(running,649) IDLE(ready,20) Owned mutexes: HTTP(MAIN) === Platform === Last reset 10:54:16 ago, cause: software Last software reset at 2021-03-03 07:19, reason: User, none spinning, available RAM 143444, slot 2 Software reset code 0x0012 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00400000 BFAR 0x00000000 SP 0x00000000 Task Linu Freestk 0 n/a Error status: 0x00 Aux0 errors 0,0,0 Aux1 errors 0,0,0 MCU temperature: min 42.1, current 43.6, max 45.1 Supply voltage: min 31.8, current 31.9, max 32.0, under voltage events: 0, over voltage events: 0, power good: yes 12V rail voltage: min 12.0, current 12.0, max 12.1, under voltage events: 0 Driver 0: position 160000, standstill, reads 42231, writes 2 timeouts 0, SG min/max 0/1023 Driver 1: position 0, standstill, reads 42231, writes 2 timeouts 0, SG min/max 0/1023 Driver 2: position 311549, standstill, reads 42227, writes 6 timeouts 0, SG min/max 0/156 Driver 3: position 0, standstill, reads 42231, writes 2 timeouts 0, SG min/max 0/1023 Driver 4: position 0, standstill, reads 42231, writes 2 timeouts 0, SG min/max 0/1023 Driver 5: position 0, standstill, reads 42231, writes 2 timeouts 0, SG min/max 0/1023 Date/time: 2021-03-03 18:14:02 Slowest loop: 1001.41ms; fastest: 0.05ms === Storage === Free file entries: 10 SD card 0 not detected, interface speed: 37.5MBytes/sec SD card longest read time 0.0ms, write time 0.0ms, max retries 0 === Move === DMs created 125, maxWait 7881ms, bed compensation in use: mesh, comp offset 0.000 === MainDDARing === Scheduled moves 1889328, completed moves 1889328, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 1, 0], CDDA state -1 === AuxDDARing === Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1 === Heat === Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1 Heater 0 is on, I-accum = 0.1 === GCodes === Segments left: 0 Movement lock held by null HTTP* is doing "M122 B0" in state(s) 0 Telnet is idle in state(s) 0 File* is idle in state(s) 0 USB is idle in state(s) 0 Aux is idle in state(s) 0 Trigger* is idle in state(s) 0 Queue* is idle in state(s) 0 LCD is idle in state(s) 0 SBC is idle in state(s) 0 Daemon is idle in state(s) 0 Aux2 is idle in state(s) 0 Autopause is idle in state(s) 0 Code queue is empty. === CAN === Messages queued 2236102, send timeouts 4, received 463095, lost 0, longest wait 20ms for reply type 6024, peak Tx sync delay 554, free buffers 49 (min 47) Last cancelled message type 51 dest 1 === SBC interface === State: 4, failed transfers: 0 Last transfer: 1ms ago RX/TX seq numbers: 39306/39306 SPI underruns 0, overruns 0 Number of disconnects: 0, IAP RAM available 0x2d3dc Buffer RX/TX: 0/0-0 === Duet Control Server === Duet Control Server v3.2.2 Code buffer space: 4096 Configured SPI speed: 8000000 Hz Full transfers per second: 51.37 Maximum length of RX/TX data transfers: 5092/1688 2021-03-03 18:13:58 m122 b1 Diagnostics for board 1: Duet EXP3HC firmware version 3.3beta1+1 (2021-03-02 15:56:53) Bootloader ID: SAME5x bootloader version 2.2 (2021-01-16b1) Never used RAM 154884, free system stack 0 words Tasks: Move(notifyWait,80) HEAT(delaying,69) CanAsync(notifyWait,72) CanRecv(notifyWait,82) CanClock(notifyWait,74) TMC(notifyWait,30) MAIN(running,331) AIN(delaying,260) Last reset 10:54:27 ago, cause: software Last software reset at 2020-11-10 18:16, reason: StackOverflow, available RAM 154112, slot 0 Software reset code 0x0100 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0000080e BFAR 0xe000ed38 SP 0x2002ffb4 Task � Freestk 4294967295 ok Stack: 2000137c 200013b0 0002525d 00000000 0003e418 20001598 20003b28 00000000 00024885 200014b4 20003ab0 00000000 00f00000 e000ef34 c0000000 20003b30 000249c5 0002472c 61000000 ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff Driver 0: position -480, 160.0 steps/mm, standstill, reads 62626, writes 2 timeouts 0, SG min/max 0/1023, steps req 333181004 done 333181004 Driver 1: position -48160, 160.0 steps/mm, standstill, reads 62625, writes 2 timeouts 0, SG min/max 0/1023, steps req 340878416 done 340878416 Driver 2: position 48970973, 830.0 steps/mm, standstill, reads 62625, writes 3 timeouts 0, SG min/max 0/1023, steps req 60334361 done 60334361 Moves scheduled 1888677, completed 1888677, in progress 0, hiccups 0, step errors 0, maxPrep 180, maxOverdue 4, maxInc 2, mcErrs 0, gcmErrs 0 Peak sync jitter 11, peak Rx sync delay 204, resyncs 36, no step interrupt scheduled VIN: 24.1V, V12: 12.1V MCU temperature: min 39.8C, current 40.0C, max 40.2C Ticks since heat task active 155, ADC conversions started 39267896, completed 39267896, timed out 0 Last sensors broadcast 0x00000002 found 1 160 ticks ago, loop time 0 CAN messages queued 462985, send timeouts 0, received 2236041, lost 0, free buffers 36, min 36, error reg 0 dup 0, oos 0, bm 0, wbm 0 2021-03-03 16:46:53 Finished printing file 0:/gcodes/RearLeft v6.2.gcode, print time was 9h 18m
-
@dc42 said in 3.3b1CAN timeout resets itself?:
The messages timed out and messages lost counters get cleared immediately after you run M122.
The only time I have seen a problem with the CAN communication was under the following conditions:
- The EXP3HC had separate power wires from the PSU
I may have screwed this up. So It's best to power the 3HC from the 6HC Mainboard instead of running a separate wire from the PSU? Why?
-
@ctilley79 said in 3.3b1CAN timeout resets itself?:
I may have screwed this up. So It's best to power the 3HC from the 6HC Mainboard instead of running a separate wire from the PSU? Why?
It only matters if you are powering a high-powered bed heater directly from the Duet and you have long wires between the Duet and the PSU. Under those conditions, the changes in ground voltage caused by the bed heater being switching on/off rapidly with PWM can affect the CAN communication.