[3.6.0-beta.2 and prev] Random out of sync
-
@dc42 thanks for the suggestion!
From M122, "peak Tx sync delay" is about 480 both when the machine is idle and when is printing. This might correspond to the 600us needed to send the CANFD packet, right?
I tried with CanClockIntervalMillis = 199 and below is the (blue) plot from the CAN analyser. The interesting thing now is that the samples between ~4 and 300 are way more than before.
Then I also increased the MaxTimeSyncDelay to 500, and the number of zeros is considerably reduced, which I think is the goal to avoid resyncs.
Could this value of MaxTimeSyncDelay lead to other issues? -
@leone thanks for doing the test. I think that proves my theory.
Are you certain that the number of high acknowledge delays is now higher? I'm not at all sure that the number of delays between 200 and 300 is any higher than before, although the number between 100 and 200 may be higher.
I suspect that this change alone will fix the loss of sync issue by spreading out the high delays, however I agree with you that MaxTimeSyncDelay should be increased to cover the maximum delay expected.
-
@dc42 with CanClockIntervalMillis = 200 the number of high acknowledge delays depends on how many clashes with other transmissions we have, and since it seems to happen quite random, the comparison with CanClockIntervalMillis = 199 is difficult to do. I tried to analyse 10 minutes in idle state with different configurations of CanClockIntervalMillis and MaxTimeSyncDelay.
I think I will run some prints with CanClockIntervalMillis = 199 and MaxTimeSyncDelay = 500, to see if the resync issue still occurs. From the analysys below it seems to be the one with less number of zeros and less high delays.
Do you think this is something relevant to include in the next release?freq199_thr300
Percentage of zeros: 0.50%
Percentage of values between 1 and 10: 96.60%
Percentage of values between 11 and 100: 1.13%
Percentage of values between 101 and 200: 1.10%
Percentage of values greater than 200: 0.67%freq200_thr500
Percentage of zeros: 0.00%
Percentage of values between 1 and 10: 93.77%
Percentage of values between 11 and 100: 2.30%
Percentage of values between 101 and 200: 2.10%
Percentage of values greater than 200: 1.83%freq199_thr500
Percentage of zeros: 0.00%
Percentage of values between 1 and 10: 96.63%
Percentage of values between 11 and 100: 1.30%
Percentage of values between 101 and 200: 1.10%
Percentage of values greater than 200: 0.97% -
@leone I've committed a change to 3.6-dev to increase the allowed transmit delay and to change the time sync interval to 211ms. I'll also look at the other point you mentioned about moves being discarded.
-
@dc42 great, thanks!
-
@leone I'm still trying to work out why you were getting loss of sync. Please can you run the original 3.6 firmware in which you noticed the problem, then when the problem is observed run M122 B# where # is the CAN address of the expansion board. I'm interested in whether it is timeout resyncs or jitter resyncs that are causing the loss of sync.
-
@dc42 I still have one:
All averaging filters OK Never used RAM 164992, free system stack 172 words Tasks: Move(3,nWait 7,0.0%,126) TMC(2,delaying,1.4%,51) HEAT(2,nWait 6,0.1%,89) CanAsync(5,nWait 4,0.0%,66) CanRecv(3,nWait 1,0.1%,73) CanClock(5,nWait 1,0.0%,63) LED(2,delaying,0.0%,123) ACCEL(3,nWait 6,0.0%,87) MAIN(1,running,97.4%,319) IDLE(0,ready,0.0%,29) AIN(2,delaying,1.0%,259), total 100.0% Owned mutexes: Last reset 02:15:02 ago, cause: power up Last software reset data not available Moves scheduled 37920, hiccups 0 (0.00/0.00ms), segs 37, step errors 0 (types 0x0), maxLate 0 maxPrep 62, ebfmin 0.00 max 0.00 Peak sync jitter -3/41, peak Rx sync delay 187, resyncs 1/0, no timer interrupt scheduled, next step interrupt due in 4247606105 ticks, disabled VIN voltage: min 24.2, current 24.4, max 24.5 MCU temperature: min 51.7C, current 58.0C, max 60.5C Driver 0: pos 12510580, 2056.0 steps/mm, standstill, SG min 0, read errors 0, write errors 0, ifcnt 25, reads 53750, writes 25, timeouts 0, DMA errors 0, CC errors 0 Last sensors broadcast 0x3fe0000000 found 9 225 ticks ago, 0 ordering errs, loop time 0 CAN messages queued 192060, send timeouts 0, received 305253, lost 0, ignored 16, errs 0, boc 0, free buffers 22, min 22, error reg 0 dup 0, oos 0/0/0/0, bm 0, wbm 0, rxMotionDelay 299, adv 35157/37138 Accelerometer: ADXL345, status: 00 I2C bus errors 0, naks 0, contentions 0, other errors 0
-
@leone thanks. Was that definitely after a sync error? It includes
resyncs 1/0
which suggests it was a timeout resync, not a jitter resync. I think a timeout resync should only occur if no time sync message with a valid ack delay field is received over a period of 2 seconds. -
@dc42 yes, I logged that message right after the expansion boards got out of sync and the print shifted. I was also able to sniff the CAN messages of another print that failed in the same way and that's where I noticed 'lastTimeAcknowledgeDelay' being equal to zero for more than 10 times consecutively. That's how I started to test higher values for MaxTimeSyncDelay (initially 375), but I didn't know how far I could go.
-
@leone thanks again. Looks like the maximum allowed acknowledge delay should be about 900 to allow for maximum length messages. However, I'm still a little surprised that you had the maximum delay exceeded 10 times in succession.