Duet3D Logo Duet3D
    • Tags
    • Documentation
    • Order
    • Register
    • Login

    [3.6.0-beta.2 and prev] Random out of sync

    Scheduled Pinned Locked Moved
    Beta Firmware
    3
    16
    499
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • dc42undefined
      dc42 administrators @leone
      last edited by

      @leone thanks for your report. If you run M122 what does it report for "peak Tx sync delay", both when the machine has been idle for several minutes and when it has been printing for several minutes? Note, that value gets reset after each M122 report.

      Duet WiFi hardware designer and firmware engineer
      Please do not ask me for Duet support via PM or email, use the forum
      http://www.escher3d.com, https://miscsolutions.wordpress.com

      1 Reply Last reply Reply Quote 0
      • dc42undefined
        dc42 administrators @leone
        last edited by dc42

        @leone further to my previous answer, here are some observations:

        1. I made a note in the source code for StepTimer.cpp "Occasionally on the SAME70 we get very large delays reported. These delays are not genuine." This could be relevant, however AFAIR the delays I was seeing were very much greater than 300.

        2. We don't yet use bit rate switching, so the time taken to send a full 64-byte CAN-FD packet is about 600us. So if the CAN clock task tries to send a time sync message just after transmission of another packet starts, the acknowledge delay is expected to be up to 600us.

        As to why you see the linearly decreasing acknowledge delay, if you see this when the machine is idle then my best guess is that it starts when the CAN clock task wakes up to send a time sync message just after the main board or an expansion board has started to send one of its standard broadcast messages. For example, the sensor readings are broadcast every 250ms whereas the CAN clock message is broadcast every 200ms. If the sending times of these messages clashes, then they clash again 1 second (5 samples) later. The clash won't be exact, because the sensors broadcast interval varies a little. That may explain the linearly decreasing acknowledge delay.

        If am right, we can avoid this clash every 5 messages by changing the CAN clock send interval to be relatively prime with respect to the sensors broadcast interval. Please try changing the definition of CanClockIntervalMillis at CanInterface.cpp(189) to a different value such as 199 or 211 and use your CAN protocol analyser to see if the behaviour has changed.

        Duet WiFi hardware designer and firmware engineer
        Please do not ask me for Duet support via PM or email, use the forum
        http://www.escher3d.com, https://miscsolutions.wordpress.com

        leoneundefined 1 Reply Last reply Reply Quote 0
        • leoneundefined
          leone @dc42
          last edited by

          @dc42 thanks for the suggestion!
          From M122, "peak Tx sync delay" is about 480 both when the machine is idle and when is printing. This might correspond to the 600us needed to send the CANFD packet, right?
          I tried with CanClockIntervalMillis = 199 and below is the (blue) plot from the CAN analyser. The interesting thing now is that the samples between ~4 and 300 are way more than before.
          Then I also increased the MaxTimeSyncDelay to 500, and the number of zeros is considerably reduced, which I think is the goal to avoid resyncs.
          Could this value of MaxTimeSyncDelay lead to other issues?

          360_canfreq.png 360_canfreq_zoom.png

          dc42undefined 1 Reply Last reply Reply Quote 0
          • dc42undefined
            dc42 administrators @leone
            last edited by

            @leone thanks for doing the test. I think that proves my theory.

            Are you certain that the number of high acknowledge delays is now higher? I'm not at all sure that the number of delays between 200 and 300 is any higher than before, although the number between 100 and 200 may be higher.

            I suspect that this change alone will fix the loss of sync issue by spreading out the high delays, however I agree with you that MaxTimeSyncDelay should be increased to cover the maximum delay expected.

            Duet WiFi hardware designer and firmware engineer
            Please do not ask me for Duet support via PM or email, use the forum
            http://www.escher3d.com, https://miscsolutions.wordpress.com

            leoneundefined 1 Reply Last reply Reply Quote 0
            • leoneundefined
              leone @dc42
              last edited by

              @dc42 with CanClockIntervalMillis = 200 the number of high acknowledge delays depends on how many clashes with other transmissions we have, and since it seems to happen quite random, the comparison with CanClockIntervalMillis = 199 is difficult to do. I tried to analyse 10 minutes in idle state with different configurations of CanClockIntervalMillis and MaxTimeSyncDelay.
              I think I will run some prints with CanClockIntervalMillis = 199 and MaxTimeSyncDelay = 500, to see if the resync issue still occurs. From the analysys below it seems to be the one with less number of zeros and less high delays.
              Do you think this is something relevant to include in the next release?

              freq199_thr300
              Percentage of zeros: 0.50%
              Percentage of values between 1 and 10: 96.60%
              Percentage of values between 11 and 100: 1.13%
              Percentage of values between 101 and 200: 1.10%
              Percentage of values greater than 200: 0.67%

              freq200_thr500
              Percentage of zeros: 0.00%
              Percentage of values between 1 and 10: 93.77%
              Percentage of values between 11 and 100: 2.30%
              Percentage of values between 101 and 200: 2.10%
              Percentage of values greater than 200: 1.83%

              freq199_thr500
              Percentage of zeros: 0.00%
              Percentage of values between 1 and 10: 96.63%
              Percentage of values between 11 and 100: 1.30%
              Percentage of values between 101 and 200: 1.10%
              Percentage of values greater than 200: 0.97%

              compare.png

              dc42undefined 1 Reply Last reply Reply Quote 0
              • dc42undefined
                dc42 administrators @leone
                last edited by

                @leone I've committed a change to 3.6-dev to increase the allowed transmit delay and to change the time sync interval to 211ms. I'll also look at the other point you mentioned about moves being discarded.

                Duet WiFi hardware designer and firmware engineer
                Please do not ask me for Duet support via PM or email, use the forum
                http://www.escher3d.com, https://miscsolutions.wordpress.com

                leoneundefined 1 Reply Last reply Reply Quote 0
                • leoneundefined
                  leone @dc42
                  last edited by

                  @dc42 great, thanks!

                  dc42undefined 1 Reply Last reply Reply Quote 0
                  • dc42undefined
                    dc42 administrators @leone
                    last edited by dc42

                    @leone I'm still trying to work out why you were getting loss of sync. Please can you run the original 3.6 firmware in which you noticed the problem, then when the problem is observed run M122 B# where # is the CAN address of the expansion board. I'm interested in whether it is timeout resyncs or jitter resyncs that are causing the loss of sync.

                    Duet WiFi hardware designer and firmware engineer
                    Please do not ask me for Duet support via PM or email, use the forum
                    http://www.escher3d.com, https://miscsolutions.wordpress.com

                    leoneundefined 1 Reply Last reply Reply Quote 0
                    • leoneundefined
                      leone @dc42
                      last edited by

                      @dc42 I still have one:

                      All averaging filters OK Never used RAM 164992, free system stack 172 words 
                      Tasks: Move(3,nWait 7,0.0%,126) TMC(2,delaying,1.4%,51) HEAT(2,nWait 6,0.1%,89) CanAsync(5,nWait 4,0.0%,66) CanRecv(3,nWait 1,0.1%,73) CanClock(5,nWait 1,0.0%,63) LED(2,delaying,0.0%,123) ACCEL(3,nWait 6,0.0%,87) MAIN(1,running,97.4%,319) IDLE(0,ready,0.0%,29) AIN(2,delaying,1.0%,259), total 100.0% 
                      Owned mutexes: 
                      Last reset 02:15:02 ago, cause: power up 
                      Last software reset data not available 
                      Moves scheduled 37920, hiccups 0 (0.00/0.00ms), segs 37, step errors 0 (types 0x0), maxLate 0 maxPrep 62, ebfmin 0.00 max 0.00 
                      Peak sync jitter -3/41, peak Rx sync delay 187, resyncs 1/0, no timer interrupt scheduled, next step interrupt due in 4247606105 ticks, disabled 
                      VIN voltage: min 24.2, current 24.4, max 24.5 
                      MCU temperature: min 51.7C, current 58.0C, max 60.5C 
                      Driver 0: pos 12510580, 2056.0 steps/mm, standstill, SG min 0, read errors 0, write errors 0, ifcnt 25, reads 53750, writes 25, timeouts 0, DMA errors 0, CC errors 0 Last sensors broadcast 0x3fe0000000 found 9 225 ticks ago, 0 ordering errs, loop time 0 
                      CAN messages queued 192060, send timeouts 0, received 305253, lost 0, ignored 16, errs 0, boc 0, free buffers 22, min 22, error reg 0 dup 0, oos 0/0/0/0, bm 0, wbm 0, rxMotionDelay 299, adv 35157/37138 
                      Accelerometer: ADXL345, status: 00 I2C bus errors 0, naks 0, contentions 0, other errors 0 
                      
                      dc42undefined 1 Reply Last reply Reply Quote 0
                      • dc42undefined
                        dc42 administrators @leone
                        last edited by

                        @leone thanks. Was that definitely after a sync error? It includes resyncs 1/0 which suggests it was a timeout resync, not a jitter resync. I think a timeout resync should only occur if no time sync message with a valid ack delay field is received over a period of 2 seconds.

                        Duet WiFi hardware designer and firmware engineer
                        Please do not ask me for Duet support via PM or email, use the forum
                        http://www.escher3d.com, https://miscsolutions.wordpress.com

                        leoneundefined 1 Reply Last reply Reply Quote 0
                        • leoneundefined
                          leone @dc42
                          last edited by leone

                          @dc42 yes, I logged that message right after the expansion boards got out of sync and the print shifted. I was also able to sniff the CAN messages of another print that failed in the same way and that's where I noticed 'lastTimeAcknowledgeDelay' being equal to zero for more than 10 times consecutively. That's how I started to test higher values for MaxTimeSyncDelay (initially 375), but I didn't know how far I could go.

                          dc42undefined 1 Reply Last reply Reply Quote 0
                          • dc42undefined
                            dc42 administrators @leone
                            last edited by

                            @leone thanks again. Looks like the maximum allowed acknowledge delay should be about 900 to allow for maximum length messages. However, I'm still a little surprised that you had the maximum delay exceeded 10 times in succession.

                            Duet WiFi hardware designer and firmware engineer
                            Please do not ask me for Duet support via PM or email, use the forum
                            http://www.escher3d.com, https://miscsolutions.wordpress.com

                            1 Reply Last reply Reply Quote 0
                            • First post
                              Last post
                            Unless otherwise noted, all forum content is licensed under CC-BY-SA