Duet3D
    [feature request] ECC for SPI transfer

    DSF Development
    • timschneider

      The current SPI protocol implementation already detects some transfer errors and retries up to Settings.MaxSpiRetries times.
      However, the retry mechanism can fail without retrying at all, e.g. for bit errors at certain positions in the header or data transfer.
      Moreover, in a noisy environment a single-bit error in every transfer can exhaust Settings.MaxSpiRetries, so the transfer fails.
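      To illustrate the failure mode described above, here is a minimal Python sketch of a CRC-check-and-retry loop. It is not DSF's code: CRC-32 from `zlib` stands in for whatever checksum DSF actually uses, `MAX_SPI_RETRIES = 3` is an illustrative value standing in for `Settings.MaxSpiRetries`, and the helpers `corrupt` and `send_with_retries` are hypothetical. It also assumes the checksum itself always arrives intact. A one-off bit flip is recovered by a retry, but noise that corrupts every attempt exhausts the retries even though each individual error is only a single bit.

```python
import zlib
from typing import Callable, Optional

MAX_SPI_RETRIES = 3  # illustrative stand-in for Settings.MaxSpiRetries, not the real default


def corrupt(data: bytes, bit: Optional[int]) -> bytes:
    """Simulate line noise by flipping one bit (or none if bit is None)."""
    if bit is None:
        return data
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def send_with_retries(payload: bytes,
                      noise: Callable[[int], Optional[int]]) -> Optional[int]:
    """Return the attempt number that passed the CRC check, or None if retries ran out.

    noise(attempt) returns the bit index to flip on that attempt, or None for a clean transfer.
    """
    expected_crc = zlib.crc32(payload)  # assumed to reach the receiver uncorrupted
    for attempt in range(1 + MAX_SPI_RETRIES):  # first try plus the retries
        received = corrupt(payload, noise(attempt))
        if zlib.crc32(received) == expected_crc:
            return attempt
    return None  # retries exhausted: the link would have to be reset
```

      For example, `send_with_retries(b"G1 X10", lambda a: 3 if a == 0 else None)` succeeds on the first retry, while `lambda a: 5` (the same single-bit error on every attempt) returns `None`. An error-correcting code would instead repair such a single-bit error in place.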

      The relevant code is quite complex:

      ExchangeHeader()
      https://github.com/Duet3D/DuetSoftwareFramework/blob/079e0158c757f86d954902e409eb3c89fc7e8197/src/DuetControlServer/SPI/DataTransfer.cs#L1385

      ExchangeData()
      https://github.com/Duet3D/DuetSoftwareFramework/blob/079e0158c757f86d954902e409eb3c89fc7e8197/src/DuetControlServer/SPI/DataTransfer.cs#L1567C14-L1567C14

      To make the SPI transfer more robust, I propose implementing some form of error correction code (ECC).

      For example, a simple Hamming code extended with an overall parity bit (SECDED) provides single-bit error correction and double-bit error detection.
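      A minimal sketch of such a code, assuming an extended Hamming(8,4) layout: 4 data bits, 3 Hamming parity bits and 1 overall parity bit per byte. The function names `encode` and `decode` are hypothetical, and a real SPI implementation would more likely protect wider words (e.g. Hamming(72,64)) to reduce overhead. The code corrects any single flipped bit and reports, rather than miscorrects, any two flipped bits.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits as 8 code bits; index 0 is the overall parity bit."""
    c = [0] * 8
    # Data bits go to positions 3, 5, 6, 7 (the non-power-of-two positions).
    c[3], c[5], c[6], c[7] = [(nibble >> i) & 1 for i in range(4)]
    # Parity bit at position p covers every position whose index has bit p set.
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    c[0] = sum(c[1:]) % 2  # makes the parity over all 8 bits even
    return c


def decode(code: List[int]) -> Tuple[Optional[int], str]:
    """Return (nibble, status) where status is 'ok', 'corrected' or 'double_error'."""
    c = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos  # a single flip at position j yields syndrome j
    overall_odd = sum(c) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        # Odd number of flips: assume one and repair it (syndrome 0 means c[0] itself).
        c[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even parity: two bits flipped, not correctable.
        return None, "double_error"
    return c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3), status
```

      Applied to each nibble of a buffer, `decode` returns the original value for clean or single-error bytes, and `(None, "double_error")` when a retry is still required.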

      In order not to lose speed, ECC should always be enabled for headers but disabled by default for data exchange (CRC is sufficient there).
      If a data CRC check fails for the first time, both ends fall back to ECC for data transfers as well.
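      The fallback rule can be sketched as a small per-end state, assuming (hypothetically) that the switch is signalled to the peer through a flag in the always-ECC-protected header so that both ends change mode together. `FLAG_DATA_ECC` and `LinkEnd` are invented names for illustration, not part of the DSF/RRF protocol, and the switch is assumed to stay in effect for the rest of the session.

```python
FLAG_DATA_ECC = 0x01  # hypothetical header flag bit, not part of the real protocol


class LinkEnd:
    """One side of the SPI link (SBC or firmware) under the proposed scheme."""

    def __init__(self) -> None:
        self.header_ecc = True   # headers are always ECC-protected
        self.data_ecc = False    # data starts CRC-only for speed

    def on_data_crc_failure(self) -> None:
        # First data CRC failure: protect data with ECC from now on.
        self.data_ecc = True

    def header_flags(self) -> int:
        # Flags this end places in its next (ECC-protected) header.
        return FLAG_DATA_ECC if self.data_ecc else 0

    def on_peer_header(self, flags: int) -> None:
        # Adopt ECC for data as soon as the peer reports that it switched.
        if flags & FLAG_DATA_ECC:
            self.data_ecc = True
```

      Carrying the flag in the header matters because the header is the part of the exchange that is always ECC-protected, so the mode change itself survives a single-bit error.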

      @chrishamm can tell whether speed is a concern for the SPI transfer. If it is not, ECC should be enabled for every transfer, although this may interfere with the use of DMA.
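      To put a rough number on the speed question: the bandwidth cost of a SECDED code depends on how many data bits share one set of check bits. A Hamming code over k data bits needs the smallest r with 2^r ≥ k + r + 1, plus one overall parity bit for double-error detection. This sketch (the function name `secded_check_bits` is hypothetical) only counts extra bits on the wire; it ignores CPU time for encoding and decoding and says nothing about the DMA concern.

```python
def secded_check_bits(k: int) -> int:
    """Check bits needed for an extended Hamming (SECDED) code over k data bits."""
    r = 0
    while (1 << r) < k + r + 1:  # Hamming bound for single-error correction
        r += 1
    return r + 1  # one extra overall parity bit enables double-error detection


if __name__ == "__main__":
    for k in (4, 8, 32, 64):
        c = secded_check_bits(k)
        print(f"{k:>2} data bits: {c} check bits, {c / k:.1%} overhead")
```

      Protecting 64 data bits at a time costs 8 check bits (12.5% overhead), versus 100% for a per-nibble Hamming(8,4) layout, which is why wider words are the usual choice when bandwidth matters.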

      Background: I often run print jobs lasting several days, and over such a long period it is very likely that the SPI connection will reset mid-print in an SBC setup. I do not see this failure in standalone mode: I can run the Duet standalone for months without any error, but that is not the case in SBC mode.

      For reference:
      https://forum.duet3d.com/topic/34460/multiple-print-failures/15?_=1704373031244
      https://forum.duet3d.com/topic/34315/rff-3-5-0-rc1-spi-reset-mid-print

      • chrishamm (administrator), replying to @timschneider

        @timschneider I don't really see why the retry mechanism would fail without any retry. I did extensive tests with all sorts of transfer errors that could be caused by RRF or DSF, and the CRC-based error detection and recovery worked well in those tests. Also note that several SBC customers have been printing 24/7 for months without any problems, so your report really sounds like an issue specific to your setup. I do remember SPI communication issues with a RockPi that I did not see with a Raspberry Pi, so comparing the two might be worthwhile as well.

        ECC may help in your scenario but implementing and testing it sufficiently doesn't seem like a quick change to me. Also, there are several more urgent things I need to take care of at this point, but of course I'd be happy to accept a PR if you fancy implementing and testing it yourself 😉

        Duet software engineer

        • timschneider, replying to @chrishamm

          @chrishamm
          I'll check the Raspberry Pi vs. RockPi hint. And maybe you are right that my crashes are not related to the SPI bus; I'll check that.

          Unless otherwise noted, all forum content is licensed under CC-BY-SA