Chasing bad SPI checksums.



  • @chrishamm said in Bad checksums - Which connection?:

    @Danal It looks like your Duet 3 restarted/crashed before the print was cancelled:

    Nov 08 00:41:17 duet3 DuetControlServer[351]: [warn] Controller has been reset

    What RRF version are you using, can you share the output of M122?

    Board: Duet 3 MB6HC (MB6HC)
    DSF Version: 1.1.0.5
    Firmware: RepRapFirmware for Duet 3 MB6HC v0.6 or 1.0 3.0beta12 (2019-11-02b1)
    Duet Web Control 2.0.4

    Unfortunately, I did not obtain M122 prior to a power cycle.



  • @wilriker said in Bad checksums - Which connection?:

    Just a short one from me: I have it running with a RPi4 at ~20.83MHz (set it to 21MHz and it will be get to the actual value through core clock divisor) stable. But I also have fixed my core clock in /boot/config.txt to not have those uncontrollable frequency changes:

    core_freq=250
    core_freq_min=250
    

    I was thinking about trying that, as the Pi4 is plenty fast enough.

    Thanks for the detailed settings, I will do that next.


  • administrators

    @Danal said in Bad checksums - Which connection?:

    core_freq=250
    core_freq_min=250

    @wilriker are there any downsides to this approach?


  • administrators

    @T3P3Tony I also see from the documentation:

    For Pi 4B the default value is 500, 600 is the only other accepted value;



  • @T3P3Tony said in Bad checksums - Which connection?:

    @Danal said in Bad checksums - Which connection?:

    core_freq=250
    core_freq_min=250

    @wilriker are there any downsides to this approach?

    CPU scales independently from the core clock and I could not find what else this has an influence on. So far I have not encountered any downsides.

    Re 500 and 600: that's for overclocking. But you can always underclock. Confirmed on mine.

    I will post some more details findings later. In short SPI clock is only about 40% of what your set it to on RPi 4 unless you fix the core clock.


  • administrators

    @T3P3Tony said in Bad checksums - Which connection?:

    core_freq=250
    core_freq_min=250

    but i i want to be greedy, don't care about power, and have a heatsink i would potentially fix this at 500 or 600?

    I just checked (vcgencmd measure_clock arm) and i am running at 600 right now (probably due to using VNC with a big screen resolution which is quite processor intensive i guess).

    might test:

    core_freq=600
    core_freq_min=600
    

    Also what is not clear is what the actual difference between doing that and using force_turbo=1
    is



  • @T3P3Tony said in Bad checksums - Which connection?:

    @T3P3Tony said in Bad checksums - Which connection?:

    core_freq=250
    core_freq_min=250

    but i i want to be greedy, don't care about power, and have a heatsink i would potentially fix this at 500 or 600?

    I don't think this brings very much improvement but yes, you could fix that at 500 (I would not overclock something I don't even know what it all influences).

    I just had a thought that it might influence RAM clock but I have no evidence to back that up.

    I just checked (vcgencmd measure_clock arm) and i am running at 600 right now (probably due to using VNC with a big screen resolution which is quite processor intensive i guess).

    That's the CPU clock you checked. So although you are running VNC it's still stepped down to it's minimum speed. 😁 It's the clock core you were looking for.

    Also what is not clear is what the actual difference between doing that and using force_turbo=1
    is

    force_turbo=1 will do some more things, some of them being very sketchy when it comes to system stability - but it also blows an "efuse" (i.e. it sets a non-reversible flag) to void your warranty.

    Now to my (most likely confusing) explanation about SPI speed: what you set is the maximum frequency. For the first two Pi generations that is equal to the frequency at any point in time but starting with RPi 3 this changed. Before that the core clock (which is the base for the SPI frequency) was fixed at 250MHz. RPi 3 has a scaling core clock 250-400MHz. And RPi 4 has a range of 200-500MHz (that lower minimum will bite is later).

    SPI frequency is derived from the core frequency divided by an even number and also scales with it. And since all RPis should be interchangeable they all use the common denominator of 250MHz for that. The first even number is 2 and thus 125MHz is the maximum SPI frequency any RPi can achieve.
    But since they introduced possibly higher core frequencies with RPi 3 and later they had to make sure that the above mentioned maximum frequency is not exceeded. They solved it by first deriving the actual maximum by checking for the next lower divider (e.g. setting to 21MHz will result in 125/6 =~ 20.83MHz) and then scale that value again so this will only be reached at maximum core clock.

    Simple example: set SPI frequency to 125MHz. On a RPi 4 with a maximum of 500MHz this means when the core clock actually runs at 250MHz the SPI frequency will also be halfed to 62.5MHz. And now the core clock in RPi 4 can even go down below that to 200MHz and guess what: SPI frequency will follow that as well.

    The biggest problem here is that whenever core clock changes SPI frequency will change with it not awaiting any data transfer to finish - some devices handle this better than others.
    And of course also you get lower speeds than what you requested without even noticing most of the time since with current resource usage when attached to a Duet 3 mine runs at only insignificantly higher usage than fully idle. core clock in mine was virtually fixed at 200MHz.

    That's why I fixed my minimum (new with RPi 4) and maximum to 250MHz. I don't know how it influences SPI frequency if you fix it at any value above that (my scope is orders of magnitude too slow to check that - might get myself a more capable one some day) and whenever I try to formulate a guess in my head I find a reason why this might still be different. So I just don't know.



  • @wilriker Can I just say "ew"?

    Get a Jetson Nano. 🤣



  • First, my cable layout. Nothing really near; nothing crossing; anything even sort of near at 90 degrees.

    IMG_0859.jpeg

    Second, no joy on eliminating thermal management of the core clock.

    /boot/config.txt
    core_freq=250
    core_freq_min=250
    

    And a reboot does not eliminate all of my checksum messages. I'm beginning to think I have a hardware problem. Time to power off and re-seat cables.

    By the way, I did check that the commands had the desired effect:

    pi@duet3:~ $ vcgencmd measure_clock core
    frequency(1)=250000496
    pi@duet3:~ $ vcgencmd measure_clock core
    frequency(1)=250000496
    pi@duet3:~ $ vcgencmd measure_clock core
    frequency(1)=250000496


  • Well... very VERY interesting. Hard to prove a negative... but... I reseated the ribbon cable at both ends, and as of this moment, I've run about 5x the testing that reliably produced a checksum error in the past, with no errors logged.

    Update: Hours and Hours of printing. No Checksum errors.

    Looks like re-seating the cables fixed it.



  • And in your spare time you fly 5 T-Rex's...heh heh. Is there no limit to your robopsychosis?


Log in to reply