Duet3D Logo Duet3D
    • Tags
    • Documentation
    • Order
    • Register
    • Login

    Slow/Unreliable SPI Communication when Logging with M929??

    Scheduled Pinned Locked Moved Unsolved
    DSF Development
    1
    3
    354
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • garethkyundefined
      garethky
      last edited by

      I've been running a rather long macro from the SBC and found that it cannot complete reliably. I get various symptoms and errors:

      • The printer pauses or hesitates between moves for approx. 0.25 to 2 second
        • Pauses are more frequent if the expression language is used, particularly loops
      • Warning that the Macro has not completed in 3000ms (the macro takes 3+ minutes to run, but I don't always get this error, so its no the total macro runtime its complaining about)
      • Warning: Controller has been reset - macro execution stops and I have to reset the machine

      What Hardware and Software?

      Duet 3 and Raspberry Pi 3B+ running 3.2 final

      What debugging steps have been taken?

      In order:

      • Re-worked my macro to remove all scripting and expressions so its plain GCode (eliminates parsing weirdness on the SBC as root cause)
      • Switch to Ethernet on the SBC, disabled WiFi
      • Switched SBC power to an external power adapter, removed the 5v -> SBC jumper
      • Checked that there is no SD card in the Duet
      • Tried the following config changes in DCS suggested here: https://github.com/Duet3D/DuetSoftwareFramework/issues/137
        • Raised the SPI frequency to 16Mhz
        • Dropped the poll interval to 5ms
        • Increased buffered macro codes to 32
      • Moving the Pi board and ribbon cable as far away from any other wiring as is practical in case its interference
      • Disable logging in the script by commenting out the M929 S3 P"/sys/bed-sample.log" line

      Results

      Increasing the SPI frequency and lowering the poll interval had a positive impact. This resulted in noticeably snappier macro execution. But the intermittent complete stalls remained and the Warning: Controller has been reset still happened.

      Turning off the logging command in the macro seems to have eliminated the stall problem. My wild guess in that DCS blocks to perform logging file I/O and this causes an SPI stall but I haven't looked at the DCS code to check. Since I need this on to capture the probing data for Python to process I cant accept turning this off as a solution (unless you know another way to get data out of the duet reliably).

      Possible Next Steps

      • Try a very fast SD card to see if the faster I/O speed helps
      • Try raising the logging level from Info to Error and then log all data as 'Error' on the theory that this would have less overhead somehow
      • Try a Pi4 on the theory that more CPU might help.

      M122

      I got one right before a reset 😁 SPI Shows some disconnects have occurred and there appears to be an underun logged on the Duet (if I'm reading that correctly).

      1/19/2021, 1:06:45 PM	Warning: Controller has been reset
      1/19/2021, 1:06:33 PM	M122
      === Diagnostics ===
      RepRapFirmware for Duet 3 MB6HC version 3.2 running on Duet 3 MB6HC v0.6 or 1.0 (SBC mode)
      Board ID: 08DJM-956L2-G43S4-6JTD4-3SS6K-TA7AH
      Used output buffers: 1 of 40 (16 max)
      === RTOS ===
      Static ram: 149788
      Dynamic ram: 65312 of which 76 recycled
      Never used RAM 143656, free system stack 122 words
      Tasks: Linux(ready,87) HEAT(blocked,296) CanReceiv(blocked,834) CanSender(blocked,344) CanClock(blocked,352) TMC(blocked,19) MAIN(running,1120) IDLE(ready,19)
      Owned mutexes: Telnet(MAIN)
      === Platform ===
      Last reset 00:26:13 ago, cause: software
      Last software reset at 2021-01-19 12:40, reason: User, none spinning, available RAM 143620, slot 0
      Software reset code 0x0012 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0044a000 BFAR 0x00000000 SP 0x00000000 Task Linu Freestk 0 n/a
      Error status: 0x00
      Aux0 errors 0,0,0
      Aux1 errors 0,0,0
      MCU temperature: min 44.5, current 46.4, max 46.7
      Supply voltage: min 24.2, current 24.3, max 24.3, under voltage events: 0, over voltage events: 0, power good: yes
      12V rail voltage: min 12.2, current 12.2, max 12.3, under voltage events: 0
      Driver 0: position 56774, standstill, reads 48573, writes 8 timeouts 0, SG min/max 0/1023
      Driver 1: position 48723, standstill, reads 48573, writes 8 timeouts 0, SG min/max 0/1023
      Driver 2: position 1600, standstill, reads 48573, writes 8 timeouts 0, SG min/max 0/1023
      Driver 3: position 24800, standstill, reads 48578, writes 2 timeouts 0, SG min/max not available
      Driver 4: position 0, standstill, reads 48581, writes 0 timeouts 0, SG min/max not available
      Driver 5: position 0, standstill, reads 48581, writes 0 timeouts 0, SG min/max not available
      Date/time: 2021-01-19 13:06:32
      Slowest loop: 2882.45ms; fastest: 0.04ms
      === Storage ===
      Free file entries: 10
      SD card 0 not detected, interface speed: 37.5MBytes/sec
      SD card longest read time 0.0ms, write time 0.0ms, max retries 0
      === Move ===
      DMs created 125, maxWait 446431ms, bed compensation in use: none, comp offset 0.000
      === MainDDARing ===
      Scheduled moves 1506, completed moves 1506, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 3], CDDA state -1
      === AuxDDARing ===
      Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
      === Heat ===
      Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
      === GCodes ===
      Segments left: 0
      Movement lock held by HTTP
      HTTP* is idle in state(s) 45 0, running macro
      Telnet* is doing "M122" in state(s) 0
      File is idle in state(s) 0
      USB is idle in state(s) 0
      Aux is idle in state(s) 0
      Trigger* is idle in state(s) 0
      Queue is idle in state(s) 0
      LCD is idle in state(s) 0
      SBC is idle in state(s) 0
      Daemon is idle in state(s) 0 0, running macro
      Aux2 is idle in state(s) 0
      Autopause is idle in state(s) 0
      Code queue is empty.
      === CAN ===
      Messages queued 6765, send timeouts 0, received 17182, lost 0, longest wait 1ms for reply type 6013, free buffers 48
      === SBC interface ===
      State: 4, failed transfers: 0
      Last transfer: 1ms ago
      RX/TX seq numbers: 8122/8122
      SPI underruns 0, overruns 0
      Number of disconnects: 2, IAP RAM available 0x2c884
      Buffer RX/TX: 1416/112-4088
      === Duet Control Server ===
      Duet Control Server v3.2.0
      HTTP:
      Buffered code: G30 S-1
      Buffered code: G1 Z1 F252
      Buffered code: M118 L1 S"Bed probe location: X:236.872 Y:8.128"
      Buffered code: G1 X236.872 Y20.128 F9000
      Buffered code: G30 S-1
      Buffered code: G1 Z1 F252
      Buffered code: G30 S-1
      Buffered code: G1 Z1 F252
      Buffered code: M118 L1 S"Bed probe location: X:196.872 Y:8.128"
      Buffered code: G1 X196.872 Y20.128 F9000
      Buffered code: G30 S-1
      Buffered code: G1 Z1 F252
      Buffered code: G30 S-1
      Buffered code: G1 Z1 F252
      Buffered code: M118 L1 S"Bed probe location: X:170.0 Y:8.128"
      Buffered code: G1 X170.0 Y20.128 F9000
      Buffered code: G30 S-1
      Buffered code: G1 Z1 F252
      Buffered code: G30 S-1
      Buffered code: G1 Z1 F252
      Buffered code: M118 L1 S"Bed probe location: X:130.0 Y:8.128"
      Buffered code: G1 X130.0 Y20.128 F9000
      Buffered code: G30 S-1
      Buffered code: G1 Z1 F252
      Buffered code: G30 S-1
      Buffered code: G1 Z1 F252
      Buffered code: M118 L1 S"Bed probe location: X:103.128 Y:8.128"
      Buffered code: G1 X103.128 Y20.128 F9000
      Buffered code: G30 S-1
      Buffered code: G1 Z1 F252
      Buffered code: G30 S-1
      Buffered code: G1 Z1 F252
      ==> 1420 bytes
      Executing macro 0:/macros/mesh-bed-leveling/mesh-bed-probe.g, started by M98 P"0:/macros/mesh-bed-leveling/mesh-bed-probe.g"
      > Next stack level
      Code buffer space: 2696
      Configured SPI speed: 16000000 Hz
      Full transfers per second: 78.97
      Maximum length of RX/TX data transfers: 3669/1548
      ok
      1/19/2021, 1:05:45 PM	Connected to duet3
      
      gtjoseph created this issue in Duet3D/DuetSoftwareFramework

      closed DCS 2.2.0: Can't keep up at 150mm/sec print speeds even with 16MHz SPI speed #137

      1 Reply Last reply Reply Quote 1
      • garethkyundefined
        garethky
        last edited by

        Ooh its all C#! Logging happens here using good async constructs, this shouldn't in principal block the thread:

        https://github.com/Duet3D/DuetSoftwareFramework/blob/7fd44735ba064ec831a67377765ccfcc607ec756/src/DuetControlServer/Utility/Logger.cs#L134

        So I'm going after the SD card first then...

        1 Reply Last reply Reply Quote 0
        • garethkyundefined
          garethky
          last edited by

          I switched to a SanDisk Extreme 128GB card and that totally cleared up the issue. This is one that I use for recording 4K video.

          The card I had before was a Kingston 16GB card I got with the Pi.

          I'm happy that I figured out a way around this, but I'm surprised that the async/await programming constructs don't protect us from I/O flakiness!

          • The file is wrapped in a StreamWriter that should have a 4K buffer
          • The whole call chain is async/await. Log writing doesn't have to complete before the next GCode instruction executes (at least from what I can read without debugging it).

          I was going to suggest keeping a queue of logging data and writing it on a separate dedicated thread for I/O. But StreamWriter is basically that. (the only difference being that the log time can be captured without trying to lock the file)

          So yeah, not sure how the code should change. But if you are having problems, get a FAST SD card.

          1 Reply Last reply Reply Quote 2
          • First post
            Last post
          Unless otherwise noted, all forum content is licensed under CC-BY-SA