Urgent: RRF 3.2 Messed Up Delta Auto-Calibration
-
A few months ago, I tried RRF 3.2 beta 2 for testing. I found that it completely broke auto-calibration on my Delta printer. The auto-calibration would routinely skip probe points and fail for no apparent reason. I reported the issue here and was told that it would be fixed in a bug fix update before release. I switched back to RRF 3.1.1 and everything worked as expected.
Fast forward to today, RRF 3.2 is officially released and I started running it. Lo and behold, the issue persists, though to a lesser extent. The auto-calibration routine will still occasionally skip random probing points, except now it always completes the routine, albeit with a ridiculous result at the end due to the missing point(s). When this happens, the printer will also entirely skip mesh bed probing (G29) and go straight to printing. Once in a while, the auto-calibration succeeds, and on those rare instances, the printer performs the mesh bed probing as instructed. When this happens, my prints proceed and complete properly, but it's very rare.
Both G32 and G29 are in my slicer-generated gcode files, so I have no clue why the printer would skip the G29 when G32 skips probing points, since it seems to report that it completed successfully.
In any case, with the release of RRF 3.2, my printer is now practically useless. This is infuriating. And reverting back to 3.1.1 seems non-trivial since the stable repo now contains RRF 3.2 as the most recent version.
I'm running a Duet 3 6HC in SBC mode with a Pi 4 running the latest version of all packages (apt update && apt upgrade). Here's the result of M122:
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.2 running on Duet 3 MB6HC v1.01 or later (SBC mode)
Board ID: 08DJM-956L2-G43S8-6J9D2-3SJ6P-1A0LG
Used output buffers: 1 of 40 (11 max)
=== RTOS ===
Static ram: 149788
Dynamic ram: 63408 of which 64 recycled
Never used RAM 145572, free system stack 128 words
Tasks: Linux(ready,111) HEAT(blocked,296) CanReceiv(blocked,927) CanSender(blocked,350) CanClock(blocked,352) TMC(blocked,19) MAIN(running,265) IDLE(ready,19)
Owned mutexes: HTTP(MAIN)
=== Platform ===
Last reset 00:53:28 ago, cause: software
Last software reset at 2021-01-05 14:18, reason: User, FilamentSensors spinning, available RAM 145572, slot 0
Software reset code 0x000d HFSR 0x00000000 CFSR 0x00000000 ICSR 0x00400000 BFAR 0x00000000 SP 0x00000000 Task Linu Freestk 0 n/a
Error status: 0x00
Aux0 errors 0,0,0
Aux1 errors 0,0,0
MCU temperature: min 43.2, current 44.3, max 45.5
Supply voltage: min 24.0, current 24.3, max 24.6, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 11.9, current 12.0, max 12.0, under voltage events: 0
Driver 0: position 173106, standstill, reads 47906, writes 24 timeouts 0, SG min/max 0/89
Driver 1: position 173130, standstill, reads 47908, writes 23 timeouts 0, SG min/max 0/391
Driver 2: position 173168, standstill, reads 47908, writes 23 timeouts 0, SG min/max 0/397
Driver 3: position 0, standstill, reads 47908, writes 23 timeouts 0, SG min/max 0/415
Driver 4: position 0, standstill, reads 47920, writes 11 timeouts 0, SG min/max 0/0
Driver 5: position 0, standstill, reads 47920, writes 11 timeouts 0, SG min/max 0/0
Date/time: 2021-01-05 15:12:22
Slowest loop: 195.85ms; fastest: 0.04ms
=== Storage ===
Free file entries: 10
SD card 0 not detected, interface speed: 37.5MBytes/sec
SD card longest read time 0.0ms, write time 0.0ms, max retries 0
=== Move ===
DMs created 125, maxWait 2532605ms, bed compensation in use: none, comp offset 0.000
=== MainDDARing ===
Scheduled moves 43, completed moves 43, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== AuxDDARing ===
Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== Heat ===
Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = 2 -1 -1 -1
Heater 0 is on, I-accum = 0.0
Heater 1 is on, I-accum = 0.6
Heater 2 is on, I-accum = 0.0
=== GCodes ===
Segments left: 0
Movement lock held by null
HTTP* is doing "M122" in state(s) 0
Telnet is idle in state(s) 0
File* is idle in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger* is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty.
=== Filament sensors ===
Extruder 0: pos 218.32, errs: frame 0 parity 0 ovrun 0 pol 0 ovdue 0
=== CAN ===
Messages queued 12831, send timeouts 28872, received 0, lost 0, longest wait 0ms for reply type 0, free buffers 48
=== SBC interface ===
State: 4, failed transfers: 0
Last transfer: 1ms ago
RX/TX seq numbers: 48867/48867
SPI underruns 0, overruns 0
Number of disconnects: 0, IAP RAM available 0x2c8a8
Buffer RX/TX: 0/0-0
=== Duet Control Server ===
Duet Control Server v3.2.0
Code buffer space: 4096
Configured SPI speed: 8000000 Hz
Full transfers per second: 35.68
Maximum length of RX/TX data transfers: 2756/1364
-
You can roll back with this:
sudo apt install \
    duetsoftwareframework=3.1.1 \
    duetcontrolserver=3.1.1 \
    duetruntime=3.1.1 \
    duetsd=1.0.6 \
    duettools=3.1.1 \
    duetwebcontrol=3.1.1 \
    duetwebserver=3.1.0 \
    reprapfirmware=3.1.1-1 \
    --allow-downgrades
-
@jay_s_uk That doesn't appear to have flashed the firmware to the board. All the packages were successfully downgraded, but now the SBC can't communicate with the board anymore, I assume because the SPI protocol is different and the board firmware is still at 3.2. This is kinda what I was worried would happen. How do I downgrade the firmware on the board if I can't communicate with it anymore? Do I need to connect to it directly via USB?
-
@GoremanX said in Urgent: RRF 3.2 Messed Up Delta Auto-Calibration:
Do I need to connect to it directly via USB?
That may be the quickest option to use Bossa.
-
@GoremanX, I'm sorry you are having issues with auto calibration. Can you try running those G29 and G32 commands under RRF 3.2 in standalone mode? That will help us identify whether the issue is with RRF on the Duet or DSF on the Pi.
-
@GoremanX I just tried to reproduce this with a similar macro (six sequential G30 calls with some messages in between) but failed to reproduce your problem. So please share your bed.g so we can have a look at it. In addition, you may want to try out standalone mode for testing like @dc42 suggested.
To downgrade to 3.1.1, you must first install the older firmware by uploading it on the System page of DWC. Then you can downgrade like @jay_s_uk suggested.
-
@chrishamm Yeah, I totally forgot that I need to flash the firmware file on the board before I downgrade the packages. I did that last time, but downgrading isn't something I do often so I forget obvious steps sometimes.
I re-installed the 3.2 packages and can communicate with the board again. I tried to record a failed auto-calibration, but it completed successfully this time, although it DID inexplicably skip the bed mesh probing. Since it successfully started the print (which I've been trying to get going for hours), I decided to let it complete it. Once it's done, I'll do further testing (trying standalone mode, etc).
Here's the bed.g file I've been using for the last few months:
M561 ; clear any bed transform
G28 ; home all towers
; Probe the bed at 6 peripheral and 3 halfway points, and perform 6-factor auto compensation
; Before running this, you should have set up your Z-probe trigger height to suit your build, in the G31 command in config.g.
G30 P0 X0 Y249.9 H0 Z-99999
G30 P1 X216.42 Y124.95 H0 Z-99999
G30 P2 X216.42 Y-124.95 H0 Z-99999
G30 P3 X0 Y-249.9 H0 Z-99999
G30 P4 X-216.42 Y-124.95 H0 Z-99999
G30 P5 X-216.42 Y124.95 H0 Z-99999
G30 P6 X0 Y124.9 H0 Z-99999
G30 P7 X108.17 Y-62.45 H0 Z-99999
G30 P8 X-108.17 Y-62.45 H0 Z-99999
G30 P9 X0 Y0 H0 Z-99999 S6
;G29 S1 ; load current bed mesh compensation map
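For anyone comparing probe layouts: the coordinates in this bed.g look like six points evenly spaced on a circle of radius 249.9, three on a circle of radius 124.9, and one at the centre. The Python sketch below regenerates the same G30 lines from that reading. It is only a geometric interpretation of the numbers above, not how the file was actually produced, and the function names are made up for illustration.

```python
import math


def fmt(value: float) -> str:
    """Format a coordinate with at most 2 decimals and no trailing zeros."""
    value = round(value, 2)
    if value == 0:
        value = 0.0  # normalise -0.0 so it prints as "0"
    return f"{value:.2f}".rstrip("0").rstrip(".")


def ring(radius: float, count: int, start_index: int) -> list:
    """Evenly spaced probe points on a circle, starting at +Y and going clockwise."""
    lines = []
    for i in range(count):
        angle = math.radians(i * 360.0 / count)
        x = radius * math.sin(angle)
        y = radius * math.cos(angle)
        lines.append(f"G30 P{start_index + i} X{fmt(x)} Y{fmt(y)} H0 Z-99999")
    return lines


def delta_probe_points(outer: float = 249.9, inner: float = 124.9) -> list:
    """Six outer points, three inner points, then the centre point with S6."""
    lines = ring(outer, 6, 0) + ring(inner, 3, 6)
    # The last G30 carries S6, which requests 6-factor calibration.
    lines.append(f"G30 P{len(lines)} X0 Y0 H0 Z-99999 S6")
    return lines


for line in delta_probe_points():
    print(line)
```

Changing the two radii is then enough to shrink or grow the probing pattern while keeping the point order the same.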
-
Incidentally, the fact that it completed auto-calibration but then skipped bed mesh probing seems to imply that it's just skipping gcodes or somehow executing them out of order. It doesn't appear to be specifically an auto-probing issue; that's just where I'm getting hit with most of the symptoms.
-
Yesterday I upgraded my delta printer to a Duet Mini5+ and I tried to reproduce your problem but it worked every single time in at least 20 attempts and never skipped a single point. Can you please share your homedelta.g for completeness?
-
@chrishamm not sure that this is helpful at all, but here it is:
; homedelta.g
; called to home all towers on a delta printer
;
G91 ; relative positioning
G1 H1 X1305 Y1305 Z1305 F9000 ; move all towers to the high end stopping at the endstops (first pass)
G1 H2 X-5 Y-5 Z-5 F1800 ; go down a few mm
G1 H1 X10 Y10 Z10 F360 ; move all towers up once more (second pass)
G1 Z-5 F6000 ; move down a few mm so that the nozzle can be centred
G90 ; absolute positioning
;G1 X0 Y0 F6000 ; move X+Y to the centre
The issue with auto-calibration not completing successfully just kinda went away. The skipping of mesh bed probing continued to happen randomly, I'd say it was skipped about 50% of the time. This wasn't something that prevented me from using the printer, so I just lived with it rather than spend time trying to diagnose it.
Another thing I noticed recently: occasional "stutters" while printing. Not sure what else to call them. Every once in a while, for a fraction of a second, the print head would pause, then continue on its way. This only seemed to happen at direction changes and it was very occasional.
Today I finally had time to switch from SBC to standalone mode. This wasn't trivial. My printer was designed from the ground up to run in SBC mode.
Finally got it running.
First test: the calibration and bed mesh probing that precedes every print job. It's part of my printer profile in the slicer. I tried it a dozen times. It never failed. Worked exactly the way it's supposed to every time. Never skipped a point, never skipped bed mesh probing, everything was rock solid reliable. If anything, the entire process seemed snappier than usual, though that's hard to quantify.
Second test: the "stuttering". After repeating the first test a dozen or so times, I let the print job proceed. The stuttering has been more obvious during the initial skirt since those tend to be long, smooth lines (I have it set to 6 perimeter lines 5mm from the print). This time, no stuttering. Everything ran smoothly from line to line. I've been looking at it go for 15 minutes now, and it hasn't stuttered once. That's unheard of in the last couple weeks.
The final print result even seems a bit better. I did the exact same print (same gcode file) immediately before and after switching to standalone mode. The layer lines on the outside walls are more even, and there are fewer blemishes.
Just to be clear, the only difference here is switching from SBC mode to standalone. All of this happened on the exact same day with the exact same setup with the exact same ambient temperature. There was a 2 hour period during switchover to standalone mode.
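As a rough sanity check on whether a dozen clean runs could just be luck: if bed mesh probing really were skipped about half the time, as it was in SBC mode, the chance of 12 clean runs in a row would be tiny. The sketch below does that arithmetic. It assumes each run is independent with the same skip rate, which is a simplification.

```python
def prob_all_succeed(success_rate: float, runs: int) -> float:
    """Probability that every one of `runs` independent attempts succeeds."""
    return success_rate ** runs


# Roughly 50% of runs skipped mesh probing in SBC mode, so success_rate = 0.5.
p = prob_all_succeed(0.5, 12)
print(f"Chance of 12 clean runs in a row at a 50% skip rate: {p:.6f}")
```

That works out to 1 in 4096, so a dozen clean runs is hard to explain as coincidence under that assumption.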
In SBC mode, I do run a slightly longer ribbon cable than what's provided with the Duet 3. It's about 80mm longer. This was never an issue with firmware 3.1.1, the problems only came about with the switch to version 3.2 (and before that 3.2 beta 3 or so).
-
I think the 80cm cable is the problem. SPI signals are designed to be on PCBs only. A cable is always a compromise.
-
@PCR I didn't say 80cm. That would be insane. I said 80mm. That's less than 50% longer than the original 200mm cable, and is well within the accepted range for an SPI interface. Also, it worked fine with firmware 3.1.1. There were SPI bus changes made for 3.2, supposedly speed improvements. I'm thinking those changes are the cause of the issues.
For reference, I used to run a 500mm ribbon cable between the Pi 4 and Duet 3. That caused occasional connection issues which would mess up auto-calibration, except the printer would just stop and report a failure rather than continue with missing probing points. Shortening the cable to 400mm fixed all that. In a later configuration change, I further shortened the cable to 280mm, and that's how it ran without issues for months and months on firmware 3.1.1. It didn't give me any trouble until I tried the 3.2 beta.
-
In 3.2, RRF is using a new CRC setting. That could be why.
Btw, sorry for the 80cm earlier.
-
@GoremanX Probably worth looking at your /var/log/syslog file after running an SBC test. See if there are any messages in there from the Duet Control Server, in particular look for CRC errors and other warnings.
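A quick way to do that without scrolling through the whole file is a small filter like the Python sketch below. It keeps only lines that mention the Duet Control Server and contain one of a few keywords. The service tag `duetcontrolserver` and the keyword list are assumptions about how the messages are labelled in syslog, so adjust them to whatever your log actually shows.

```python
from typing import Iterable, List, Sequence


def find_dcs_warnings(
    lines: Iterable[str],
    service: str = "duetcontrolserver",  # assumed syslog tag, matched case-insensitively
    keywords: Sequence[str] = ("crc", "warn", "error"),  # assumed keywords of interest
) -> List[str]:
    """Return log lines that mention the service and at least one keyword."""
    hits = []
    for line in lines:
        lowered = line.lower()
        if service in lowered and any(word in lowered for word in keywords):
            hits.append(line.rstrip("\n"))
    return hits


# Example use on the Pi (reading syslog may need sudo):
#
#     with open("/var/log/syslog", errors="replace") as log:
#         for hit in find_dcs_warnings(log):
#             print(hit)
```

Running it right after a test where G29 was skipped keeps the relevant window of the log short.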
-
@gloomyandy I'm not sure if I want to go back to SBC mode. Switching back and forth is a hassle, and if the issue is with the ribbon cable, then I can't fix it. I can't shorten the cable any more than it is. Using the Pi 4 in bridge mode is way more convenient. I can use a network cable of any length with no limits. The Pi can be out in the open, getting great wifi signal, instead of stuck near the Duet 3 under the printer. Everything is running way more reliably, and I don't need to worry about future SPI changes breaking things. About the only drawback I'm coming across is that network gcode uploads to the Duet 3 are slower, but that's not much of an issue.
I'm starting to wonder about the whole SBC/SPI thing, to be honest. It's a neat idea, but right now, it solves nothing and only adds problems. There's literally nothing I can do in SBC mode that I can't do (in some cases better) in standalone mode. This nasty surprise of a simple firmware update breaking things was not cool.
-
@GoremanX If you put the SBC back in, please change the log level in /opt/dsf/conf/config.json from info to debug so we can get a better idea about what happens when G30/G29 are skipped in your case. I've been trying to reproduce your problem with my own delta printer but I haven't had any luck yet. So without a log excerpt or a reliable way to reproduce it this is rather difficult to fix.
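If editing the file by hand is awkward, the change can be scripted. The Python sketch below rewrites one key in a JSON document. It assumes the file is plain JSON and that the setting is stored under a top-level key named `LogLevel`. Both are assumptions, so check the file first; the function fails rather than adding a key that is not already there.

```python
import json


def set_log_level(config_text: str, level: str = "debug", key: str = "LogLevel") -> str:
    """Return the JSON config text with `key` set to `level`.

    `key` defaults to "LogLevel", which is an assumed name for the setting.
    """
    config = json.loads(config_text)
    if key not in config:
        raise KeyError(f"{key!r} not found; check the key name in your config file")
    config[key] = level
    return json.dumps(config, indent=2)


# Example use on the Pi (writing the file may need sudo; keep a backup):
#
#     path = "/opt/dsf/conf/config.json"
#     with open(path) as f:
#         updated = set_log_level(f.read(), "debug")
#     with open(path, "w") as f:
#         f.write(updated)
```

Remember to set the level back to info afterwards, since debug logging can be verbose.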
You can use M122 at any time to check if it's actually the cable length and/or transfer speed. If there are problems with the SPI line, you can see failed transfers in the diagnostics output.
Can you please post your current config.g? That's the only thing I haven't checked yet.
-
@chrishamm I'll switch it back to SBC later today just to test it out and check those log files. But then I'm re-doing the entire setup to run in standalone mode permanently. Running that way allows me to put the Pi in the monitor housing like I originally intended to, and only run one power cable and one network cable to a single location rather than 2 power cables (Pi + touchscreen), an SPI ribbon cable and an extra long monitor ribbon cable to 2 separate devices.
-
@chrishamm oops, sorry, forgot to include my config.g:
; Configuration file for Duet 3 (firmware version 3)
; executed by the firmware on start-up
; Power On
M80 ; Turn on secondary power supply (24v)
; General preferences
G90 ; send absolute coordinates...
M83 ; ...but relative extruder moves
M665 L600:600:600 R306 H570 B280 ; Set basic delta radius, diagonal rod length, printable radius and homed height, refined through config-override.g and auto-calibration before each print
; Drives
M569 P0.0 S1 V100 ; physical drive 0.0 goes forwards
M569 P0.1 S0 V46 ; physical drive 0.1 goes backwards
M569 P0.2 S0 V46 ; physical drive 0.2 goes backwards
M569 P0.3 S0 V46 ; physical drive 0.3 goes backwards
M584 E0.0 X0.1 Y0.2 Z0.3 ; set drive mapping
M350 X16 Y16 Z16 I1 ; configure microstepping with interpolation for tower steppers
M350 E16 I1 ; configure microstepping with interpolation for extruder stepper
M92 X160.00 Y160.00 Z160.00 E685 ; set steps per mm
M566 X1800.00 Y1800.00 Z1800.00 E1200 ; set maximum instantaneous speed changes (mm/min)
M203 X12000.00 Y12000.00 Z12000.00 E3600 ; set maximum speeds (mm/min)
M201 X1500.00 Y1500.00 Z1500.00 E1500 ; set accelerations (mm/s^2)
M906 X1800 Y1800 Z1800 E600 I30 ; set motor currents (mA) and motor idle factor in per cent
M84 S30 ; Set idle timeout
; Axis Limits
M208 Z-1 S1 ; set minimum Z
; Endstops
M574 X2 S1 P"io1.in" ; configure active-high endstop for high end on X via pin P
M574 Y2 S1 P"io2.in" ; configure active-high endstop for high end on Y via pin P
M574 Z2 S1 P"io3.in" ; configure active-high endstop for high end on Z via pin P
; Z-Probe
M558 P8 R0.2 C"io0.in+io0.out" H5 F900 T9000 ; set Z probe type to effector and the dive height + speeds
G31 P100 X0 Y0 Z-0.10 ; set Z probe trigger value, offset and trigger height
M557 R220 S80 ; define mesh grid
; Heaters
M308 S2 P"temp2" Y"thermistor" T100000 B4138 A"Bed Heater" ; configure sensor S as thermistor on pin P
M950 H0 C"out2" T2 ; create bed heater output on C and map it to sensor T
M307 H0 B1 ; define heater H
M140 H0 ; map heated bed to heater H
M143 H0 S150 A0 ; set temperature limit S for bed
M308 S1 P"temp1" Y"thermistor" T500000 B4723 C1.196220e-7 A"Hotend Heater" ; configure sensor S as thermistor on pin P
M950 H1 C"out1" T1 ; create nozzle heater output on C and map it to sensor T
M307 H1 B0 ; define heater H
M143 H1 S500 A0 ; set temperature limit to S for nozzle
M308 S0 P"temp0" Y"thermistor" T100000 B4138 A"Chamber Monitor" ; configure sensor S as thermistor on pin P
M308 S3 P"spi.cs0" Y"thermocouple-max31856" A"Chamber Heaters" T"k" F60 ; configure sensor S as type Y on pin P (2 connected in parallel)
M950 H2 C"out3" T3 ; create chamber heater output on C and map it to sensor T
M307 H2 B1 ; define heater H
M141 H2 ; map chamber to heater H
M143 H2 S500 A0 ; set temperature limit for chamber to S
M308 S4 Y"drivers" A"Drivers" ; configure sensor S as temperature warning and overheat flags on the TMC2660 on Duet
M308 S5 Y"mcu-temp" A"MCU" ; configure sensor S as MCU temperature monitor
; Fans
M950 F0 C"out5" Q500 ; create fan F on pin C and set its frequency Q
M106 P0 S0 H-1 C"Effector Output" ; set fan P value. Thermostatic control is disabled
M950 F1 C"out7" Q500 ; create fan F on pin C and set its frequency Q
M106 P1 H1 T45 C"Hotend" ; set fan P value. Thermostatic control enabled at temp T
M950 F2 C"!out4" Q2500 ; create fan F on pin C (!inverted) and set its frequency Q
M106 P2 H4:5 L0.50 X1 B0.3 T35:50 C"Electronics" ; set fan P value. Thermostatic control enabled at temp T, lowest setting is L
M950 F3 C"out6" ; create fan F on pin C
M106 P3 H1 T45 L255 C"Fume Filter" ; set fan P value. Thermostatic control enabled at temp T, lowest setting is L
M950 F4 C"out8" Q500 ; create fan F on pin C and set its frequency Q
M106 P4 S0 H-1 L0.35 C"Air Pump" ; set fan P value. Thermostatic control is disabled, lowest setting is L
; Tools
M563 P0 D0 H1 F4 S"Hotend" ; define tool P, associated with drive D, heater H and fan F, name it S
G10 P0 X0 Y0 Z0 R0 S0 ; set tool P axis offsets and set active and standby temperatures to S
M591 D0 P3 C"io4.in" S1 R20:190 L25.25 E3.0 A1 ; configure magnetic filament monitor for tool D on pin C
; Custom printing settings
M207 S1.2 F3600 ; firmware retraction
M376 H10 ; taper off bed compensation by Hmm height
M572 D0 S0.03 ; pressure advance
; Miscellaneous
M550 P"KosselExtreme" ; set hostname
M552 S0 ; reset networking
M552 S1 P0.0.0.0 ; enable networking and get IP address through DHCP
M501 ; load saved parameters from non-volatile memory
T0 ; select first tool
Note that I don't have M550 or M552 in there when running in SBC mode. Beyond that, the config was identical between the two setups.
-
@chrishamm What am I looking for in M122? Failed transfers? Send timeouts?
-
I switched back to SBC mode and ran the same gcode file repeatedly until it skipped bed mesh probing. My test procedure was basically: start print job, watch it do auto-calibration, wait to see if it skips bed mesh probing, then cancel and restart the same job. It successfully ran bed mesh probing 5 out of 8 times. I never reset the printer between runs, just paused/cancelled/restarted. After the 8 tries, I checked M122 and this was the result for the SBC section:
=== SBC interface ===
State: 4, failed transfers: 0
Last transfer: 3ms ago
RX/TX seq numbers: 25888/25888
SPI underruns 0, overruns 0
Number of disconnects: 0, IAP RAM available 0x2c8a8
Buffer RX/TX: 72/1368-0
Not sure how illuminating that is. I took a video of a successful job and one where bed mesh probing was skipped. I can add these shortly, though again, I'm not sure how helpful that will be. Next I'll try downgrading to 3.1.1 to run the same test.
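In case it helps anyone compare runs, here is a small Python sketch that pulls the link-health counters out of the SBC interface section of an M122 dump. The failed-transfers counter is the one pointed out earlier in the thread; treating underruns, overruns and disconnects as equally interesting is my own assumption. The function name is made up, and the sample text is the section quoted above.

```python
import re


def sbc_link_counters(m122_text: str) -> dict:
    """Extract integer counters from the '=== SBC interface ===' section of M122 output."""
    header = "=== SBC interface ==="
    start = m122_text.find(header)
    if start == -1:
        raise ValueError("no SBC interface section found")
    section = m122_text[start + len(header):]
    end = section.find("===")  # stop at the next section header, if any
    if end != -1:
        section = section[:end]

    patterns = {
        "failed_transfers": r"failed transfers:\s*(\d+)",
        "spi_underruns": r"SPI underruns\s+(\d+)",
        "spi_overruns": r"SPI underruns\s+\d+,\s*overruns\s+(\d+)",
        "disconnects": r"Number of disconnects:\s*(\d+)",
    }
    counters = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, section)
        counters[name] = int(match.group(1)) if match else None
    return counters


SAMPLE = (
    "=== SBC interface === State: 4, failed transfers: 0 Last transfer: 3ms ago "
    "RX/TX seq numbers: 25888/25888 SPI underruns 0, overruns 0 "
    "Number of disconnects: 0, IAP RAM available 0x2c8a8 Buffer RX/TX: 72/1368-0"
)
print(sbc_link_counters(SAMPLE))
```

With every counter at zero, as here, the dump gives no sign of SPI transfer problems, even though mesh probing was skipped in 3 of the 8 runs.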