Poor print quality with RRF3 - especially 3.2.2.

deckingman

@Phaedrux said in Poor print quality with RRF3 - especially 3.2.2.:

Sorry to hear about that head crash and damage. Hopefully the data collected proves useful.

Yes, fingers crossed ......

deckingman

Having slept on it, I realise that I missed something out. Before installing the "Dropbox binaries" I did a test print with pressure advance enabled to see if I still had the same problems with it. I did not - the print looked good. I only did the first couple of layers then aborted it.

But I left it enabled. So the last test print which had the catastrophic failure with what looks like a large random XY excursion, was done with PA so the settings were not identical to all the other prints.

Apologies for forgetting to mention that. I've no idea if it's significant or not, but it likely explains why the print was looking so good before the failure.

o_lampe

@deckingman

Wondering how you disable PA?

I asked myself if there is a difference beween

;M572 D0 S0.05	; using the semikolon
or
M572 D0 S0.0     ;  setting S value to 0.0

IF there is something wrong with PA, would it still be there when setting S0.0?
I guess the FW would still run the PA-routines, but with zero length value. @dc42 ?

deckingman

@o_lampe said in Poor print quality with RRF3 - especially 3.2.2.:

@deckingman

Wondering how you disable PA?

I asked myself if there is a difference beween
;M572 D0 S0.05	; using the semikolon
or
M572 D0 S0.0     ;  setting S value to 0.0
IF there is something wrong with PA, would it still be there when setting S0.0?
I guess the FW would still run the PA-routines, but with zero length value. @dc42 ?

I simply commented it out using a semi colon (and removed the semi colon to re-enable it).

oliof

@o_lampe said in Poor print quality with RRF3 - especially 3.2.2.:

I guess the FW would still run the PA-routines, but with zero length value. @dc42 ?

I'm no dc42 but I can use code search on github, and it looks like PA is set to 0.0 by default (also here) if it's not configured differently by issuing M572. So by my reading, not setting it or setting it to 0.0 yields exactly the same configuration.

o_lampe

@oliof Which means, if there's a bug in PA-routine, it wouldn't help outcommenting it?

oliof

@o_lampe still looking to see whether PA routine is skipped if value is set to 0.0 or not. I am not that familiar with the RRF code yet, as I have mainly messed around in contained places (Kinematics) so far.

dc42

@deckingman, I'm very sorry to hear that your machine was damaged. I'm glad the damage isn't too severe.

You reported that the expansion board firmware version didn't appear to have changed, so I checked that first. When I attempted to upload those same firmware binaries to my test system, M115 reported a different firmware version from yours. So I checked what version it should have reported, and that was different again: Duet EXP3HC firmware version 3.3beta1+1 (2021-03-02 15:56:53)".

It turned out that although DWC uploads firmware binaries to /firmware, RRF was still fetching binaries from /sys when upgrading expansion board firmware. So the 3.3beta1 expansion board firmware binaries in /sys were being re-installed instead of the newly-uploaded ones. That explains the version number not changing. I will of course fix this in the next beta.

However, the changes between 3.3beta1 and the later expansion board firmware are minor. I have just gone through the commit logs to verify that there have been no critical changes. In particular, the CAN protocols have not changed. So this doesn't explain why your machine crashed.

I next turned to the M122 logs that you posted. Thanks for having the presence of mind to pause the print and take a set of M122 readings before resetting. I was rather expecting to find that board 3 had reset, which would explain some missing X moves. However, in the "Console dump after pause" log, the last reset times of the boards read as follows (ignoring the second M122 B3 after you cancelled the print):

0: Last reset 01:19:09 ago, cause: power up/Last software reset at 2021-03-07 11:22, reason: User
1: Last reset 01:19:18 ago, cause: power up
2: Last reset 01:19:25 ago, cause: power up
3: Last reset 01:19:32 ago, cause: power up

So all four boards were reset at the same time.

Looking in more detail at those logs, I found a couple of interesting parts:

The M122 B3 report when paused shows the following:

Driver 0: position -1182960, 1600.0 steps/mm, standstill, reads 45627, writes 0 timeouts 0, SG min/max 14/326, steps req 4320 done 4320
Driver 1: position 3200, 80.0 steps/mm, ok, reads 45627, writes 0 timeouts 0, SG min/max 0/1019, steps req 1418006 done 1419078
Driver 2: position -145076, 80.0 steps/mm, ok, reads 45626, writes 0 timeouts 0, SG min/max 0/373, steps req 1426723 done 1427794
Moves scheduled 16699, completed 16698, in progress 1, hiccups 0, step errors 0, maxPrep 85, maxOverdue 5, maxInc 2, mcErrs 0, gcmErrs 0

It's reporting 1 move in progress, yet the number of steps done on drivers 1 and 2 is greater than the number of steps requested. However, I think this may be caused by running the previous M122 when the printer was not paused or otherwise idle, so that steps from moves that were in the queue when the previous M122 was run have been included in the steps-done count.

The M122 report for the main board shows this:

Tasks: NETWORK(ready,224) ETHERNET(notifyWait,124) HEAT(delaying,284) CanReceiv(notifyWait,795) CanSender(notifyWait,359) CanClock(delaying,349) TMC(notifyWait,18) MAIN(running,924) IDLE(ready,20)

The numbers are the remaining stack space. I have never seen the TMC stack space go as low as that. It sometimes happens that the actual stack space used is greater than the reported amount due to the compiler allocating stack but not using the bit that is monitored. So it's possible that the TMC task stack is overflowing. There is no other evidence to suggest this, however I will increase the stack size as a precaution.

There are no indications in the reports of lost CAN messages: no send timeouts in the main board M122, and no 'oos' count in the M122 B3.

I plan to proceed as follows:

Review the changes between the beta1 main board firmware and the version I provided on Dropbox, and the CAN transmit fifo driver that the new firmware uses.
I already planned to add a 3HC board to my tool changer and use it to drive the X and Y axes, so that I have a machine (not just a bench setup) that uses a 3HC to drive axes. I will do that and then try your print.

Three questions for you:

Is the amount of X shift in the photo you posted consistent with the amount of shift being the length of the box it was printing? Or so you think it may have been more?
Have you already shared that GCode file, and if so, where?
You said that you were unable to download the console and you had to copy-and-paste it. Do you mean that you tried to click on the list icon at the top right of the console (to get the "Download as text" option), but it didn't respond?

dc42

@oliof said in Poor print quality with RRF3 - especially 3.2.2.:

@o_lampe still looking to see whether PA routine is skipped if value is set to 0.0 or not. I am not that familiar with the RRF code yet, as I have mainly messed around in contained places (Kinematics) so far.

RRF does not distinguish between PA never having been set, and being set to 0.0.

deckingman

@dc42 said in Poor print quality with RRF3 - especially 3.2.2.:

Three questions for you:

Is the amount of X shift in the photo you posted consistent with the amount of shift being the length of the box it was printing? Or so you think it may have been more?

Have you already shared that GCode file, and if so, where?

You said that you were unable to download the console and you had to copy-and-paste it. Do you mean that you tried to click on the list icon at the top right of the console (to get the "Download as text" option), but it didn't respond?

#1 The print is actually a box plus lid. The lid is only 3mm thick (tall) and an unremarkable rectangle, so I haven't included any pics of that before. But at the point of failure, the machine had long finished the lid and was printing just the box. That box is a tad over 70mm wide in X. From the witness mark on the bed, I can see that the right hand edge of that box is about 110mm from the right hand edge of the build plate. So for the head to crash into the frame and try to repeatedly beat the hell out of it, means that there would have to have been a shift of >110 mm - significantly larger than the width of the part it was printing.

#2. No I hadn't shared the file. I have now uploaded it to the folder on Google drive that I last linked too (the one that has the console dump file). In fact, I've uploaded the original version as sliced plus the version which has UVAB moves added. The latter is distinguishable by the "UVAB" suffix which is added to the end of the file name - that's the one I've been using but unless you can simulate a CoreXYUVAB, it won't be much use (hence the inclusion of the original).

Note. You'll need to remove the call to the "pre-print" macro. You'll then have to heat the bed and nozzle and send a "T0" before attempting to print or simulate that file (as well as home the printer). I could post the "pre-print" macro but it calls other macros to set the tool temperatures, home the printer, purge and wipe the nozzle etc, so it all gets complicated and it'll just be easier to manually heat the bed and hot end.

#3. That's correct. In all other instances, I've simply selected the "download as text" option but after pressing "pause", that didn't work. Firefox shows me when a file is being downloaded but that didn't happen - there was no indication on Firefox that at download was happening. And no additional files appeared in the list of downloads (only those console.text files that were downloaded prior to the crash).

I can't remember if I pressed cancel before or after attempting to download the console text. I think, I pressed pause, tried to download, then pressed cancel and tried to download again, but I can't be 100% sure of that. I've uploaded both the pause.g and cancel.g files to the same place. I can't see anything in either of those that would have prevented downloads from working but you might spot something I've missed.

deckingman

For anyone watching this thread, and for those who have contributed, I just want to say that the Duet team and I have opened up the communication medium that we used at the very start (when Gen 3 was still at the pre-production stage), in order to work together to resolve these issues. That's nothing personal - just that these forums are maybe not the best way to post messages rapidly back and forth between us.

I thought it important to state that fact, in case anyone got the impression that I had been abandoned by the Duet team - that isn't the case and we are working together to try and get to the bottom of wtf is happening.

jens55

Thank you for letting us know.

Luke'sLaboratory

@deckingman

Keep at it and let us know what you find out.

I'm likely affected by many of the issues that you're discussing - also have a large machine with many extruders and i'd love to continue using it

@dc42 keep it up as well! I'm happy and willing to give some tests as well, but I won't have the time to dedicate to all of the detailed troubleshooting that deckingman is doing.

Luke

hackinistrator

@dc42
why are there different driver reports for main board and 3hc . for example main board does not show steps /mm and step req / done .

@deckingman sorry to see your machine damaged , at least its nothing serious . did you consider using some king of fail safe between the carriages ? for example connect a wire that is shorter then rest of the stuff that will disconnect earlier in case of misalignment and stop the machine .

i looked at your gcode , xy and ab positions should be the same right ?
are you using different step/mm for xy and ab ?

i cannot see any relation between the xy driver position and ab position on the main board . maybe this can explain something .

deckingman

@hackinistrator Forget about the AB gantry. That's a force cancelling/load balancing gantry that is completely separate to the XY and UV gantries. It simply carries a container with lumps of lead. It's just another CoreXY mechanism that sits at the very top of the machine and is in no way connected to the hot end or the extruders. The motor directions are reversed so it does the exact opposite of the XY gantry to stop the printer rocking around when I throw my heavy hot end around at high speed. It does nothing for print quality and the only reason it's still there is that I can't be bothered to remove it.

The XY gantry carries the hot end. The UV gantry carries the 6 extruders and two of the expansion boards. The extruders are connected to the hot end with short Bowden tubes. All the wires for the heater, fans, temp sensor, end stop etc for the hot end are connected to the expansion boards that are fitted to the UV gantry above. The UV gantry is the one that "follows" the XY gantry with an "envelope" that is +- 20mm of the XY position.

I've never before seen a wild excursion of the XY gantry that was not mimicked by the UV gantry. Nor did I foresee that possibility.

fcwilt

@deckingman said in Poor print quality with RRF3 - especially 3.2.2.:

That's a force cancelling/load balancing gantry that is completely separate to the XY and UV gantries. It simply carries a container with lumps of lead. It's just another CoreXY mechanism that sits at the very top of the machine and is in no way connected to the hot end or the extruders. The motor directions are reversed so it does the exact opposite of the XY gantry to stop the printer rocking around when I throw my heavy hot end around at high speed.

That is most interesting. Where did you hear about that idea or was it something you devised on your own?

Thanks.

Frederick

hackinistrator

Forget about the AB gantry. That's a force cancelling/load balancing gantry that is completely separate to the XY and UV gantries.

i think in your case its important .
x is related to a , and y is related to b .

so i would expect these relations to stay the same during print . your reports shows otherwise (at least i think so)

deckingman

@fcwilt It was my own idea - another world first that has since been copied. I did a write up of it on my blog. I'm using my phone right now so can't easily provide a link but there is a link to my blog in my signature.

deckingman

@hackinistrator I can't say that I noticed, but yes A should always be the same as X and B should always be the same as Y. The X and Y motors are on expansion board 3 and the A and B motors are on the main board. That might be a clue.....

o_lampe

@deckingman
about the XYUVAB axis':
I read in the Wiki that you have to use definitions of the extra axis in a certain order. E.g. you can't define V-axis without having a U-axis. If you still do, the FW automatically defines a virtual axis.
I believe, that you have to define UVW before using ABC or you have an unknown W-axis in your system, which might do spooky things.
Your printer is running for a long time with this axis', but maybe a newer FW treats theses ghosts different?