Where is the difference - 10 times X1 vs 1 times X10

Danal

Just for a super quick test, I ran the following on a CoreXY:

G1 X10 Y10 F10000
G91
G1 X10
... repeat enough times to move 300...
G90 
G1 Y20
G1 X10

So 30 G1 X10 moves relative moves for 300mm, then a sidestep to force two 90 degree corners on the planner, and then a single move 300mm back. All at F10000 , roughly 167mm/sec.

Listening to this, each move sounds IDENTICAL.

Then ran:

G1 X10 Y10 F10000
G91
G1 X1
... repeat enough times to move 300...
G90 
G1 Y20
G1 X10

That is 300 individual X1 moves.

This was interesting. Started the same, then sagged, badly, then dipped even more, then came back up just a little and finished with less change, but VERY clearly slowed down, maybe half, from the single long moves or the X10 moves.

Not scientific, but VERY clear.

Duet 3 + Pi, running RC8 and DSF 2.0.0

arhi

@Danal I would call that a bug ... I'm some ~12 hours to finish the print and will check myself on the cartesian

I'm not familiar with RRF source yet but for e.g. if you look how smoothieware (the one I am kinda familiar with) does it if the angle between the lines is less than ~1 degree ( cos(theta) +-0.95 ) it will not at all calculate junction deviation, no deceleration, it will go through the junction full speed; If the angle is larger than ~1 degree then corner speed is approx
sqrt( acceleration * junction_deviation * sin_theta / (1.0 - sin_theta))

there used to be a bug, don't remember if in this calculation or another one where the calculation was not done only if the angle was zero but floating point is ugly about zero so this was changed to be larger value

(EDIT: I see they changed this to 0.9999 cos(theta) some years ago )

ChrisP

@arhi said in Where is the difference - 10 times X1 vs 1 times X10:

@ChrisP that example firmware should execute exactly the same move (10x X1 vs 1x X100), no "replacement" need to take place, that should just be normal behavior of the firmware.

Yup, we agree then.

@Danal Interesting test. And agree that is a bug. Are you able to try it standalone without the Pi? It's also be interesting to try at slower feedrates to see it there's a limit.

deckingman

Out of curiosity, I can't imagine a scenario where a slicer would generate multiple segmented moves for what would be a single, longer move. So what are the circumstances where such multiple, short, single axis, segmented moves would need to be accomplished? Does CAD software generate such moves and if so why?

dc42

It's a question of throughput. A lot of work has to be done to process each of those G1 X1 commands, and there is a limit to the rate at which the Duet+firmware can read and process a long run of them and stitch them together.

There is also a limit to the length of the movement queue, and all the time the firmware has to allow for the possibility that it may have to decelerate to zero speed when it reaches the end of the queue, because there may be no more moves. On Duet 3 the movement queue is 60 moves long. So with 1mm long moves, the speed won't exceed sqrt(2 * 60 * A) where A is the configured X acceleration.

DaBit

I am not aware of CAM-software that generates such moves. However, just like slicers they might generate short line segment code. Deskproto does that, Fusion360 does that too in some circumstances (user error, that is). LinuxCNC has a special path following setting called 'the naive CAM detector' to cope with that.

I like LinuxCNC's method; they replace part of adjacent line segments with a tangent arc. Allows nice and smooth fullspeed traversal of splines chopped into many short segments without violating acceleration constraints.

DocTrucker

The slicer is about generating paths. In my opinion the machine should deal with as much machine specific stuff as possible. I would say gold standard is to be able to take a build file and run it on any machine. This of course means slicing and path generation on the machine which would of course allow much better control over segmentation.

This is a good use case for a python parsing script running on the single board computer as there are also issues from making the vectors/segmentation too large. This is a machine level problem that needs to be delt with as close to the coal face as possible, as that is where the information on instant speed, junction deviation, acceleration or intergrals/derivatives thereof reside.

@deckingman many perfectly straight vectors are unlikely but I've seen plenty of instances where contours around a curved feature have been broken down into hundreds, if not thousands of submicron length vectors with only a tiny angle between them.

dc42

@DocTrucker said in Where is the difference - 10 times X1 vs 1 times X10:

@deckingman many perfectly straight vectors are unlikely but I've seen plenty of instances where contours around a curved feature have been broken down into hundreds, if not thousands of submicron length vectors with only a tiny angle between them.

Most slicers are aware of the limitations of GCode throughput, which is of course worse on older electronics and much worse when printing over USB on older electronics; so they have a minimum length of output segment. If you produce a curved object (e.g. cylinder) with very tiny segments in the STL file, the slicer will attempt to combine segments until the minimum segment length is reached.

If users hit the GCode throughput limit on real prints, then I'm willing to look at improving it.

DocTrucker

@dc42 that's logical, you need not fix a 'problem' until it presents itself in a real world example. I had plenty of cases presented to me that were obscure and twisted scenarios to demonstrate 'a serious problem' which were nothing of the sort. More a weakness under certain circumstances that operators needed to be aware of until the point where all the bigger issues had been resolved and the less frequent issues could be tackled.

I do think this sort of problem is best dealt with on the computer - rather than the controller - as there is no need for this to be done real time.

gloomyandy

I wonder how klipper would handle this? In theory it should be able to use a much longer gcode queue. Similarly I suppose the the SBC version of RRF could in theory pre-process the gcode inside of dsf to merge the line segments. Whether it is worth it or not is of course another question.

On a related note, is there a description anywhere of what processing of gcode is performed by dsf (if any)?

dc42

On Duet 3 we have enough RAM to use a longer queue too. However, we limit the number of moves in the queue to 2 seconds of moves + 1 move, to prevent pauses being delayed too much in the event that we can't schedule a pause between moves already in the queue.

arhi

@dc42 said in Where is the difference - 10 times X1 vs 1 times X10:

If users hit the GCode throughput limit on real prints, then I'm willing to look at improving it.

I tested some rather "bad" g-code generated by s3d from "too precise" stl that would kill octoprint+marlin combo (stutters, blobs, crazy bad print quality), and sometimes even marlin from sd card without octoprint and duet ate it without a problem printed it perfectly (duet2eth, 3.01RC1) so I don't think RRF is anywhere close to the problem here

arhi

@gloomyandy TearTime for e.g. does all the calculation directly in slicer and sends "precompiled" code to the firmware so firmware only execute the stepping. Files are huge (file looks like data klipper sends to the stepping boards) but the approach has it's benefits. Major problem is that their slicer is POS and that format is closed, but the system does work rather good.

dc42

@arhi said in Where is the difference - 10 times X1 vs 1 times X10:

@dc42 said in Where is the difference - 10 times X1 vs 1 times X10:

If users hit the GCode throughput limit on real prints, then I'm willing to look at improving it.

I tested some rather "bad" g-code generated by s3d from "too precise" stl that would kill octoprint+marlin combo (stutters, blobs, crazy bad print quality), and sometimes even marlin from sd card without octoprint and duet ate it without a problem printed it perfectly (duet2eth, 3.01RC1) so I don't think RRF is anywhere close to the problem here

FWIW, Smoothieware also had a problem with short segments generated by S3D a few years ago. Duet/RRF ran OK on the same GCode files. For a long time the Smoothieware devs blamed S3D, which was reasonable except that it didn't help users. Eventually they put some sort of fix or workaround in Smoothieware.

droftarts

@arhi said in Where is the difference - 10 times X1 vs 1 times X10:

@gloomyandy TearTime for e.g. does all the calculation directly in slicer and sends "precompiled" code to the firmware so firmware only execute the stepping. Files are huge (file looks like data klipper sends to the stepping boards) but the approach has it's benefits. Major problem is that their slicer is POS and that format is closed, but the system does work rather good.

Some 3DSystems printers do (or, at least, did) that, too. Leveraged the host CPU for the complex stuff, sent individual step commands in what was basically a big spreadsheet to a dumb microcontroller on board. Then the board only has to run the 'spreadsheet', and monitor temperatures and any other inputs.

Ian

arhi

@dc42 yes, s3d 3.0 and earlier were very bad ... and smoothieware didn't know how to handle those ... now both do better, s3d from 3.1 do not generate code as bad as 3.0 and before and smoothieware fixed the issue that they had so they can parse way more codes/sec compared to earlier. IIRC that's also when that "do not calculate junction if angle less than.." came to be

DocTrucker

I did the same with the control system on the MCP/MTT/Renishaw machine. The computer read the whole build file and parsed it into exposure points a controlled distance apart. These were then sent to the optics system as single slice files. Yeah, some where vastly larger than the source data but it also cleaned up small vector issues which the real time controllers really struggled with.

But this did make alsorts of things very easy, such as part suppression, moving and offsetting parts or slice data, changing processing parameters, and reloading build files mid print for more serious changes.

arhi

@droftarts said in Where is the difference - 10 times X1 vs 1 times X10:

Some 3DSystems printers do (or, at least, did) that, too. Leveraged the host CPU for the complex stuff, sent individual step commands in what was basically a big spreadsheet to a dumb microcontroller on board. Then the board only has to run the 'spreadsheet', and monitor temperatures and any other inputs.

that's the dudes that purchased bitsfrombytes? the first 32bit (pic32mx based) electronics in reprap/repstrap world

When I first came in contact with TT machines (UP Plus 2) I did some research as they were in some things age in front of the reprap community and what I found is that they used the same approach on these small "home" printers as professional 500+k machines are using, and found that most of those huge machines do exactly that - just execute the "spreadsheet" as you call it, and that everything is done in the "slicer". On the other hand, UP Plus 2 uses ncp 32bit arm to execute that spreadsheet while most of home 3d printers try to do everything from parsing to planing to executing with 8bit atmega

droftarts

@arhi It makes sense when you've got a dedicated PC connected to the printer (like in the CNC world), but less sense when you've got a general purpose PC doing it. Because if you're doing all the computation up front and creating a huge file, you've either got to stream the data to the microcontroller (so the PC shouldn't really be used for other things to avoid hiccups, which ties up a potentially expensive PC), or your microcontroller gets more complicated as you have to add things like storage and ways of listing what's on the storage. Or you get a second PC to handle the data streaming. Or you get a smarter microcontroller. All options have their merits and deficiencies!

Ian

DocTrucker

@droftarts haven't we essentially got what we need already with the Duet 3 and single board computer? The Raspberry pi is likely more powerful than the Windows XP 32 bit system I was using.

This would be easy enough to write in a way that the cleaned gcode could be recomipiled back to a complete gcode file modifications and all. This would mean the software could seperately preparse data offline, making it usefull to all generations of duet and duet 3 without single board computers. ...also appeasing those who get twitchy at not seeng all the gvode prior to run.