Can we have a revised release process?
-
@dc42 well, in general terms, figuring out input/output pairs and how to evaluate the output against expected results.
I can take a stab at some guesses...
For axis movement, there could be encoders connected to a belt to confirm acceleration, velocity, jerk, etc.
For current and voltage regulation, probes connected to those outputs.
Run print routines through probes by replacing motors with probes.
I actually have a friend who is brilliant at hardware, I can ask him if he has some ideas.
But basically, control is easy since there are network interfaces for input. Comparing output to expected results is easy, that's just some software. The only link in the chain as far as I see it that's missing is data acquisition. We could easily set up an automated build, deploy, test loop.
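As a rough sketch of what I mean by closing the loop over the network (I'm assuming the rr_connect/rr_gcode/rr_reply HTTP endpoints that the web interface uses, and the expected replies below are just illustrative placeholders):

```python
# Rough sketch of a send/compare loop over the network interface.
# Assumes the rr_connect/rr_gcode/rr_reply HTTP endpoints used by the
# web interface; the expected substrings are illustrative placeholders.
import requests

DUET = "http://192.168.1.50"          # hypothetical board address
CASES = [
    ("M115", "FIRMWARE_NAME"),        # firmware info should mention its name
    ("M114", "X:"),                   # position report should contain an X coordinate
]

requests.get(f"{DUET}/rr_connect", params={"password": "reprap"})
for gcode, expected in CASES:
    requests.get(f"{DUET}/rr_gcode", params={"gcode": gcode})
    reply = requests.get(f"{DUET}/rr_reply").text
    print("PASS" if expected in reply else "FAIL", gcode, repr(reply))
```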
-
The problem is that it isn't really like simple software where one can write test cases. Sometimes bugs come up from very obscure printer setups or G-code. I'm really not sure automating testing is practical here.
Basically what you're asking for would be nice, but probably not practical at the price point and volume Duet sells at. There's just not enough money to add the staff to accomplish what you are describing, so a system of RCs with effectively "beta testers" is used. There's a reason "real" motion controllers with similar features to Duet are an order of magnitude more expensive.
As for solving your predicament, why not just wait until a stable release (as in not an RC) comes out?
-
@elmoret I don't have a choice at this point.
But I don't agree that it's not feasible to automate this testing. I worked in a confidential hardware lab doing hardware test automation. Frame rates, touch screens, game consoles, set-top boxes; managing these things from output quality all the way to managing devices that won't reboot and need a remote physical power cycle.
The obscure setups are one thing, yes, those are hard to get coverage on, but everything else is doable.
First of all, we can automate the sending of g-code to the box, that's trivial. We can also get 100% coverage of all g-code permutations. It's as simple as developing the use cases for all of the options and a simple script that iterates over them, generating every possible permutation. I've done this before, so I'm speaking from experience.
Assuming there is a measurable response somewhere to receiving g-code, then that closes the loop on g-code testing.
Whether or not the hardware actually does the right thing, that's a separate thread to discuss, but at least we can cover things like "having to send G29 twice because the first one isn't honored." That was a real bug.
Also, I'm guessing there is software somewhere that can simulate PCBs with certain chips on board. That is really, really not my area of expertise, so that might be prohibitively expensive or complicated to set up. But I thought I'd throw it out there in case anybody knows.
So, the only assumption we need to be true is that the device will output some response to g-code that we can measure. If that's the case, we certainly can build a CI/CD system quite easily.
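To show what I mean by generating permutations, here's a minimal sketch (the commands and parameter values are hypothetical placeholders; the real lists would come from the G-code documentation):

```python
# Minimal sketch of a G-code permutation generator. The parameter spaces
# below are made-up placeholders for illustration only.
import itertools

PARAM_SPACE = {
    "G1": {"X": [0, 10.5, 200], "Y": [0, 10.5, 200], "F": [600, 6000]},
    "M106": {"P": [0, 1], "S": [0, 0.5, 1]},
}

def permutations(code):
    """Yield every combination of the defined parameter values for one code."""
    names, values = zip(*PARAM_SPACE[code].items())
    for combo in itertools.product(*values):
        params = " ".join(f"{n}{v}" for n, v in zip(names, combo))
        yield f"{code} {params}"

for line in permutations("G1"):
    print(line)   # e.g. "G1 X0 Y0 F600", "G1 X0 Y0 F6000", ...
```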
-
@gnydick I'm sure the cost of your elaborate testing scheme would far outweigh any benefits. I would wager that even the best testing scheme you could devise would both miss important bugs, and provide false positives far too often.
-
@gnydick said in Can we have a revised release process?:
@dc42 well, in general terms, figuring out input/output pairs and how to evaluate the output against expected results.
...snip...
But basically, control is easy since there are network interfaces for input. Comparing output to expected results is easy, that's just some software. The only link in the chain as far as I see it that's missing is data acquisition. We could easily set up an automated build, deploy, test loop.
Which would work great for the specific configuration of the "test harness" proposed. And absolutely miss bugs encountered in other kinematics and/or modes.
-
I've been a Duet customer for about two years. Every release I've downloaded, "Release" or "Release Candidate", has fundamentally worked on the Delta/Kossel printers where I have Duets. "Fundamentally" means I could print, and get high quality prints. That tells me those releases would have passed muster on a "Delta Kinematic" test harness. Physical proof that, during the period wherein I personally have facts at hand, such a test harness would have been pointless.
There have been bugs, subtle ones, in some of those releases, from the things that I've read, and/or the "fixes" in the next release.
But... again... I've ALWAYS been able to print... and if there was something I "did not find acceptable" in a release, then the "prior" release was less than five minutes away. To be clear, I never regressed... but I always could have...
In short: The existing release process has worked for me. Therefore, I have a very blunt question for @gnydick: Have you actually encountered an issue, in a release marked 'stable' (not an RC) that caused you to choose to regress to a prior release?
If yes, I'm curious what release, and what caused you to make that decision.
If no, then this is a tempest in a teapot.
-
@gnydick Sorry if I wasn't clear. Yes, it would be possible (though really time consuming) to do what you propose. But it isn't practical within the financial/logistical confines of the Duet project.
If it helps, it sounds like we have similar experience: I have done consulting work involving automated testing/writing of test cases for electronics running firmware, much like the Duet. Writing all that was about 100 man-hours of work, and I'd say the Duet is roughly 2 orders of magnitude more complicated than the device I was working on. 10,000 man-hours of this type of work would cost about $1M. That's no problem for National Instruments or Galil Motion Control, but it's probably not tenable for Duet.
Even ignoring my estimate, most sources (for example: https://stackoverflow.com/questions/174880/ratio-of-time-spent-on-coding-versus-unit-testing) suggest roughly equal time devoted to development and testing. RepRapFirmware has been dc42's main focus for several years now, which works out to a number similar to the 10,000-hour estimate above.
FWIW, Duet3D has moved (or is moving) to fully automated testing of the hardware itself at the assembly line. That's a lot easier to write test cases for, though, since the order of things doesn't matter. If a solder joint is bad, it will show up, as opposed to firmware bugs, which often require certain configurations or sequences of events to show up.
-
@danal you completely missed the point. I haven't had a problem with the stable that caused me to revert to the previous stable. But that doesn't mean that I didn't want fixes that were upstream.
My point is, if someone fixes something and the fixes are never applied to the current release, that's what I disagree with. That's like having to run Windows 11 RCs/betas to get fixes for Windows 10.
The fact that the stable releases are, well STABLE, is kind of the point of being marked stable.
It's also meaningless to say that you could still print after applying any and all releases. Anything released to the public to use should be fundamentally functional. If you couldn't, Duet would not be in business.
If I'm remembering correctly, the double G29 bug was in a stable release, whereas the fix was only available in an RC, or by waiting months for the next stable.
I'm not sure if you read my entire original post, if not, you should.
I don't know what everyone's backgrounds are, but there's nuance and experience in software development that tells me applying fixes to the current stable for bugs in the current stable will almost universally not be difficult or risky compared to applying those fixes only to future releases.
-
@elmoret to do what I described, I could implement it in a few days, provided I was brought up to speed on the hardware. It's quite possible we're envisioning different scope and scale.
I think it's more important to have a test harness that at least covers each g-code and regressions.
I agree, unit tests are always a PITA. But if it were my job, I would be embarrassed to have certain bugs slip out that are the equivalent of forgetting to make sure your servers' disks don't fill.
-
@gnydick said in Can we have a revised release process?:
@elmoret to do what I described, I could implement it in a few days, provided I was brought up to speed on the hardware.
OK then. Here's the hardware:
1x https://www.mccdaq.com/usb-data-acquisition/USB-QUAD08.aspx
5x https://www.omc-stepperonline.com/Nema-17-Closed-Loop-Stepper-Motor-13Ncm184ozin-Encoder-1000CPR.html?search=encoder&sort=p.price&order=ASC
That covers all your steppers. Then you need a DAQ for DIO/AIO:
2x https://www.mccdaq.com/data-acquisition/low-cost-daq (the USB-200, specifically)
Two of the 8 channel DAQs would be plenty to cover fans, thermistors, endstops, heaters.
Tell you what - if you complete the project and dc42 finds it useful, I'll buy all the hardware back from you for original retail price, so you're only out the few days invested.
-
@elmoret it'll take more than that to learn all of those parts. I'm not experienced with embedded. I don't have the time or money to learn a ton of new things. But I'd be happy to take APIs provided and demonstrate what I'm talking about.
Are they high level interfaces or would I have to learn a ton of stuff just to get those probes bootstrapped and recording, synced, etc?
Are there simulators? I've found the KiCad code, but have no idea how to use it.
Long story short, if my knowledge can be bootstrapped, I can help.
-
Those products all have APIs and drivers already, you'd just call a function and get back stepper positions. They have example code you'd use to configure the DAQs. Here's documentation:
https://www.mccdaq.com/pdfs/manuals/Mcculw_WebHelp/ULStart.htm
And here's a specific example, reading an analog input (for checking state of fans and heaters, for example). I picked Python but they have examples in many programming languages:
https://github.com/mccdaq/mcculw/blob/master/examples/console/a_in.py
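To give a sense of scale, reading one analog input with that library is only a few lines. An untested sketch, assuming the DAQ has already been assigned board number 0 in InstaCal (as in their a_in.py example):

```python
# Untested sketch: read one analog input with mcculw, assuming the DAQ
# was already configured as board 0 in InstaCal.
from mcculw import ul
from mcculw.enums import ULRange

BOARD_NUM = 0
CHANNEL = 0                      # e.g. the channel wired to a fan output
AI_RANGE = ULRange.BIP5VOLTS     # +/-5 V input range

raw = ul.a_in(BOARD_NUM, CHANNEL, AI_RANGE)          # raw ADC counts
volts = ul.to_eng_units(BOARD_NUM, AI_RANGE, raw)    # converted to volts
print(f"channel {CHANNEL}: {volts:.3f} V")
```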
Not sure what you mean by simulators? KiCad is a PCB layout program, like Altium.
Btw: if you think RRF has bugs in stable releases, check out Prusa's firmware - and they have roughly 100x the staff/resources of Duet3D!
https://github.com/prusa3d/Prusa-Firmware/issues/1362
"We found out what was the problem caused by. There was an antient bug (related to those errors) which was hot fixed by limiting the possible temp at which the error can be displayed. We removed the limitation in order to prevent dangerous behaviour and forgot it was originaly a hot fix. It should be fixed in FW 3.5.1."
So they're not even commenting their code when they put in hot fixes. And then they released 3.5.1 apparently without fixing the issue.
Not saying anything is excusable/allowable, but just saying I'll take RRF over any of the alternatives any day.
-
@elmoret cool. I'll check it out. By simulator, I mean just that. There are simulators for all sorts of things. There are circuit simulators, from my college days there was SPICE, for example.
There are PCB simulators, as well. I just have no familiarity with the field.
-
@elmoret that all looks high-ish level enough to work with, but it's obvious there's a great deal of background information needed.
So to try to make my ideas concrete, here's what I would do that could be done in a week.
First Pass - no motors, execution, etc.
- Define all of the g-code instructions I want to test
- Define all parameters to be tested for each g-code
- Define the expected output in terms of just the response from the board that it received and ingested the command properly. NOT the action taken by the board
- I'm assuming there is only one handler needed initially for sending commands
- For each type of output, write a function to read from that endpoint
- Go back and categorize each g-code by which output endpoint is needed to read the result
- Using the APIs, write a test case executor that reads the inputs from your definition file, detects the category, calls the proper function to read the output, and compares it to the expected output
This would be very easily accomplished with the right background info about the ecosystem.
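To make the executor concrete, a minimal sketch (the definition file format and the send/read helpers are placeholders I'm inventing here, not an existing API):

```python
# Sketch of the "first pass" executor. The JSON definition format and the
# send_gcode/readers callables are placeholders, not an existing API.
import json

def run_tests(path, send_gcode, readers):
    """Iterate test cases; send each G-code, read the categorized output,
    and compare it against the expected value from the definition file."""
    failures = []
    with open(path) as f:
        cases = json.load(f)   # [{"gcode": ..., "category": ..., "expected": ...}, ...]
    for case in cases:
        reply = send_gcode(case["gcode"])
        observed = readers[case["category"]](reply)
        if observed != case["expected"]:
            failures.append((case["gcode"], case["expected"], observed))
    return failures
```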
Second Pass
Same approach, but now we do the mechanical, electrical, etc. that those probe boxes afford. I've done this many times.
-
I'm all for improvements, and I'm not saying things couldn't be better, but I don't think you're being reasonable considering that the dev team in this case is a one-man show with volunteer testers and a release cycle of maybe a month or two. As it stands, if you have a bug and report it, David is likely to have a bug fix release within days for you to try. An RC and a point release technically aren't the same thing, I agree, but for all intents and purposes, in this case they might as well be.
What exactly is the driver for this? Are you upset about the launch bugs of the maestro? When any new hardware comes in contact with a diverse user base there's going to be some bugs found. And fixes have been applied in a very respectable time frame. Does it make sense to delay 2.02 so that a point release for 2.01 can be issued with certain fixes but no new features?
And honestly, of all the things that could improve reprapfirmware usability, I don't think more released versions is one of them. People already have enough trouble keeping track, now you'd suggest having 2.01.1, 2.01.2 as well as 2.02 RCs?
If you want stable, stay on the latest full release. If you want the fix from an RC, evaluate the RC, and if it fixes what you want, run with it, if not, wait for the next full release, it's only a month or so away.
-
@phaedrux you're basically saying, tough noogies, you don't get fixes to the current stable release. That's just not good practice.
I understand being resource constrained, but I also know how software development goes.
I don't think anyone would get confused; you're not giving people enough credit. It's reasonable to expect people to be able to tell the difference between something called latest stable and latest RC, no matter what the numbering scheme.
I have a somewhat rare skill in being able to see how things can play out. In my view, the longer this practice goes on, the harder and riskier it will be should adoption of the hardware start to grow.
Don't get me wrong, I love the hardware, but it's not being supported correctly as a commercial product. I know the pain is low now, but again, if adoption grows, you will have far noisier and much less educated users demanding fixes.
It is just good business and engineering practice to fix stable and port to upstream, not ignore stable. There can't be two truths, and I'm saying the reasons for doing it as it is now are neither valid nor sensible.
If everyone is going to just ignore the potential value of my premises, then we're just going to talk past each other. How about somebody engage in talking to me about my premises and discuss them and why they might actually be true. We'll collectively get nowhere if nobody is willing to put themselves in the other's shoes. I can and have put myself in the mindset of being dc42; has anyone tried to argue my side in their head?
-
@gnydick said in Can we have a revised release process?:
First Pass - no motors, execution, etc.
- Define all of the g-code instructions I want to test
- Define all parameters to be tested for each g-code
- Define the expected output in terms of just the response from the board that it received and ingested the command properly. NOT the action taken by the board
- I'm assuming there is only one handler needed initially for sending commands
- For each type of output, write a function to read from that endpoint
- Go back and categorize each g-code by which output endpoint is needed to read the result
- Using the APIs, write a test case executor that reads the inputs from your definition file, detects the category, calls the proper function to read the output, and compares it to the expected output
If you can implement this in ~24 man-hours ("a few days", at 8 hrs/day), I know several companies that would hire you. Keep in mind there are roughly ~200 G-code commands, so that's less than 7 minutes per command to flesh out a test. I don't know if I could even write all the possible permutations of a G-code (for example, M587 should realistically test for all possible types of SSIDs/passwords, to detect issues with escaped characters, etc.) in less than 10 minutes, much less generate the expected response and save it to a comparison file.
Keep in mind that many G-codes would be interdependent; a simple example is M667, which selects CoreXY mode - but after selecting the mode, don't you have to test all the G-code again? What if selecting CoreXY mode and then probing the bed and then homing triggers a bug? Take your G29 example: the bug only surfaced if Z-compensation was previously active. Now all of a sudden it isn't 200 unit tests, since there are many, many possible permutations.
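To put numbers on the permutation problem: even ignoring parameters entirely, just the ordered sequences of commands explode quickly. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope count of ordered command sequences, ignoring
# parameters entirely. 200 is a rough count of supported G/M-codes.
import math

commands = 200
for length in (1, 2, 3):
    print(length, math.perm(commands, length))
# 1 -> 200, 2 -> 39,800, 3 -> 7,880,400 ordered sequences
```

And that's before a single parameter value is varied.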
Again - I don't think anyone is ignoring the potential value with what you propose, we're just pointing out it may not be practical given the time/resource limitations of Duet3D. But if you want to try please don't let me discourage you! I'm sure dc42 appreciates all the help he can get.
-
@gnydick said in Can we have a revised release process?:
@danal I'm not sure if you read my entire original post, if not, you should.
Yes. And understood the proposal and its effects. Very thoroughly.
I don't know what everyone's backgrounds are,
I am Enterprise Architect for a Fortune 50 during the day, specializing in development process optimization. I also own/operate an electronics company at night, and our products involve firmware. I generally try to stay away from "I'm qualified because..." discussions, but, in this case, there are probably very few people who experience both sides of the coin (large/small, client, mobile, server, firmware, etc.) to the extent that I do, every day.
If everyone is going to just ignore the potential value of my premises, then we're just going to talk past each other. How about somebody engage in talking to me about my premises and discuss them and why they might actually be true. We'll collectively get nowhere if nobody is willing to put themselves in the other's shoes. I can and have put myself in the mindset of being dc42; has anyone tried to argue my side in their head?
I can't speak for anyone else, but you are not talking past me; I have modeled your process and its effects "in my head", and I believe I fully grasp both your proposed process and its value proposition. See next post.
-
Summary of @gnydick's proposal:
Change from: releases marked "Stable", plus releases that contain both features and bug fixes and continue toward the next "Stable". Repeat.
Change to: releases marked "Stable", plus two parallel paths, literally a code fork (in whatever VCS): one that is the prior Stable with bug fixes as they become available (and/or "hot fix" only on this path, let's not quibble over fix priority), and a parallel path that contains the prior Stable plus new features. These are merged at the next "Stable". Repeat.
===================
Here is where I may be "talking past you": I absolutely do understand all the effects of these two development cycles. (As an aside, I would pick a modified form of the second one for old style large teams, and/or modern "agile" teams.) I find the second one to be a negative value proposition given a one man development team.
The person-hours that go into the fork, and the merge, are a waste of a very constrained resource.
Customers, you, me, we should all move forward to, and/or regress to whatever release produces the best results for each of us. One person's desire for a new announced feature may drive them forward to stable, or even RCs, at a very different velocity than someone else. There is an individual choice to move forward or backwards to get the desired results.
All of the above is opinion... but there is one glaring fact: David can move faster on every kind of release (feature, fix, RC, stable, whatever) if hours do not disappear into forks, merges, and, most importantly, the repeated regression testing after the merges.
It is therefore my opinion that velocity is more important than the existence of a separate hot fix path, again because there is always a release a given customer was already running to print successfully. Therefore, the only pressure to put oneself in a more precarious position, where hotfixes might be needed separately, is the desire for a feature... and that feature will come more quickly (and arguably more stably) if David's time is not used to manage fork/merge/test.
-
And, like all great postings on forums where people have technical passion, there are multiple conversations going on here.
RE: A "test harness".
I just read through the release logs for about 15 minutes, looking at the bug fixes. It seems that very few would be caught by a "send this command, validate these outputs" style harness. The issue is in defining "expected output". Random examples, from recent release notes:
- Fixed potential buffer overflow issues in 12864 menu code
No amount of "external harness" would catch this.
- The scheduled move count was too high by 1 after an emergency pause
No amount of "external harness" would catch this.
- The print progress calculated from filament used was incorrect when using a mixing tool if the sum of mix values was not 1 (e.g. when ditto printing)
Unlikely that the "test case" expected output would be coded "better" than the code itself.
- On SCARA printers, a G30 command immediately after homing the proximal and distal arms could fail due to rounding errors
G30 moves Z until it hits something, i.e. the probe triggers. A value is then stored internally. A test harness would see it probe and stop... and could never know if the internal state stored is correct or not.
- Heaters were turning on momentarily when the Duet was reset
A proper "test harness" MIGHT catch this, if the work was done to parse out all outputs vs. time in a "reset" test case.
That random sample of release notes bears out my reading. A "test harness" would represent a HUGE investment of time/effort, and would factually not catch the vast majority of things that are receiving bug fixes. In fact, I sort of had to "cherry pick" to find even one (the last one) that it could even theoretically catch.
It seems much better to test via printing (or machining, or whatever).