Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.
-
@bearer said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
At the end of the day it's up to the user to choose something tried and true, or accept that early adoption comes with a price tag in more than one sense.
The disconnects in how Duet (the company) handles RRF vs DSF apply equally to the 'full' releases.
-
@bearer said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
and with sudo no less!
this was a topic way back when, sort of intermingled with permissions on the /opt/dsf/sd folder and /dev/spi nodes and the priority was to get it working first, then revisit.
as such i didn't poke in great detail, but as access to the spi node can be solved by group permissions, listening to port 80 (or any port below 1024) sounds like the last hurdle. the easy workaround would be nginx and a reverse proxy, which would also ease setting up ssl with something like letsencrypt (even if not exposed to the internet)
I withdraw my comment regarding sudo. It diverted attention from the real issue: What is the GCODE to restart the system?
-
@droftarts said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
I agree. It's just taking time to get DSF (which is pretty much brand new) up to speed with the rest of the firmware (painstakingly developed over many years). But without community interest and expertise getting it working, reporting bugs and fixing, it will take much longer. So once again, thank you all for your continued support.
This completely misses the points being discussed by at least three or four vocal users. It is a powerful indication of the "blind spot" within Duet that causes me to invest the energy in typing these responses:
Any rational person would expect a new major section of software to climb a maturity curve. Totally agree with you on that. And introducing major new architecture and function in V3.x, I believe we all expect it to take time to stabilize. Regardless of where it runs or what tech stack it uses or... it will just take time, testing, feedback, and improvement. Agreed, D'accord.
All of that has nothing to do with the pervasive attitude that RRF and DSF are two separate things. From the viewpoint of the end user they are one thing with defined external interactions (Gcode, Web API, etc).
There are numerous examples of this misperception, all of which seriously complicate the ability to build, deploy, test with the community, upgrade, downgrade, and generally "deal with" the product that fits under the general header of Duet V3. And the folks at Duet appear to be unable to see or acknowledge this is even happening.
When I said "sort of." above, this is what I meant. I don't believe I am ranting anymore; yet there is still more to discuss. I sincerely hope these strong words are read with the intent they are written: To help Duet get better.
-
@droftarts said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
I agree. It's just taking time to get DSF (which is pretty much brand new) up to speed with the rest of the firmware (painstakingly developed over many years). But without community interest and expertise getting it working, reporting bugs and fixing, it will take much longer. So once again, thank you all for your continued support.
Sadly, RRF3 (Duet3 standalone) is significantly more functional than RRF3 (Duet3+SBC.) As long as this is the case, people (such as myself) will use (and test) the standalone code and ignore the SBC/DSF/etc.
For example, do a search for threads where people request PanelDue working properly (for file commands) with an SBC attached to the duet. I've seen "it's easy", "will add that soon" and "it's already there, just need the duet to send the command." Yet... it hasn't happened. Until that "easy", "will be added soon" and "is already there" bit of functionality works, I won't attach the ribbon cable to my RPi4. (I don't have a computer near my printer, and I won't start a print or macro unless I'm standing near the printer.)
How about the conditional gcode stuff? Is that working in SBC mode yet?
The lack of SBC functionality isn't about community interest, reporting bugs, etc. It's about getting DSF up to a more usable state. I'd happily attach my RPi4 to my duet3 board if I could get a similar level of (even untested) functionality - assuming, of course, I'd have reasonable expectations of bugs getting fixed as fast as dc42 fixes RRF3 bugs. (Which is another gripe: There have been long spans of time where DSF has gone untouched while RRF3 has been moving along.)
I just think it's important to get the ordering of "cause" and "result" correct. The lack of community is a result of lack of development. Not the other way around.
Edit: Just to be clear, I'm not really complaining. I'm happy using my duet3 in stand-alone mode while things move along. However, don't imply that I'm part of the reason that DSF (collectively used to mean all the duet s/w running on the SBC) is lagging so far behind RRF3.
-
I'll just chime in with one more thing:
When Duet 2 was being developed, near the beginning, it was much like this (except not separated so much -- only DWC and RRF, with the original wifi server or whatever too).
I waited very patiently until the code was mature. I bought several Duet 2 boards (before they were called Duet 2) immediately in their infancy. However, RRF was not at a point that it could really be used for what I wanted to do (IDEX printer).
I just waited! I worked on my own stuff and waited. I felt this was fine. I didn't feel I was owed anything by the developers. If anything, I was super grateful that the developers were working so hard on the code to make it work.
Finally, RRF2 got to a point where it was complete enough and reliable enough to use! Hallelujah!
Then, immediately, all the developers decided to abandon RRF2 in favour of RRF3! RRF2 is not as stable a rock as we think it is, but the developers are going full-bore restarting the "wait and see" cycle for RRF3 users.
What about us RRF2 users? Why abandon that so abruptly?
We need a team that is still working on RRF2, while RRF3 is developed! I don't think RRF2 can or should be left in the state it is in.
-
@Danal said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
All of that has nothing to do with the pervasive attitude that RRF and DSF are two separate things. From the viewpoint of the end user they are one thing with defined external interactions (Gcode, Web API, etc).
Surely not ideal; but RRF and DWC were two separate things before DSF as well, just more mature, and I don't see any reason to suspect it will not return to that state. I'm also pretty sure it's on the Duet3d agenda to unify things as much as possible - the new duet3d github is probably a sign of things to come. As things mature, who wrote what or who supports what will matter less, once the workload is better matched to the resources available to develop and support it.
Meanwhile the user can choose how to deal with it.
@Danal said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
I withdraw my comment regarding sudo. It diverted attention from the real issue: What is the GCODE to restart the system?
This is also, as far as I recall, a conscious decision to limit gcode's ability to affect the Pi on a system level. M550, M552 and a few others were at least a topic. And to some degree it boils down to lacking a SUDO M997 gcode.
-
@bearer said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
This is also, as far as I recall, a conscious decision to limit gcode's ability to affect the Pi on a system level. M550, M552 and a few others were at least a topic.
I don't doubt that was a conscious design decision. It fits very firmly with the blind spot that Duet sees these two as somehow separate or different. Again, gcode in one end, movement out the other. Gcode configuration codes in one end, immediate effect on the running device out the other.
Given the statements in the image below, gcodes should "affect the Pi on the system level". M550 (set name) absolutely should set the name of the Duet system network interface, i.e. the Pi itself. Same with M552 (set IP address). And many more such codes.
Or, is Duet explicitly changing the philosophies stated below? Particularly "All settings are done through G-Code"?
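To make the "gcode in one end, effect on the Pi out the other" idea concrete, here is a rough sketch of what such a mapping could look like. This is not DSF code. The function names are made up for illustration, and the `duetcontrolserver` service name and the use of `hostnamectl` are my assumptions about how a Pi image might be set up. The sketch only plans the command and never runs it, which is also where any permission check would have to sit.

```python
import re
from typing import Callable, Dict, List, Optional


def _set_hostname(params: str) -> List[str]:
    # M550 P"name": pull the quoted name out of the parameter string.
    match = re.search(r'P"([^"]+)"', params)
    if match is None:
        raise ValueError('M550 needs a P"name" parameter')
    # Assumption: the SBC uses systemd, so hostnamectl is available.
    return ["hostnamectl", "set-hostname", match.group(1)]


def _restart_control_server(params: str) -> List[str]:
    # Assumption: the control server runs as a systemd unit with this name.
    return ["systemctl", "restart", "duetcontrolserver"]


# Hypothetical table: gcode -> function that builds the system command.
HANDLERS: Dict[str, Callable[[str], List[str]]] = {
    "M550": _set_hostname,
    "M999": _restart_control_server,
}


def plan_system_action(line: str) -> Optional[List[str]]:
    """Return the command a gcode line would map to, or None if it has
    no system-level effect in this sketch. Nothing is executed here."""
    code, _, params = line.strip().partition(" ")
    handler = HANDLERS.get(code.upper())
    return handler(params) if handler else None
```

Codes without an entry in the table (ordinary moves, for example) simply return None, so the set of gcodes allowed to touch the Pi stays explicit and reviewable.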
-
@bot said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
What about us RRF2 users? Why abandon that so abruptly?
We need a team that is still working on RRF2, while RRF3 is developed! I don't think RRF2 can or should be left in the state it is in.
What do you perceive that state to be, and why the reference to "while RRF3 is being developed"? We get few complaints these days of RRF 2.05.1 not doing what it is supposed to do. RRF3 core development was completed months ago, to beyond the point where it provided all the features available in RRF2. Nor did I abandon RRF 2 users: I did the RRF 2.05.1 release when I found important bugs in RRF 2.05. But it no longer makes sense to add new features to RRF 2.
If you look at the bug fix lists in the release notes for the last several RRF 3.01-RC releases, you will see that they are almost all either related to new features in RRF3 that Duet 2 and Duet 3 standalone users don't have to use, or are fixes for minor bugs (some of which are also present in RRF 2). [The exceptions were a couple of new bugs in the 12864 display code for the Duet Maestro.] So RRF 3.01-RC is very nearly as stable and reliable as RRF 2.05.1 even though it provides a lot more functionality. You can run RRF 3.01-RC7 on a Duet 2 or Duet 3 in standalone mode with various versions of DWC, so if you don't feel ready to try DWC 2.1.2 yet you can stay with 2.0.7 or even 1.22.6. Many Duet 2 owners are already running RRF 3.01-RC versions.
The only reason that RRF 3.01 stable hasn't been released yet is that I wanted to finalise changes to the communication mechanism between RRF and DSF, which has meant waiting for some major changes to DSF to be implemented and to settle down. With the release of DSF 1.3.0 and now 1.3.2, that is a step closer.
I realise that some users find the change from RRF2 to (standalone) RRF3 painful. We had to make the major changes between the RRF2 and RRF3 configuration mechanisms both to support the new architecture of Duet 3 and because the RRF2 architecture had become too limiting. Specifically, the fixed pin allocations in RRF2 had become problematic for many users, the kludge of "virtual heaters" had passed its best-before date, the status response returned to DWC was getting too large, and the solution to many of the features that users were asking us for was to implement the object model and provide access to it.
I'm sorry that users of Duet 3 + SBC are having to wait longer than we hoped to get access to all the features now provided by RRF 3 in standalone mode, notably conditional GCode. However, DSF 1.3.x + RRF 3.01 RC6/7 now provide the necessary foundation for conditional GCode in DSF, and implementation of the conditional GCode processor in DSF has now started.
-
RRF3 in standalone may be far more mature than I assumed it to be by observing the forums. Sorry for that assumption.
But RRF2 is, IMO, by no means fully-mature. There are some nearly critical bugs that have yet to be solved (such as this networking issue that I have been trying to document).
I also am receiving lots of M122 responses with "error status: 10" and "error status: 18" but which don't seem to be affecting performance.
I'm sure there are also more bugs to be found, some likely very critical too. I wish dearly that I was as talented as you in the areas of coding and the logic behind the firmware. I would very much volunteer my time to analyzing the firmware for bugs and correct behaviour.
Speaking of correct behaviour, I feel that there have been recent changes made to fundamental behaviour that have not been tested enough, and may be contributing to less-than-desirable results that users are still figuring out.
One such change that I feel was implemented hastily, and has not undergone enough testing, is this one from 2.02:
Fixed behaviour when moves call for extrusion amounts smaller than one microstep
Another change was the recent removal of quad/octal fallback when step generation was approaching the limits. This wasn't even documented anywhere except by you casually on the forums, I believe.
Until a release of RRF2 has gone through at least a year of "community testing," I would expect there to be a need to have prompt response to bug reports and addressing behaviour that is sub-optimal. Not necessarily by you (dc42) directly, but definitely with your coordination, since I doubt anyone is as familiar with the firmware as you, at this point.
The solid rock that I wish RRF2 to be is one whose behaviour is documented, well tested and understood by its users. I think it's a great chance to create an LTS version of RRF, so that RRF3 can play with whatever it wishes to, while users requiring reliability and predictability can stay with RRF2.
-
I wanted to respond to @Danal , @bot, @gtj0 and others who have expressed your frustrations with how we are both conducting and communicating firmware and software development.
Firstly, thanks for all your input, it's genuinely valued.
Without getting bogged down in the history, it is correct that I, and the rest of the team, saw RRF and DSF as two separate entities. This has then manifested itself in user experience that was at times illogical or awkward. It also allowed features to be deployed with a large gap in time between when they were implemented in RRF and DSF, causing further frustration. We spent this afternoon discussing this, confirming that it was an issue, and what we can do to resolve it. @Danal we used your M999 resetting DCS pull request as an example of the blind spot and we now plan to incorporate the change shortly.
Not all of this is fixable overnight. Aspects such as making networking changes to the Pi via gcode have security implications we want to think through before deciding how to implement them. Christian is working hard on closing the feature gap (and thanks to those who are helping with testing).
I also want to reiterate the point made by David that RRF 2 is not abandoned for those that want to use it as an LTS version while RRF 3 is developed further. We will continue to fix bugs found in RRF 2 (@bot David has the networking issue on his list).
Thanks all once again for the feedback, it has triggered us to look at how we coordinate, package, publish and communicate our software and firmware releases going forward.
-
@T3P3Tony said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
We will continue to fix bugs found in RRF 2
On this note, the build instructions say the master branch of RRF is the current RRF2; but it lags the 2.05.1 tag. Similar confusion around the v3-dev branch and the 3.01-RC7 tag.
All thumbs up on the rest of the post.
-
@T3P3Tony Thanks for listening!
-
@T3P3Tony said in Incident report: RRF 3 RC6 DWC 2.1.0 Lockup during print.:
Not all of this is fixable overnight.
No one expects everything to be resolved overnight. This weekend would be soon enough for most of us, I think.
-
Thank you for that response!
I hope I did not come across as too demanding or anything. I'm perfectly happy with the work you all have done and continue to do. You have gone above and beyond what would be expected of you.
I'm excited to see the future of RRF/Duet, and will be happy to continue to support your products no matter the direction you choose. After all, you always seem to end up in the right place.
The responsiveness and methodologies of the duet team, and in particular dc42's contributions, are what drove me to the Duet ecosystem in the first place -- way back to Duet 0.8.5, when he was (I think) working on RRF of his own volition, with no financial interest tied to the Duet 0.8.5 boards.
Please, keep up the good work and don't take my comments as anything but hopeful encouragement.
-
Thanks for considering these topics. Timing "is what it is" with changes this big, that's understood. Thanks for taking a look at some of the fundamental directions.