Duet 2.05 memory leak?



  • I have been doing triplication printing over the last 2 weeks with my quad printer, and I have noticed that the 2.05 firmware has some sort of memory leak that manifests in a very inconvenient way. I am printing large quantities of face shields for our local hospitals. I have not been inclined to post and start digging into the problem, as I need the printer operational and producing face shields, but it's getting pretty frustrating at this point. It seems that if I do anything involving movement before I start a print (home my axes, even just some of them), I hit this issue. About 30-40 minutes into a triplication print, with 3 of the 4 hotends printing (they're fitted with 0.6 nozzles, the 4th is 0.4), I start getting phase A and B warnings about various steppers. The wiring is fine, I triple checked it, and the complaints are about the driver for the extruder that is not even doing anything, just sitting there parked. Then the print starts stuttering and basically becomes unusable. I've been getting around this by resetting the Duet between prints, either a reset or a full hard power off, and that fixes the issue. But that's not a permanent solution, and I'd like to figure out what's going on. I can't bring the machine down for too long to investigate, since it's producing over 50 shields per day.
    I have done repeated duplication prints without issue, but this leak happens during triplication, so obviously more steppers are involved.


  • administrators

    There is a bug fix in 2.05.1 that might possibly resolve this.



  • Did I miss a release? I was looking around 2 weeks ago when I started printing the shields. I build my own and I got the code from 12/19. When was this fix released?



  • I guess I did, looks like February. I will take the code this weekend. I have several shield orders to fill, so I'll do firmware resets until then.



  • [image]

    You'll get notified about both RRF3 and RRF2, mind you, but it should save you missing out; I'm sure your email client can filter out the RRF3 stuff if you really have to.



  • @dc42 I am looking at the release notes, and I don't see a bug fix related to my issue. I see something regarding tool changes (or is there something more to it?), but there are no tool changes here: this is straight up select T5 and proceed with the triplication print.


  • administrators

    There is a fix in 2.05.1 for a 1-byte buffer overflow which has unknown consequences. That's why I wrote that it "might possibly" resolve it.



  • @dc42 OK, I will apply it on Saturday and see if it fixes the issue. It seems curious that I ran multiple duplication prints with no problems, yet a triplication print only works after a clean boot. I started checking all the drivers it was complaining about until I realized the phase A and B error is something weird, especially on a driver for a stepper that isn't even doing anything.



  • For now I have promised donations of 100 shields to 2 different hospitals today and tomorrow, so I need to keep the thing printing. It's my main workhorse, since it can turn out 50 shields with visors in one day. I will have a window this weekend to try and resolve this. I'd love to pause and switch to a Duet 3 with a few expansion boards, but that's too much downtime, which I can't afford now.


  • Moderator

    @kazolar said in Duet 2.05 memory leak?:

    I build my own and I got the code from 12/19

    2.05.1 was released 9th February: https://github.com/dc42/RepRapFirmware/releases/tag/2.05.1
    Any particular reason you build your own? It might introduce errors.
    If you can capture M122 output once you get the errors, that may be useful, and/or turn on debug.

    On a practical note, with all those motors whirring away, maybe static is building up and shorting out through the idle motor? What's your grounding like?

    Ian



  • @droftarts there are some behaviors with tool changes that I don't like and that don't work with my slicers. What you are saying doesn't make sense. I have run this print 40 times over the last 2 weeks using the soft reset method between prints, nothing else; if I just home one of the axes before starting the same print, the problem starts happening. If it were anything electrical, the problem wouldn't go away with a soft reset. My build changes are the same patch I've been carrying forward since version 1.21. I've done triplicate printing before, back to back, and the first time I've had issues with it is with some point release of 2.0. I'm almost tempted to say the problem started with the 2.05 release. I've had the machine make over 300 shields, and it worked perfectly fine last night after a soft reset. I had a brain fart prior to that and did an extra home of multiple axes, ran that print, and it started complaining about one driver or another and stuttering. Soft reset, clean the bed, start again, perfect print.


  • Moderator

    @kazolar my advice would be to return to the firmware version that causes you the fewest problems, probably 2.04, if you want reliability, until you get the chance to update and test your version of 2.05.1.

    If it ‘works perfectly’, but not after doing x, y or z, and that is repeatable, then it should be traceable. I appreciate you don’t have time to look into it now, though. I was just trying to suggest other things I know can cause similar issues.

    Ian



  • @droftarts I am also kind of in a crunch, with no time to dig out the last working build. Soft resets work for now. I was starting a print late last night, had a brain fart and ran my macro to home the axes other than Z, then started the print and came back 30 minutes later to stuttering. I opened the ticket, and in the meantime adjusted some nozzle calibration. The stuttering was pretty violent, so I hit soft reset, and I have a perfect set of 9 shields on the bed ready this morning.

    [image: QuadZilla Print of 0__gcodes_Visor_Top_Face_Shield_4_Hole_Punch_Prusa-LighterTripple.gcode finished in 4_03_12s.jpg]



  • @dc42 so the consequence of the 1-byte buffer overflow was the issue I was experiencing. I applied the firmware this morning. I specifically ran my macro which homes everything except Z to check that all movement was fine, then did M18. After that I ran my triplicate print of shields: it worked, a 4 hour print, no issues. After that, without a reset, I started another triplicate print of 9 shields, and I am almost 1 1/2 hours into it, well past the predictable point when the errors would have occurred. I could literally time it: exactly 30 minutes after the lead screw compensation finished, the phase warnings would start. So far, the fix looks like it has worked. Since today is a holiday, I could get the firmware updated and keep printing shields. I'll be able to kick off another set of 9 tonight before going to bed, so if 3 in a row succeed, I'll call it good.


  • Moderator

    @kazolar glad you got it fixed. And thank you for the support you’re giving health workers!

    Ian



  • @droftarts doing what I can. I'm right outside of NYC, and I'm in contact with a lot of healthcare professionals. Getting a shield out shows them that we care and that they're not alone in this, as well as protecting them. I've personally heard too many stories from front line workers in my area of preventable tragedies, kids who should still have both their parents. The printer is running now. I've given away 350 shields (full sets with acetate and elastic) just since this weekend. It's overwhelming. I'm glad to have resolved this issue, and the printers have produced over 100 shields in the last 2 days. I'm doing shields with visors based on feedback from nurses and EMS workers. Since I've had turnarounds as quick as a set of 50 being picked up in the morning and ending up in the field by night time, I'm getting very fast feedback for design changes. The visor design is the winner. Not the quickest to print, but it's what works. Thank you for the help. The machines have gone through almost 20kg of PETG this week and more is on its way. I hope to be printing more fun things eventually.


  • Moderator

    @kazolar CNC Kitchen has done a good video on speeding up prints, which might help churn more out! https://youtu.be/_bt1UZAnxnA

    Which design are you finding most popular with healthcare professionals? Link to yours or similar?

    Ian



  • @droftarts yes, I was already doing all those tips, that's how I am getting the yields I am.
    I actually used the CNC Kitchen design as inspiration to iterate on the design we started with. I have published the design I've been printing:
    Enhanced low weight Modified Prusa Face Shield with a Visor, on Thingiverse: https://www.thingiverse.com/thing:4273009
    There are 3 of us in our group. I have much more capacity, so I've distributed many more shields, but the other members have been catching up after adding new machines to their effort. Our employer has stepped up to cover our expenses, even paying for other members to get additional machines (Sidewinder X1). Since I switched to the visor design, our other members have also started printing it, and I'm passing on the tips to squeeze as much speed as possible out of every print. These don't have to be pretty, they just have to work.



  • @dc42 the problem is not fixed. This time it doesn't complain about any driver issues, it just starts stuttering exactly 30 minutes after starting the print. I am trying it after a board reset. I may go back to an earlier version. Not sure what's going on.

    EDIT:
    Went back to 2.03 RC2 -- so far all is good.
    Gonna stay on that for now


  • Moderator

    Did you check a M122 to see if there were hiccups?



  • @Phaedrux no hiccups, it just starts stuttering. I am back on version 2.03 RC2, and the 2nd print with no resets is running fine. That's no indication that it's bug free, but it's working, so until I see a reason to move from it I'm staying on this build.



  • So, after doing a bunch of M122s during the print, I finally caught the issue: underruns. @dc42 the count resets too often, and it would be nice to get an error on the screen when it gets critical. I switched to a brand new class 10 SD card, and the stuttering and all the weirdness stopped, and I'm back on version 2.05.1. As smart as the Duet is, an SD card that is not up to snuff, and/or is dying, should be something you can detect. It took me over 2 weeks to hunt down the issue. The underrun counts keep resetting, so it's almost impossible to go on that. Now underruns are 0,0, and the UI on the LCD is more responsive: it shows the list of files and macros in an instant.
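
A minimal sketch of how the repeated M122 checks described above could be done by a script rather than by eye. It assumes the diagnostics text contains a fragment shaped like `Underruns: 0, 0` (consistent with the "0,0" reported above); the exact wording may differ between firmware versions. `parse_underruns` and `has_underruns` are hypothetical helper names, not part of RepRapFirmware.

```python
import re

# Assumed fragment shape, e.g. "... MaxWait: 0ms, Underruns: 0, 0".
# Adjust the pattern if your firmware prints the line differently.
UNDERRUN_RE = re.compile(r"Underruns:\s*(\d+)\s*,\s*(\d+)")


def parse_underruns(m122_text):
    """Return the two underrun counters as ints, or None if not found."""
    match = UNDERRUN_RE.search(m122_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def has_underruns(m122_text):
    """True if an underrun line is present and either counter is non-zero."""
    counts = parse_underruns(m122_text)
    return counts is not None and any(c > 0 for c in counts)
```

Feeding each captured M122 report through `has_underruns` flags the reports worth a closer look, without having to spot the line in the full diagnostics dump.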


  • Moderator

    @kazolar Thanks for your persistence, and your report. SD card problems can have strange, and often not very obvious, effects. I don't know if the firmware can be set to detect SD card issues; that's one for @dc42. You can test an SD card with M122 P104 S[file size in MB]; the reported speed is usually between 2 and 2.5Mbytes/sec. For me: Duet 2 WiFi 2.23Mbytes/sec, Duet Maestro 2.42Mbytes/sec, for a 10MB file.

    Ian
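
A small sketch of how a speed figure from that SD card test could be compared against the range quoted above. The 2.0 to 2.5 Mbytes/sec band is only the "usual" range from this post, not an official threshold, and `classify_sd_speed` is a hypothetical helper name.

```python
# Band taken from the post above; treat it as a rule of thumb, not a spec.
TYPICAL_MIN_MB_S = 2.0
TYPICAL_MAX_MB_S = 2.5


def classify_sd_speed(mb_per_sec):
    """Compare a measured speed (Mbytes/sec) with the usual range."""
    if mb_per_sec < TYPICAL_MIN_MB_S:
        return "below typical"
    if mb_per_sec > TYPICAL_MAX_MB_S:
        return "above typical"
    return "typical"


# The two figures quoted in the post both fall inside the band.
for speed in (2.23, 2.42):
    print(speed, classify_sd_speed(speed))
```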



  • @kazolar underruns, and any of the other stats like that, are reset each time you run M122.
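
Because the counters start again from zero after each read, a running total has to be kept outside the firmware if you want to see the trend over a whole print. A minimal sketch, assuming each M122 report yields a pair of underrun counts; `UnderrunTally` is a hypothetical name and the sample values are made up for illustration.

```python
class UnderrunTally:
    """Running total of underrun counters that are reset on every read."""

    def __init__(self):
        self.totals = (0, 0)  # cumulative counts since the tally was created
        self.reads = 0        # number of M122 samples folded in so far

    def add(self, counts):
        """Fold in one (a, b) pair taken from a single M122 report."""
        a, b = counts
        self.totals = (self.totals[0] + a, self.totals[1] + b)
        self.reads += 1
        return self.totals


tally = UnderrunTally()
for sample in [(0, 0), (3, 1), (0, 0), (7, 2)]:  # made-up sample values
    tally.add(sample)
print(tally.totals, "over", tally.reads, "reads")
```

Summing per-read values this way gives the same picture whether the counters are cleared only by M122 or by other events as well, since each sample only covers the interval since the last reset.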



  • @droftarts there has got to be something that responds to underruns above some level. Clearly the underruns were getting out of hand. If the firmware simply complained about underruns the way it complains about stepper phase warnings and other things of that nature, troubleshooting would be a lot easier. Also, the underrun counts seem to reset more often than just when running M122: I canceled the print and the underrun line in the M122 stats was cleared out.

