Should M999 terminate the DSF core application?
-
@garyd9 said in Should M999 terminate the DSF core application?:
@Danal said in Should M999 terminate the DSF core application?:
@garyd9 said in Should M999 terminate the DSF core application?:
Why keep debating it immediately before stating you didn't want to reopen it?
Because you keep posting about it and I do wish that bystanders know there are two, widespread, operational practices. I don't want people to change in either direction without more research on their own.
That's fair. I hope you don't take offense at my debating with you. Discussion is the best way to share different views/opinions. Once in a while, new and better ideas are the result.
Actually, I enjoy the heck out of these discussions! I'm always a little worried about the person at the other end, so it is very nice to hear your statement.
-
Just to save @chrishamm some reading, a consensus emerged above:
M999 should:
- With no parameters, restart RRF on the board and DSF (at least duetcontrolserver) on the Pi.
- With optional parameters, also be capable of:
  - Restarting RRF on the board only
  - Restarting DSF (dcs?) only
  - Rebooting the entire Pi
Anyone, please correct anything that is wrong or even slightly misleading.
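For anyone who finds a table easier to read than the list, here is a minimal sketch of that consensus as a lookup. The selector names ("board", "dcs", "pi") are purely hypothetical; nothing in this thread defines an actual parameter syntax for M999.

```python
from enum import Flag, auto


class Target(Flag):
    """Components a restart request may touch."""
    NONE = 0
    RRF = auto()   # RepRapFirmware on the board
    DCS = auto()   # DuetControlServer on the Pi
    PI = auto()    # reboot of the entire Pi


# Hypothetical selector values, invented for illustration only.
SELECTORS = {
    None: Target.RRF | Target.DCS,   # bare M999: board and DCS together
    "board": Target.RRF,
    "dcs": Target.DCS,
    "pi": Target.PI,
}


def restart_targets(selector=None):
    """Return the components to restart for a given (hypothetical) selector."""
    try:
        return SELECTORS[selector]
    except KeyError:
        raise ValueError(f"unknown restart selector: {selector!r}") from None
```

With this shape, `restart_targets()` covers the no-parameter case and an unknown selector fails loudly instead of silently restarting something.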
-
To continue this discussion a bit further...
https://github.com/chrishamm/DuetSoftwareFramework/issues/120
The current implementation seems to create a race condition:
- M999 is issued.
- RRF is restarted and DCS is killed.
- RRF looks for config.g from DCS but DCS isn't there.
- RRF hangs at "Off".
- DCS starts but RRF is Off so there's nothing you can do.
- Issuing M999 again restarts the cycle.
I'd like to suggest that on startup, DCS check RRF state and if it hasn't already loaded config.g, load it.
@dc42 is there a way for DCS to tell if RRF has or has not run config.g? I don't think we'd want DCS to restart RRF if it had run config.g but was Off for some other reason. Just if it hadn't initialized the first time.
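To make the suggestion concrete, here is a rough sketch of the startup check I have in mind. `FakeFirmware` and its `config_loaded` flag are stand-ins I made up; whether RRF exposes anything like that flag is exactly the open question above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeFirmware:
    """Stand-in for RRF; both fields are hypothetical, not real RRF state."""
    config_loaded: bool = False
    executed: list = field(default_factory=list)

    def run_macro(self, name):
        # Record the macro and mark config.g as having been run.
        self.executed.append(name)
        if name == "config.g":
            self.config_loaded = True


def on_dcs_startup(firmware):
    """Run config.g only if the firmware never ran it; return True if we did."""
    if firmware.config_loaded:
        # Already initialized: leave it alone, even if it is Off for another reason.
        return False
    firmware.run_macro("config.g")
    return True
```

The early return is the important part: DCS would only step in when initialization never happened, not every time it finds the machine Off.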
-
Interesting. Shouldn't RRF, on every startup, look for the SD files first, and if not found, wait in a known state for DSF to contact it? That was my impression of how ordinary power on worked.
On a restart, what would be different?
-
@Danal Yeah, now that I think about it, that's what should have happened I guess. I've got a print going now but I'll test some other scenarios in a bit.
-
@dc42 FYI...
M122 output after DCS had shut down while RRF was still starting:
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.01-RC8 running on Duet 3 MB6HC v0.6 or 1.0
Board ID: 08DGM-9T66A-G63SJ-6J9F4-3SD6S-1U03B
Used output buffers: 1 of 40 (7 max)
=== RTOS ===
Static ram: 154580
Dynamic ram: 160520 of which 20 recycled
Exception stack ram used: 308
Never used ram: 77788
Tasks: NETWORK(ready,2084) HEAT(blocked,1452) CanReceiv(suspended,3824) CanSender(suspended,1484) CanClock(blocked,1464) TMC(suspended,216) MAIN(running,5108) IDLE(ready,80)
Owned mutexes:
=== Platform ===
Last reset 00:00:08 ago, cause: software
Last software reset at 2020-04-18 18:08, reason: User, spinning module LinuxInterface, available RAM 75940 bytes (slot 3)
Software reset code 0x0010 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x04432000 BFAR 0x00000000 SP 0xffffffff Task 0x4e49414d
Error status: 0
Free file entries: 10
SD card 0 not detected, interface speed: 37.5MBytes/sec
SD card longest block write time: 0.0ms, max retries 0
MCU temperature: min 40.1, current 40.4, max 40.6
Supply voltage: min 2.3, current 2.3, max 25.4, under voltage events: 0, over voltage events: 0, power good: no
12V rail voltage: min 0.4, current 0.4, max 12.2, under voltage events: 1
Driver 0: standstill, reads 20771, writes 11 timeouts 0, SG min/max 0/0
Driver 1: standstill, reads 20771, writes 11 timeouts 0, SG min/max 0/0
Driver 2: standstill, reads 20771, writes 11 timeouts 0, SG min/max 0/0
Driver 3: standstill, reads 20771, writes 11 timeouts 0, SG min/max 0/0
Driver 4: standstill, reads 20771, writes 11 timeouts 0, SG min/max 0/0
Driver 5: standstill, reads 20771, writes 11 timeouts 0, SG min/max 0/0
Date/time: 1970-01-01 00:00:00
Slowest loop: 2.68ms; fastest: 0.13ms
=== Move ===
Hiccups: 0(0), FreeDm: 375, MinFreeDm: 375, MaxWait: 0ms
Bed compensation in use: none, comp offset 0.000
=== MainDDARing ===
Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1
=== AuxDDARing ===
Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1
=== Heat ===
Bed heaters = -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
=== GCodes ===
Segments left: 0
Movement lock held by null
HTTP is idle in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is ready with "M122" in state(s) 0
Aux is idle in state(s) 0
Trigger is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon* is idle in state(s) 0 0, running macro
Autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
Slowest loop: 0.66ms; fastest: 0.01ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
HTTP sessions: 0 of 8
- Ethernet -
State: disabled
Error counts: 0 0 0 0 0
Socket states: 0 0 0 0 0 0 0 0
=== CAN ===
Messages sent 0, longest wait 0ms for type 0
=== Linux interface ===
State: 0, failed transfers: 0
Last transfer: 8745ms ago
RX/TX seq numbers: 0/3
SPI underruns 0, overruns 0
Number of disconnects: 1
Buffer RX/TX: 0/0-0
ok
After DCS restarts. Note that RRF recognizes that the DCS is running but doesn't do anything about it.
Connection to Linux established!
M122
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.01-RC8 running on Duet 3 MB6HC v0.6 or 1.0
Board ID: 08DGM-9T66A-G63SJ-6J9F4-3SD6S-1U03B
Used output buffers: 1 of 40 (12 max)
=== RTOS ===
Static ram: 154580
Dynamic ram: 160520 of which 20 recycled
Exception stack ram used: 308
Never used ram: 77788
Tasks: NETWORK(ready,2084) HEAT(blocked,1452) CanReceiv(suspended,3824) CanSender(suspended,1484) CanClock(blocked,1464) TMC(suspended,216) MAIN(running,4468) IDLE(ready,80)
Owned mutexes:
=== Platform ===
Last reset 00:01:26 ago, cause: software
Last software reset at 2020-04-18 18:08, reason: User, spinning module LinuxInterface, available RAM 75940 bytes (slot 3)
Software reset code 0x0010 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x04432000 BFAR 0x00000000 SP 0xffffffff Task 0x4e49414d
Error status: 0
Free file entries: 10
SD card 0 not detected, interface speed: 37.5MBytes/sec
SD card longest block write time: 0.0ms, max retries 0
MCU temperature: min 40.0, current 40.1, max 40.6
Supply voltage: min 0.1, current 0.2, max 2.3, under voltage events: 0, over voltage events: 0, power good: no
12V rail voltage: min 0.2, current 0.3, max 0.4, under voltage events: 1
Driver 0: standstill, reads 0, writes 0 timeouts 0, SG min/max not available
Driver 1: standstill, reads 0, writes 0 timeouts 0, SG min/max not available
Driver 2: standstill, reads 0, writes 0 timeouts 0, SG min/max not available
Driver 3: standstill, reads 0, writes 0 timeouts 0, SG min/max not available
Driver 4: standstill, reads 0, writes 0 timeouts 0, SG min/max not available
Driver 5: standstill, reads 0, writes 0 timeouts 0, SG min/max not available
Date/time: 1970-01-01 00:00:00
Slowest loop: 2.07ms; fastest: 0.13ms
=== Move ===
Hiccups: 0(0), FreeDm: 375, MinFreeDm: 375, MaxWait: 0ms
Bed compensation in use: none, comp offset 0.000
=== MainDDARing ===
Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1
=== AuxDDARing ===
Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1
=== Heat ===
Bed heaters = -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
=== GCodes ===
Segments left: 0
Movement lock held by null
HTTP is idle in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is ready with "M122" in state(s) 0
Aux is idle in state(s) 0
Trigger is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon* is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
Slowest loop: 1.04ms; fastest: 0.01ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
HTTP sessions: 0 of 8
- Ethernet -
State: disabled
Error counts: 0 0 0 0 0
Socket states: 0 0 0 0 0 0 0 0
=== CAN ===
Messages sent 0, longest wait 0ms for type 0
=== Linux interface ===
State: 0, failed transfers: 0
Last transfer: 27ms ago
RX/TX seq numbers: 595/597
SPI underruns 0, overruns 0
Number of disconnects: 1
Buffer RX/TX: 0/0-0
ok
After pressing reset button with DCS running. All good.
M122
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.01-RC8 running on Duet 3 MB6HC v0.6 or 1.0
Board ID: 08DGM-9T66A-G63SJ-6J9F4-3SD6S-1U03B
Used output buffers: 1 of 40 (10 max)
=== RTOS ===
Static ram: 154580
Dynamic ram: 162360 of which 36 recycled
Exception stack ram used: 300
Never used ram: 75940
Tasks: NETWORK(ready,2076) HEAT(blocked,1196) CanReceiv(suspended,3824) CanSender(suspended,1484) CanClock(blocked,1464) TMC(blocked,216) MAIN(running,4840) IDLE(ready,80)
Owned mutexes:
=== Platform ===
Last reset 00:00:07 ago, cause: reset button
Last software reset at 2020-04-18 18:08, reason: User, spinning module LinuxInterface, available RAM 75940 bytes (slot 3)
Software reset code 0x0010 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x04432000 BFAR 0x00000000 SP 0xffffffff Task 0x4e49414d
Error status: 0
Free file entries: 10
SD card 0 not detected, interface speed: 37.5MBytes/sec
SD card longest block write time: 0.0ms, max retries 0
MCU temperature: min 37.4, current 39.7, max 39.8
Supply voltage: min 0.1, current 25.4, max 26.0, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 0.3, current 12.2, max 12.2, under voltage events: 0
Driver 0: standstill, reads 27124, writes 11 timeouts 0, SG min/max 0/0
Driver 1: standstill, reads 27125, writes 11 timeouts 0, SG min/max 0/0
Driver 2: standstill, reads 27125, writes 11 timeouts 0, SG min/max 0/0
Driver 3: standstill, reads 27126, writes 11 timeouts 0, SG min/max 0/0
Driver 4: standstill, reads 27126, writes 11 timeouts 0, SG min/max 0/0
Driver 5: standstill, reads 27126, writes 11 timeouts 0, SG min/max 0/0
Date/time: 2020-04-18 18:10:13
Slowest loop: 3.90ms; fastest: 0.13ms
=== Move ===
Hiccups: 0(0), FreeDm: 375, MinFreeDm: 375, MaxWait: 0ms
Bed compensation in use: none, comp offset 0.000
=== MainDDARing ===
Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1
=== AuxDDARing ===
Scheduled moves: 0, completed moves: 0, StepErrors: 0, LaErrors: 0, Underruns: 0, 0 CDDA state: -1
=== Heat ===
Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1
=== GCodes ===
Segments left: 0
Movement lock held by null
HTTP is idle in state(s) 0
Telnet is idle in state(s) 0
File is idle in state(s) 0
USB is ready with "M122" in state(s) 0
Aux is idle in state(s) 0
Trigger* is idle in state(s) 0
Queue is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon* is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty.
=== Network ===
Slowest loop: 0.47ms; fastest: 0.01ms
Responder states: HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) HTTP(0) FTP(0) Telnet(0) Telnet(0)
HTTP sessions: 0 of 8
- Ethernet -
State: disabled
Error counts: 0 0 0 0 0
Socket states: 0 0 0 0 0 0 0 0
=== CAN ===
Messages sent 27, longest wait 0ms for type 0
=== Linux interface ===
State: 0, failed transfers: 0
Last transfer: 16ms ago
RX/TX seq numbers: 1770/287
SPI underruns 0, overruns 0
Number of disconnects: 0
Buffer RX/TX: 0/0-0
-
Also, running M98 P"config.g" fixes the issue if you can't get to the reset button.
-
@jay_s_uk said in Should M999 terminate the DSF core application?:
Also, running M98 P"config.g" fixes the issue if you can't get to the reset button.
Good point!
-
The best way to solve this might just be for the DCS to defer its shutdown until RRF has finished reading the config.g file.
-
@gtj0 said in Should M999 terminate the DSF core application?:
The best way to solve this might just be for the DCS to defer its shutdown until RRF has finished reading the config.g file.
A restart is to get out of being hung. Waiting for things to happen is a way to get hung again (or stay hung).
DCS really needs to "fire" a reset at the SPI interface, and then exit as directly as possible (no cleanup). At one point, I coded this, literally just 'exit(8)' right after the function call to send the board the restart, and it was tested by me and one other (maybe even you, @gtj0? Whoever it was rebuilt it into a 64-bit build of DCS) and it worked fine. 'Worked' meaning that it force-restarted both the board and DCS, and it resulted in a fully running system (with no other actions).
I haven't looked at the RC8 code for this... but at one point Chris rejected the pull because it didn't do cleanup (plus some other more philosophic reasons). To me, that red button in the upper corner of the screen should work very much like the 'reset' pin on a microcontroller CPU. Not one more instruction... just restart. Or as close as reasonably possible.
-
@Danal Yeah it was me that tested it. I agree that it should be an immediate reset, at least from an RRF perspective. When you press the button it's usually because something bad/dangerous is happening and you just want to STOP. The problem is getting it going again.
-
@Danal You need to ensure that the reset packet actually makes it to RRF, so I had to change a few extra spots in DCS. But I'll check again whether I actually exit the SPI loop when requested, and if that is not the case, I'll enforce it now. Either way, I'll try to fix it in DSF next.
With the new code, Environment.Exit is still the last resort in case the internal termination request fails after 4 seconds. See the last few lines in Program.cs for further details.
-
@chrishamm said in Should M999 terminate the DSF core application?:
@Danal You need to ensure that the reset packet actually makes it to RRF, so I had to change a few extra spots in DCS. But I'll check again whether I actually exit the SPI loop when requested, and if that is not the case, I'll enforce it now. Either way, I'll try to fix it in DSF next.
With the new code, Environment.Exit is still the last resort in case the internal termination request fails after 4 seconds. See the last few lines in Program.cs for further details.
Chris, thanks for the info.
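For bystanders, the "ask nicely, then force it after 4 seconds" pattern looks roughly like the sketch below. This is not the code in Program.cs (which is C#); `request_stop` and `stopped` are hypothetical names, and `os._exit` only plays the role that the post gives to Environment.Exit without being equivalent to it in detail.

```python
import os
import threading


def shutdown(request_stop, stopped, timeout=4.0, force_exit=os._exit):
    """Ask for a clean stop; force the process down if it does not happen in time.

    request_stop: callable that asks the main loop to finish (hypothetical).
    stopped: threading.Event set by the main loop once it has finished.
    force_exit: called with an exit code as the last resort.
    """
    request_stop()
    if stopped.wait(timeout):
        return True      # clean termination within the timeout
    force_exit(1)        # last resort: terminate without further cleanup
    return False         # only reached if force_exit was replaced by a stub
```

Passing `force_exit` in as a parameter is just so the timeout path can be exercised without actually killing the process.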
-
This appears to be fixed in the forthcoming DSF 2.1.0.