M291 S4 J2 T0 problem
-
I think there's a logical flaw in M291 that can lead to files being left open. Or maybe I'm not reading the docs correctly.
Here's my analysis
S 4 (or greater) - these are non-blocking. My understanding of this means that other code can be run while the message box is being displayed.
J 1 or 2 - Enables the T parameter
T0 - No timeout.
With any of these combinations, macro file (foo.g) can post a message and wait for user input. Then, another macro (bar.g) can start (because foo's message is non-blocking). If bar (the second macro) posts a message, then foo's message box (without timeout) will no longer be displayed. The macro foo.g now seems unrecoverable and is seems lost forever.
Or does bar's message box request cause bar's input request to be cancelled (per the J parameter)?
I suggest that any when the T0 parameter is used with those S and J parameters, the message should become blocking.
-
@mikeabuilder S4 and later are blocking. The M291 command blocks execution of subsequent lines of the macro until either the user enters a response or they time out. Only S0 and S1 are non blocking, i.e the message is displayed and execution continues immediately without waiting for the user to acknowledge the message or for the message to time out.
-
@dc42 - thank for clarifying that. Maybe the doc page should be explicit about this on S4-7.
I still have a concern about other macros pushing a blocking message off the user interface and leaving the first macro in a hung state. Can you clarify whether it's possible for a second macro to start execution while the first is waiting for a message response?
Maybe this is asking if macros can started asynchronously. I've recently been having issues where it appears that macros are started multiple times from PanelDu, causing blocking messages to replace each other, and ultimately leading to "too many files open" errors. Maybe I should develop a simple repro case.
-
@mikeabuilder said in M291 S4 J2 T0 problem:
Maybe the doc page should be explicit about this on S4-7.
Thanks, I've updated the M291 entry. https://docs.duet3d.com/en/User_manual/Reference/Gcodes#m291-display-message-and-optionally-wait-for-response
Ian
-
@mikeabuilder said in M291 S4 J2 T0 problem:
Maybe this is asking if macros can started asynchronously.
Macros can be started asynchronously from different input channels, but not from the same input channel.
-
I think maybe I am asking about starting macros asynchronously. Here's an repro case for the problem I'm experiencing.
HW - MB6HC
RRF - 3.6.0-rc.2
DWC - 3.6.0-rc.2
PanelDu FW - 3.5.1-v3-7.0This macro logs it's starting time, waits a bit (10 seconds), then executes an M291. It also logs the M291 response and its exit time and cause.
; before starting this test, be sure a global variable named "counter" has been created with a value of 0 var my_id = global.counter +1 set global.counter = var.my_id var my_log_offset = "" while iterations < var.my_id set var.my_log_offset = (var.my_log_offset^ " ") echo >>"0:/macros/async_log.txt" state.time, var.my_log_offset, var.my_id, "Instance starting" ; wait for a bit to allow async start attempts G4 S10 M291 S4 T0 J2 K{"continue",} P"select any option" echo >>"0:/macros/async_log.txt" state.time, var.my_log_offset, var.my_id, ("Result: "^result^" Input: "^input) if result = -1 ; user cancelled echo >>"0:/macros/async_log.txt" state.time, var.my_log_offset, var.my_id, "exiting because user cancelled" M99 echo >>"0:/macros/async_log.txt" state.time, var.my_log_offset, var.my_id, "exiting at EOF"
I did two experiments with this macro.
First, I used PanelDu to start the macro three times, touching the button at intervals of about one second. I saw the user interface message appear three times and each time I pressed the "continue" button (input 0) This resulted in the following test log entries. The logs show that instances 2 and 3 were running before instance 1 completed. I think this is an example of asynchronous macros from the same input channel (assuming PanelDu is one channel). Based on the 10 second interval between the logged start times, I think the PanelDu queued the macros to RRF and each subsequent macro started after the G4 timer timed out. So now I see a potential conflict between wanting PanelDu to be able to queue commands for things like modifying temperatures, and not wanting it to be able to queue commands so macros can't be started over and over.The logs also imply that the M291 messages overwrote each other and the first one I responded to was from instance 3, not instance 1.
2025-05-16T09:22:59 1 Instance starting 2025-05-16T09:23:09 2 Instance starting 2025-05-16T09:23:19 3 Instance starting 2025-05-16T09:23:42 3 Result: 0 Input: 0 2025-05-16T09:23:42 3 exiting at EOF 2025-05-16T09:23:42 2 Result: 0 Input: 0 2025-05-16T09:23:42 2 exiting at EOF 2025-05-16T09:23:42 1 Result: 0 Input: 0 2025-05-16T09:23:42 1 exiting at EOF
My second experiment was the same as the first, but instead of touching the PanelDu button to start the macro 3 times, I touched it 10 times. In this case, I saw one M291 window pop up and after the M291 window closed nothing happened for a while (maybe 20 seconds), then I got an overflow message on PanelDu and after closing that message, PanelDu continued to show "Busy" in the upper right corner. After a couple minutes, I went back indoors to where my DWC connection is and saw an M291 message there. I responded to it and several others. Here's the log of that sequence - I added a couple of lines to show where I switched to DWC. After this, PanelDu shows "Idle". The system log from this time is below my test logs where you can see the stack overflow messages.
2025-05-16T09:23:51 4 Instance starting 2025-05-16T09:24:01 5 Instance starting 2025-05-16T09:24:11 6 Instance starting 2025-05-16T09:24:21 7 Instance starting 2025-05-16T09:24:31 8 Instance starting 2025-05-16T09:27:25 8 Result: 0 Input: 0 2025-05-16T09:27:25 8 exiting at EOF 2025-05-16T09:27:25 7 Result: 0 Input: 0 2025-05-16T09:27:25 7 exiting at EOF 2025-05-16T09:27:25 6 Result: 0 Input: 0 2025-05-16T09:27:25 6 exiting at EOF 2025-05-16T09:27:25 5 Result: 0 Input: 0 2025-05-16T09:27:25 5 exiting at EOF 2025-05-16T09:27:25 4 Result: 0 Input: 0 2025-05-16T09:27:25 4 exiting at EOF
System log entries:
2025-05-16 09:23:09 [info] M291: - [no title] - select any option 2025-05-16 09:23:19 [info] M291: - [no title] - select any option 2025-05-16 09:23:29 [info] M291: - [no title] - select any option 2025-05-16 09:23:29 [info] M292: cancelled=false shouldAbort=false 2025-05-16 09:23:31 [info] M292: cancelled=false shouldAbort=false 2025-05-16 09:23:42 [info] M292: cancelled=false shouldAbort=false 2025-05-16 09:24:01 [info] M291: - [no title] - select any option 2025-05-16 09:24:11 [info] M291: - [no title] - select any option 2025-05-16 09:24:21 [info] M291: - [no title] - select any option 2025-05-16 09:24:31 [info] M291: - [no title] - select any option 2025-05-16 09:24:41 [info] M291: - [no title] - select any option 2025-05-16 09:24:41 [warn] Error: Push(): stack overflow on Aux 2025-05-16 09:24:41 [warn] Error: Push(): stack overflow on Aux 2025-05-16 09:24:41 [warn] Error: Push(): stack overflow on Aux 2025-05-16 09:24:41 [warn] Error: Push(): stack overflow on Aux 2025-05-16 09:24:41 [warn] Error: Push(): stack overflow on Aux 2025-05-16 09:27:21 [info] M292: cancelled=false shouldAbort=false 2025-05-16 09:27:22 [info] M292: cancelled=false shouldAbort=false 2025-05-16 09:27:23 [info] M292: cancelled=false shouldAbort=false 2025-05-16 09:27:24 [info] M292: cancelled=false shouldAbort=false 2025-05-16 09:27:25 [info] M292: cancelled=false shouldAbort=false
I subsequently reproduced the same results using DWC and starting the macro with mouse clicks, except that DWC did not get stuck in a "Busy" state.
I now have a theory about the root cause of issues I've had with my system that relies on PanelDu. I have had people (including me) click on a macro button on PanelDu while a print is running and then having a subsequent tool dock macro fail (resulting in a crash). My theory is that the macro button press filled the "Push()" and when one of the tool docking macros tried to run there was an overflow and it got skipped.
I don't think those behaviors should be OK. Am I missing something?
The only thing I can think of as a way around this is for me to maintain some kind of global lock token that gets set in start.g (or whenever a critical macro is going to start), then have all user initiated macros not start if the global lock is set. Stop.g would need to release the global lock. I can imagine this would get complicated with multiple locks for some macros that might be called either by a print file or on their own - like tool change macros.
Sorry for the long post, but this is bothersome.
-
@mikeabuilder thanks, looks like you may have found a bug. I think RRF should ignore any commands received while it is waiting for the M292 response to the M291 command. I will take a look on Monday.
-
@dc42 - thanks for having a look at this. I see three possible issues (maybe trickiness in the way RRF works)
-
RRF allowing PanelDu or DWC to launch (queue?) a macro start while the one they just launched is being executed ( in my test code, paused by G4 ). This seems to be the thing leading to the stack overflow.
-
The behavior when an asynchronously launched macro calls for a message window when another macro already has one displayed.
I also suspect that some of the queuing behavior is desirable for some of the controls (like temperature adjustments) in DWC and PanelDu. No idea how to reconcile the two.
-
-
@mikeabuilder I think what's leading to the stack overflow is that RRF stacks the current state when it executes M291 and then waits for the M292 acknowledgement. If we have RRF discard any input from the same channel apart from M292 while in this state, that should resolve the stack overflow.
I created https://github.com/Duet3D/RepRapFirmware/issues/1117.