Just replying to myself with some difficulties I see in the above approach after thinking about it some more.
The first problem is sending a move and expecting it to take exactly the expected time slice. Unless you mirror exactly how the Duet calculates velocities etc. (not to mention the latency of processing the move), the time will be off. This delta-t error is cumulative, which is why it becomes an issue. Reading back the M400 response to know the move is complete and then proceeding is better, but you then introduce additional latency: read the M400 response, compute the target, send the position.
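To put a number on the cumulative error, here is a minimal sketch. The durations are made up for illustration (the host expects 100 ms per move, the controller actually takes 102 ms), and `open_loop_drift` is just a name I picked, not anything from the Duet side.

```python
def open_loop_drift(expected_s, actual_s):
    """Return the host's accumulated timing error after each move.

    The host never resynchronises, so every per-move error is added
    on top of the previous ones.
    """
    drift = 0.0
    history = []
    for expected, actual in zip(expected_s, actual_s):
        drift += actual - expected
        history.append(drift)
    return history


# Hypothetical numbers: 50 moves, each one 2 ms slower than the host expects.
expected = [0.100] * 50
actual = [0.102] * 50
drift = open_loop_drift(expected, actual)
print(f"after 1 move: {drift[0] * 1000:.1f} ms, "
      f"after 50 moves: {drift[-1] * 1000:.1f} ms")
```

A 2% error per move is already a full move's worth of drift after 50 moves, at which point the host is wrong about which move is executing.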
It is pretty desirable to keep the Duet respecting acceleration and jerk limits, but for that you need at least 3 moves in play at all times; let's call them A, B and C. The cycle goes like this:
1. Move A is executing, B is next in the queue, and C does not exist yet.
2. We compute the target destination using the data available right now (while A is executing) and send it as move C. This must complete before B begins executing.
3. The Duet receives move C, adds it to the queue, and adjusts the speeds of move B.
4. Move A finishes and B begins.
5. Repeat.
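The cycle above can be sketched as a host-side loop against a stand-in controller. Everything here is hypothetical: `SimulatedController` only models "one executing move plus a queue" and does no speed planning, and `compute_target` stands for whatever produces the next destination. It is meant to show the ordering of the steps, not how the Duet works internally.

```python
from collections import deque


class SimulatedController:
    """Stand-in for the motion controller: one executing move plus a queue."""

    def __init__(self):
        self.executing = None
        self.queue = deque()
        self.completed = []

    def enqueue(self, move):
        self.queue.append(move)
        if self.executing is None:
            self._start_next()

    def finish_current(self):
        # The executing move ends and the next queued move (if any) starts.
        self.completed.append(self.executing)
        self.executing = None
        self._start_next()

    def _start_next(self):
        if self.queue:
            self.executing = self.queue.popleft()


def stream_moves(controller, compute_target, n_moves):
    # Prime the pipeline with A and B.
    controller.enqueue(compute_target(0))
    controller.enqueue(compute_target(1))
    for i in range(2, n_moves):
        # A is executing and B is queued: compute C now and send it.
        assert controller.executing is not None
        assert len(controller.queue) == 1
        controller.enqueue(compute_target(i))
        # A finishes, B begins, and C becomes the queued move.
        controller.finish_current()
    # Drain whatever is left once no new targets are produced.
    while controller.executing is not None:
        controller.finish_current()
    return controller.completed


log = stream_moves(SimulatedController(), lambda i: i * 10.0, 5)
print(log)
```

In the real system `finish_current` is not something the host calls. It happens on the controller on its own schedule, which is exactly where the difficulties come from.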
This could be shortened to 2 moves, A and B, if the Duet could modify an in-progress move (before some deadline) once B got added. But that seems super complex.
The downside is that we have to compute move C from the information available while A is executing, but things will keep changing while B is executing, so C is already somewhat stale by the time it runs.
Some difficulties in doing the above are:
- Knowing the true machine position at the time we compute move C. The Duet will be somewhere between A's start position and A's end position, but ideally we would know exactly where.
- Knowing where the Duet actually is in the queue right now, i.e. we think A is still executing, but is it now executing B because our time estimate was off (as in the first difficulty mentioned)? It's almost like we want a sync pulse to be sent whenever the Duet starts a new move (and it would be extra nice to have the Duet send the move duration whenever it starts a move).
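If such a sync pulse existed, the host side could look roughly like this. To be clear, this is a sketch under assumptions: `MoveStart` is a hypothetical report (I am not claiming the firmware sends anything like it), and the position estimate is a straight linear interpolation that ignores the acceleration and deceleration phases, so it is only approximate.

```python
from dataclasses import dataclass


@dataclass
class MoveStart:
    """Hypothetical 'move started' report; not an existing firmware message."""
    index: int          # which move in the stream has just started
    start_pos: float    # position at the start of the move
    end_pos: float      # commanded end position
    duration_s: float   # duration reported by the controller
    host_time_s: float  # host clock when the report arrived


class QueueTracker:
    """Tracks which move is executing and roughly where the machine is."""

    def __init__(self):
        self.current = None

    def on_move_start(self, report):
        self.current = report

    def estimate(self, now_s):
        """Return (move index, estimated position), or None before any report."""
        m = self.current
        if m is None:
            return None
        if m.duration_s <= 0:
            return m.index, m.end_pos
        # Fraction of the move elapsed, clamped to [0, 1].
        frac = (now_s - m.host_time_s) / m.duration_s
        frac = min(max(frac, 0.0), 1.0)
        # Constant-velocity assumption: ignores accel/decel ramps.
        return m.index, m.start_pos + frac * (m.end_pos - m.start_pos)


tracker = QueueTracker()
tracker.on_move_start(MoveStart(3, 10.0, 20.0, 0.5, 100.0))
print(tracker.estimate(100.25))
```

Because every report resets the reference time, the timing error no longer accumulates across moves. What is left is the latency of the report itself plus the error from the constant-velocity assumption.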