Wifi 2.1beta6 from 3.5.0-rc.2/3 still disconnecting
-
Glad to hear that you have recovered from your sick leave. I was on vacation in the meantime, so I was not "waiting" for a reply.
Please give me some days to test with the new WiFi firmware. Do you remember that I had another printer with RC3?
2: RC3 without problem, without PanelDue and not printing (VCast)
That one has been printing since yesterday and has shown the same problem since this morning. I think that I will upgrade this printer to beta7 first. I will use that printer more frequently in the next days; please let me know if you want me to stick with the other printer, which had the problem first.
I have to admit that I'm more than happy that both printers with beta6 have the same problem now. I was a bit concerned about the observation that one encountered the problem while the other one was fine.
Cheers, Chriss
-
@rechrtb I have no access to the file. I requested it a minute ago.
-
@Chriss I've granted you access to the file. Tell me if you still have problems accessing it.
Regarding your question, I would advise putting beta7 on the board on which the issue seems to manifest most often.
I would also advise putting the two boards near each other if you can, so that they get roughly the same wifi signal strength, the same wifi devices in proximity, etc. I recommend moving the beta6 board to the beta7 board's location (again, because this might be a 'goldilocks' location with respect to the access point, where the issue manifests more frequently).
-
Cheers, I have the file. It seems to me that the printer I use is getting the error... Let me see... I print on both at the moment. I will wait till tomorrow and I hope that one of them will be in the failed state then. That will be the chosen one then.
The printers stand next to each other and the AP is in the same room about 5m away.
Do you want me to do the drill via the serial interface on the board with the new firmware too? Or do I only need to do it if the new fw has the same problem? (And I hope that this is totally hypothetical because you found the problem and fixed it!)
Cheers, Chriss
-
Do you want me to do the drill via the serial interface on the board with the new firmware too?
For now, not yet. Only when beta7 also displays the same issues.
-
@rechrtb OK, cool for me... My apologies, it took me almost a day to get the printer back into the failed state. Just to make sure that the problem is still present after the very latest reboot of my WiFi infra.
So we have:
now. I will print for a while now, but I cannot yet tell you: "Yes, it is working now."
Simply because the problem does not show up frequently. I had the impression that it happens at least once in 24h. But the very last issue came up after more than 30h. So when could we say "Yes, solved" then?
Do you want to tell us how you have fixed it? Is it by doing a full reset of the WiFi module after a lost connection? Or was there a real issue with the board firmware? (I'm just curious because the disconnects are not new; the failure to recover was new.)
I will update the thread as soon as the problem is back or I will ping you on Friday or Monday when I have the feeling that the problem is gone for good.
Cheers, Chriss
-
@Chriss said in Wifi 2.1beta6 from 3.5.0-rc.2/3 still disconnecting:
Do you want to tell us how you have fixed it? Is it by doing a full reset of the WiFi module after a lost connection? Or was there a real issue with the board firmware? (I'm just curious because the disconnects are not new; the failure to recover was new.)
From a conversation we had with @rechrtb:
Ok, I think I may have found a fix to the issue. The reason I say 'may' is because as you might imagine, this issue seems to be very intermittent - I have only been able to reproduce it that one time last week.
But in trying to debug this problem, I inserted a bunch of debug printf's, and with those in place I was able to get the same symptoms, namely:
- board seems to disconnect and never reconnect again, unless module is disabled and re-enabled
- module led is still on, but board is unpingable
- repeating "responsebusy" and "bad recv status size"
Ok, so the issue I found is that one of the tasks blocks indefinitely at https://github.com/Duet3D/WiFiSocketServerRTOS/blob/dev/src/Connection.cpp#L694.
Inserting the printf's must've slowed things down enough to simulate the connectionQueue being backed up. There is supposed to be a task that consumes events from this queue, but since this callback occurs on the lwip task, if that consumer task calls an lwip function, it might also lock up.
Increasing the queue size seems to have alleviated the issue. I have set the size to MaxConnections * 3, since there are three types of connection events that can be enqueued in connectionQueue: Accept, Close and Terminate.
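The sizing argument above can be illustrated with a small simulation. This is only a sketch in Python, not the firmware's C++ code: the value `MAX_CONNECTIONS = 8`, the helper `burst_fits`, and the use of `queue.Queue` as a stand-in for the RTOS queue are all assumptions made for illustration. It also assumes the worst case is one pending event of each type per connection while the consumer task is stalled.

```python
import queue

MAX_CONNECTIONS = 8  # hypothetical value, not taken from the firmware
EVENT_TYPES = ("Accept", "Close", "Terminate")  # event types named in the post


def burst_fits(capacity: int) -> bool:
    """Return True if a worst-case burst fits while the consumer is stalled.

    Assumed worst case: every connection has one event of each type pending.
    """
    q = queue.Queue(maxsize=capacity)
    for conn in range(MAX_CONNECTIONS):
        for event in EVENT_TYPES:
            try:
                # Per the post, the real producer blocks when the queue is
                # full; the sketch uses a non-blocking put so the overflow
                # is reported instead of hanging.
                q.put_nowait((conn, event))
            except queue.Full:
                return False
    return True


if __name__ == "__main__":
    # Queue sized per connection versus queue sized per possible event.
    print(burst_fits(MAX_CONNECTIONS))
    print(burst_fits(MAX_CONNECTIONS * len(EVENT_TYPES)))
```

Under these assumptions, a queue with one slot per connection overflows, while one with three slots per connection absorbs the burst. A larger queue only removes the blocking for bursts up to that bound; it does not change which task consumes the events.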
That said, it probably still needs to be verified whether this is the issue @Chriss encountered. Since they have multiple boards, I'll probably advise them to load this firmware onto one of the boards, while the other retains the current firmware, to see if the 'fixed' version has reduced occurrences of the disconnects.
Though long term, I'll probably think of potentially better ways to refactor this part of the code.
The relevant fix is here: https://github.com/Duet3D/WiFiSocketServerRTOS/commit/0f8bdc18f2968ee357cdb09d1319590abb7cdd08
Ian
-
@Chriss Hello, as in @droftarts' response, I might have managed to re-create the situation in which the WiFi module firmware locks up, which is consistent with the symptoms you reported. With the fix, I wasn't able to recreate the lock-up 'artificially' anymore. Now we wait to confirm that it can't be recreated 'naturally' either.
As it is a very intermittent issue, it's hard to say when we can call the issue fixed. It has to be a long-term test, but if the beta6 board encounters the issue and the beta7 board still does not, I think that is a good sign.
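One rough way to put a number on "long term" is sketched below. It assumes, purely for illustration, that lock-ups on an unfixed board arrive at random with a constant average rate (exponentially distributed gaps); the thread only reports a gap of roughly 24h to 30h, so the mean values used here and the helper name `quiet_hours_needed` are hypothetical. Under that assumption, the chance of an unfixed board staying quiet for T hours is exp(-T / mean_gap), so the quiet run needed for a given confidence follows directly.

```python
import math


def quiet_hours_needed(mean_hours_between_failures: float, alpha: float = 0.05) -> float:
    """Failure-free hours after which an unfixed bug would have shown up
    with probability 1 - alpha, assuming exponentially distributed gaps."""
    if mean_hours_between_failures <= 0:
        raise ValueError("mean gap must be positive")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    # Solve exp(-T / mean) = alpha for T.
    return -mean_hours_between_failures * math.log(alpha)


if __name__ == "__main__":
    for mean_gap in (24.0, 30.0):  # hypothetical mean gaps, in hours
        hours = quiet_hours_needed(mean_gap)
        print(f"mean gap {mean_gap:.0f} h -> about {hours:.0f} h without a lock-up")
```

With alpha at 5%, this comes out at roughly three times the mean gap. Running the beta6 board next to the beta7 board, as suggested above, remains the stronger check, because it does not depend on the assumed failure model.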
-
Thanks for the information, I appreciate that very much.
I was very busy during the last days, so I skipped telling you "Yes, it still works" every day.
What I can tell so far is that the error is gone. I have been printing a lot with the beta7 board since the upgrade (many prints under 30 minutes) and the WiFi connection was very stable. I would vote with a "thumbs up" and would say that the problem is gone.
I am working a lot with my beta6 board at the moment (IDEX setup is a pain) and I saw the problem there twice since yesterday evening.
I guess you guys want to close the case now and release beta7 officially. Thank you very much for your good support, I felt very comfy during the process. And I'm more than happy that it was not a stupid wrong config on my side this time.
Cheers, Chriss
-
@Chriss I'm not quite sure why this thread is in the STM category. I'll move it to the beta firmware category.
Ian
-
@droftarts Hahaha... I started it in the Beta category, somebody moved it here.
-
@droftarts Maybe my bad then. I only remember that one of my threads was moved; maybe it was another one.
Thank you very much! Glad that we found it, that it is stable now, and that all of us can concentrate on other things.
-
@rechrtb Could you give me the 2.1 beta7, please? I have terrible issues with my connection. It is so bad that I have to restart the printer every time.
-
@jensus11 here's a copy. DuetWiFiServer_beta7.bin
-
Thanks, so far it looks much better.