Wifi 2.1beta6 from 3.5.0-rc.2/3 still disconnecting
-
Cheers, I have the file. It seems to me that the printer I use is the one getting the error... Let me see... I am printing on both at the moment. I will wait until tomorrow and hope that one of them will be in the failed state by then. That one will be the chosen one.
The printers stand next to each other, and the AP is in the same room, about 5 m away.
Do you want me to do the drill via the serial interface on the board with the new firmware too? Or do I only need to do it if the new fw has the same problem? (And I hope that this is totally hypothetical because you found the problem and fixed it!)
Cheers, Chriss
-
Do you want me to do the drill via the serial interface on the board with the new firmware too?
Not for now. Only if beta7 also shows the same issues.
-
@rechrtb OK, cool for me... My apologies, it took me almost a day to get the printer back into the failed state. I wanted to make sure that the problem is still present after the very latest reboot of my WiFi infra.
So we have:
now. I will print for a while, but I cannot yet tell you: "Yes, it is working now."
Simply because the problem does not show up frequently. I had the impression that it happens at least once in 24h. But the very last issue came up after more than 30h. So when could we say "Yes, solved" then?
Do you want to tell us how you have fixed it? Is it by doing a full reset of the WiFi module after a lost connection? Or was there a real issue with the board firmware? (I'm just curious because the disconnects are not new; the failure to recover was new.)
I will update the thread as soon as the problem is back or I will ping you on Friday or Monday when I have the feeling that the problem is gone for good.
Cheers, Chriss
-
@Chriss said in Wifi 2.1beta6 from 3.5.0-rc.2/3 still disconnecting:
Do you want to tell us how you have fixed it? Is it by doing a full reset of the WiFi module after a lost connection? Or was there a real issue with the board firmware? (I'm just curious because the disconnects are not new; the failure to recover was new.)
From a conversation we had with @rechrtb:
Ok, I think I may have found a fix to the issue. The reason I say 'may' is because as you might imagine, this issue seems to be very intermittent - I have only been able to reproduce it that one time last week.
But in trying to debug this problem, I inserted a bunch of debug printf's, after which I was able to get the same symptoms, namely:
- board seems to disconnect and never reconnect again, unless module is disabled and re-enabled
- module led is still on, but board is unpingable
- repeating "responsebusy" and "bad recv status size"
Ok, so the issue I found is that one of the tasks blocks indefinitely on https://github.com/Duet3D/WiFiSocketServerRTOS/blob/dev/src/Connection.cpp#L694.
Inserting the printf's must've slowed things down enough to simulate the connectionQueue being backed up. There is supposed to be a task that consumes events from this queue, but since this callback occurs on the lwip task, if that consumer task calls an lwip function, it might also lock up.
Increasing the queue size seems to have alleviated the issue. I have set the size to MaxConnections * 3, since there are three types of connection events that can be enqueued in connectionQueue: Accept, Close and Terminate.
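The sizing argument can be illustrated with a small Python model. This is only a sketch, not the firmware code: the value of `MAX_CONNECTIONS`, the size of the "small" queue and the helper names are all assumptions made up for the illustration. It uses a non-blocking put to count the events that would not fit; in the firmware the equivalent call blocks instead, which is where the task got stuck.

```python
import queue

MAX_CONNECTIONS = 8  # hypothetical value, not taken from the firmware
EVENT_TYPES = ("Accept", "Close", "Terminate")  # the three event types named above


def worst_case_burst(max_connections):
    """One pending event of every type for every connection slot."""
    return [(conn, ev) for conn in range(max_connections) for ev in EVENT_TYPES]


def post_events(q, events):
    """Enqueue without blocking; return the events that did not fit.

    Each rejected event marks a point where a blocking producer
    would have stalled until the consumer drained the queue.
    """
    rejected = []
    for ev in events:
        try:
            q.put_nowait(ev)
        except queue.Full:
            rejected.append(ev)
    return rejected


burst = worst_case_burst(MAX_CONNECTIONS)

# Assumed "too small" size for comparison; the original firmware size is not stated here.
small = queue.Queue(maxsize=MAX_CONNECTIONS)
# Size chosen as in the fix: MaxConnections * 3.
large = queue.Queue(maxsize=MAX_CONNECTIONS * len(EVENT_TYPES))

rejected_small = post_events(small, burst)
rejected_large = post_events(large, burst)

print(len(rejected_small))  # 16
print(len(rejected_large))  # 0
```

With the larger queue, even a burst containing every event type for every connection fits without the producer having to wait on the consumer task.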
That said, we probably still need to verify whether this is the issue @Chriss encountered. Since they have multiple boards, I'll probably advise them to load this firmware onto one of the boards, while the other retains the current firmware, to see if the 'fixed' version has reduced occurrences of the disconnects.
Though long term, I'll probably think of potentially better ways to refactor this part of the code.
The relevant fix is here: https://github.com/Duet3D/WiFiSocketServerRTOS/commit/0f8bdc18f2968ee357cdb09d1319590abb7cdd08
Ian
-
@Chriss Hello, as in @droftarts' response, I might have managed to re-create the situation in which the WiFi module firmware locks up, which is consistent with the symptoms you described. With the fix, I wasn't able to recreate the lock-up 'artificially' anymore. Now we wait to confirm that it can't be recreated 'naturally' either.
As it is a very intermittent issue, it's hard to say when we can call it fixed. It has to be a long-term test, but if the beta6 board encounters the issue and the beta7 board still does not, I think that would be a good sign.
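One rough way to put a number on how long a clean run should be is sketched below. It assumes, and this is only an assumption the thread does not establish, that lock-ups arrive independently at a constant average rate (a Poisson process). It takes the "about once in 24h" impression reported earlier in the thread as that rate. The function names are made up for the illustration.

```python
import math


def prob_no_failure(hours_observed, mean_hours_between_failures):
    """Chance of seeing zero lock-ups in the window if the bug is still present.

    Assumes a Poisson process with a constant failure rate.
    """
    return math.exp(-hours_observed / mean_hours_between_failures)


def hours_needed(mean_hours_between_failures, confidence):
    """Clean run length after which an unfixed bug would have shown up
    with the given probability (for example 0.95)."""
    return -mean_hours_between_failures * math.log(1.0 - confidence)


MEAN_HOURS = 24.0  # assumed rate: roughly one lock-up per 24 h, as reported in the thread

p_quiet_30h = prob_no_failure(30.0, MEAN_HOURS)
run_for_95 = hours_needed(MEAN_HOURS, 0.95)

print(round(p_quiet_30h, 3))
print(round(run_for_95, 1))
```

Under these assumptions a single quiet 30 h stretch would still happen about 29% of the time with the bug present, while roughly 72 h of printing without a lock-up would be unlikely (about 5%) unless the fix works. Running a beta6 board alongside as a control, as suggested above, makes the comparison stronger than the numbers alone.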
-
Thanks for the information, I appreciate that very much.
I was very busy during the last few days, so I skipped telling you "Yes, it still works" every day.
What I can tell so far is that the error is gone. I have been printing a lot with the beta7 board since the upgrade (many prints under 30 minutes) and the WiFi connection was very stable. I would vote with a "thumbs up" and say that the problem is gone.
I am working a lot with my beta6 board at the moment (IDEX setup is a pain) and I have seen the problem there twice since yesterday evening.
I guess you guys want to close the case now and release beta7 officially. Thank you very much for your good support, I felt very comfy during the process. And I'm more than happy that it was not a stupid wrong config on my side this time.
Cheers, Chriss
-
@Chriss I'm not quite sure why this thread is in the STM category. I'll move it to the beta firmware category.
Ian
-
@droftarts Hahaha... I started it in the Beta category; somebody moved it here.
-
@droftarts Maybe my bad then. I only remember that one of my threads was moved; maybe it was another one.
Thank you very much! Glad that we found it, that it is stable now, and that all of us can concentrate on other things.
-
@rechrtb Could you give me the 2.1 beta7, please? I have terrible issues with my connection. It is so bad that I have to restart the printer every time.
-
@jensus11 here's a copy. DuetWiFiServer_beta7.bin
-
Thanks, so far it looks much better.