Wifi 2.1beta6 from 3.5.0-rc.2/3 still disconnecting
-
@Chriss Sorry, didn't read your instructions about it being a tar file. I can read them now! Can't see anything immediately obvious, but hopefully there's something there for @rechrtb
Yes, by "WiFi LED" I mean the green (when on) LED next to the WiFi module, labelled "ESP" on the board silkscreen. The WiFi module itself doesn't have an LED on it.
Ian
-
Good that it is clear now... I did not see anything unexpected either. The module looks good, the AP thinks that there is a connection with the module, etc. The only thing is that the module does not react anymore.
Thanks for the confirmation, we spoke about the same LED then. And it was on. I was a bit confused by the wiring diagram: it looks slightly as if the LED is on the module, but it is next to it and there is "ESP" printed next to it.
I will reboot the board now and wait for the next disconnect to do the same drill, this time without the typo, hopefully. Let me know if I can stop that drill once you think that you have enough information.
Btw: The disconnect happened while the printer was not printing. Just fyi
Cheers, Chriss
-
@droftarts @gloomyandy @rechrtb
Good morning... The printer is again in the failed state.... Before I forget it: The WiFi LED is blinking
It was off after "M552 s-1" (not a surprise) and turned back on after "M552 s1", which is not a surprise either. My apologies, I was not precise enough in the past about the LED: I only wrote about "on or off", but not about the blinking.
Do you guys want to have the log files again? (I can tell you so far that they do not look different.)
Please let me know, I will be off for 6 days from tomorrow early morning.
Cheers, Chriss
-
@Chriss Yes, please post the logs, as this is different from what I have experienced (WiFi LED stays lit) and what others have experienced (lots of 'ResponseBusy' messages in the serial console). So any information would be useful. Hopefully you did a M122 from the serial console while it was in the failed state?
Ian
-
You will find the file attached, same drill as last time.
no3.bin
I'm not so convinced anymore that the WiFi LED was permanently on last time. It could be that it was blinking... My apologies, but it was early in the morning before my first cup of tea and I wrote the message a bit later.
And one more thing that I have not seen before: I did the M552 drill, and the board has been pingable since then. But I had not tried to reach a higher layer before. When I tried to download the event log for you, the FTP server did not really react:
The port is obviously open but the application seems not to react anymore. Telnet shows very much the same behaviour:
And the web UI is very much the same: I can curl it and I get a connection, but the daemon behind the port is not responding. I had to cut the power to solve that.
I have another Mini 5+ here on my workbench which does literally nothing. The stepper extension board is attached, but no sensor and no stepper. What do you think? Would it make sense to push RC3 to that board and copy the config, just to see whether the problem will pop up there too? Just to remove the dependency on my physical printer.
And I may be able to connect that board to another WiFi to remove that variable too. Or we test with a very simple config file to rule that out. But I think that I should do all of those tests with my test rig rather than with my main printer.
Cheers, Chriss
-
@Chriss I haven't seen your config.g in a while, perhaps you can post that too? I assume FTP and telnet are turned on in that? That would use up a couple more sockets, though they are supposed to be released when not in use, for other connections. I'll pass on all the info to @rechrtb for him to look at.
Ian
-
@droftarts Sure, the services are enabled; you would get a "connection refused" if not. And they work after the reboot. I wanted to indicate that the WiFi module seems to get stuck when this failed state happens, and the "turn it off and on again" drill did not bring it back to normal operation. It is better (pingable again) but not fully working. I have often seen that behaviour on Un*x or Linux systems when the OS did not have enough resources for the daemon behind the port: the TCP handshake worked, but the process behind it was unable to respond.
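The difference between a disabled service ("connection refused") and a stuck one (handshake succeeds, then silence) can be checked with a small script. This is only a sketch: the function name `probe_service`, the default timeout and the returned labels are assumptions made for illustration and are not part of any Duet tooling.

```python
import socket


def probe_service(host, port, timeout=3.0, payload=b""):
    """Classify a TCP service as 'refused', 'unreachable', 'silent',
    'closed' or 'responding' (labels are hypothetical, for illustration)."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"        # nothing is listening, e.g. service disabled
    except OSError:
        return "unreachable"    # connect timed out, no route, etc.
    with sock:
        sock.settimeout(timeout)
        try:
            if payload:
                sock.sendall(payload)   # some protocols wait for the client
            data = sock.recv(1)
        except socket.timeout:
            return "silent"     # handshake worked, but no application data
        except OSError:
            return "unreachable"
        # An empty read means the peer closed the connection without data.
        return "responding" if data else "closed"
```

An FTP server normally sends a greeting as soon as the client connects, so `probe_service(host, 21)` needs no payload; for HTTP one would pass a request such as `payload=b"GET / HTTP/1.0\r\n\r\n"`, because the server waits for the client to speak first. In the failed state described above, the expected result would be "silent" rather than "refused".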
Here is the current config.g. (Kindly ignore the typos, I'm German)
; Hardware: Duet Mini 5+
; Toolboard 1.1 LC
; Stepper XY = LDO 0,9° 2Amax LDO-42STH40-2004MAC
; Stepper Z = LDO 1,8° 2Amax LDO-42STH48-2004AC
; Stepper E = LDO 1,8 1Amax LDO-42STH20-1004ASH
; Enable network
if {network.interfaces[0].type = "ethernet"}
  M552 P0.0.0.0 S1
else
  M552 S1
; Network
M586 P0 S1 ; enable HTTP
M586 P1 S1 ; enable FTP
M586 P2 S1 ; enable Telnet
G90 ; send absolute coordinates...
M83 ; ...but relative extruder moves
M550 P"v2" ; set printer name
M669 K1 ; 1=select CoreXY mode 0=Cadasian
;; Helpful Toolboards commands
; M115 B121 ; Show board 121
; M997 B121 ; Update tool 121
; M122 B121 ; Detailed status of toolboard
G4 S1 ; wait 1s for expansion boards to start
;;; Drives
;X
M569 P0.2 S1 D3 ; physical drive 0.2 goes forward
M584 X0.2 ; Map the stepper to X
;Y
M569 P0.1 S0 D3 ; physical drive 0.1 goes backward
M584 Y0.1 ; Map the stepper to Y
;; Z
; - front left
M569 P0.5 S1 D3 ; physical drive 0.5 goes forward
; - front right
M569 P0.6 S0 D3 ; physical drive 0.6 goes backward
; - back right
M569 P0.0 S1 D3 ; physical drive 0.0 goes forward
; - back left
M569 P0.4 S0 D3 ; physical drive 0.4 goes backward
M584 Z0.5:0.4:0.0:0.6 ; Mapping
;; E
M569 P121.0 S0 D3 ; Extruder stepper goes backward
M584 E121.0 ; Map the E stepper to E
; Stepper settings
M350 X16 Y16 Z16 E32 I1 ; configure microstepping with interpolation
M92 X160 Y160 Z400 E823 ; set steps per mm (800 from manuall, measured 823
M98 P"/macros/print_scripts/speed_printing.g" ; Accelerations and speed
M906 X1400 Y1400 Z1000 E700 I30 ; set motor currents (mA) and motor idle factor in per cent (E stepper max 1A)
M84 S120 ; Idle timeout
; Axis Limits
M208 X0 Y0 Z0 S1 ; set axis minima
M208 X250 Y258 Z210 S0 ; set axis maxima
;; Endstops -- Display status with: M119
M574 Y2 S1 P"0.io5.in" ; Y
M574 X2 S1 P"!0.io6.in" ; X
M574 Z0 P"nil" ; No endstop we have the switch and a probe
M574 Z1 S2 ; configure Z-probe endstop for low end on Z
; Z probe
M98 P"/macros/print_scripts/activate_z_probe.g"
; Z-level settings
;M671 X-75:-75:288:289 Y0:320:320:0 S20 ; Define Z belts locations (Front_Left, Back_Left, Back_Right, Front_Right)
;M671 X-75:-75:288:289 Y0:328:328:0 S20 ; Define Z belts locations (Front_Left, Back_Left, Back_Right, Front_Right)
M671 X-75:-75:288:289 Y0:358:358:0 S20 ; Define Z belts locations (Front_Left, Back_Left, Back_Right, Front_Right)
;; Define the mesh
;M557 X5:245 Y22:245 S35 ; spacing
;M557 X5:245 Y22:245 P9 ; grid (points per axis)
M557 X5:245 Y22:220 P9 ; grid (points per axis)
;; Heaters :: Tune with: M303 H0 S110
; Bed
M308 S0 P"0.temp0" Y"thermistor" A"Bed" T100000 B4138 ; configure sensor 0 as thermistor on pin temp0
M950 H0 C"out5+out6" T0 Q10 ; create bed heater outputs for both SSRs on out0 and map it to sensor 0
M307 H0 B0 S1.00 ; disable bang-bang mode for the bed heater and set PWM limit
M140 H0 ; map heated bed to heater 0
M143 H0 S120 ; set temperature limit for heater 0 to 120C
;; Bed Corner temp sensor (2=Orange, 3=Brown, 4=Green, 5=Yellow, 6=Purple 7=Black, )
; Configure Bed corner temp sensor as thermistor on pin temp2
M308 S5 P"0.temp2" Y"thermistor" A"Bed-Corner" T100000 B4138
; Hotend
; Tune in with: M303 H1 S270 (270=Temp) (M500 to save)
; Show current settings M307 H1
;M308 S1 P"121.temp0" Y"thermistor" A"Hotend" T500000 B4702 C1.171057e-7 ; configure sensor 1 as thermistor on pin temp1 Mosquito
;M308 S1 P"121.temp0" A"Hotend" Y"thermistor" T100000 B4725 C7.06e-8 ; define E0 temperature sensor Rapido Argo
M308 S1 P"121.temp0" A"Hotend" Y"thermistor" T100000 B4725 C7.060000e-8 ; define E0 temperature sensor e3d revo
M950 H1 C"121.out0" T1 ; create nozzle heater output on 0.out3 and map it to sensor 1
M143 H1 S300 ; set temperature limit for heater 1 to 300C
;; Fans
; Fan for the printed part:
M950 F0 C"121.out1" Q500 ; create fan 0 on pin 0.out9 and set its frequency
M106 P0 S0 H-1 C"Part" ; set fan 0 value. Thermostatic control is turned off
; Fan for the Hotend:
M950 F1 C"121.out2" Q500 ; create fan 1 on pin 0.out9 and set its frequency
M106 P1 S1 H1 T45 C"Hotend" ; P="set fan 1" S="value" H="Thermostatic control Heater No." T=" is turned on at 45°C"
;; Tool
M563 P0 S"Tool" D0 H1 F0 ; define tool
G10 P0 X0 Y0 Z0 ; set tool 0 axis offsets
G10 P0 R0 S0 ; set initial tool 0 active and standby temperatures to 0C
; Filament sensor : Status M591 D0
;M591 D0 P7 C"io4.in" L7 R50:150 E5 S0 ;pulse, disabled, 7 mm/pulse, measure every 22 sec, minimum 50 maximum 250, S1 = Enabled S0 = Disabled
;M591 D0 P1 C"io4.in" S1
M950 J3 C"!io4.in" ; Create a trigger on io4.in (NC)
M581 P3 T3 S0 R1 ; R1=Trigger only while printing
;; Chamber temp sensor
M308 S4 P"0.temp1" Y"thermistor" A"Chamber" T100000 B4138 ; configure Chamber temp sensor as thermistor on pin temp1
;; Input Shaping
; Accelerometer https://duet3d.dozuki.com/Wiki/Input_shaping
M955 P121.0 I05 ; specify orientation of accelerometer on Toolboard 1LC with CAN address 121
; Input Shaping
;M593 P"zvd" F40.5 ; use ZVD input shaping to cancel ringing at 40.5Hz
;M593 P"none" ; disable input shaping
;M593 P"custom" H0.4:0.7 T0.0135:0.0135 ; use custom input shaping
; PA https://duet3d.dozuki.com/Wiki/Pressure_advance
M572 D0 S0.025
;;;;;;;;;;;; Setup Only
;M564 S0 H0 ; Allow movement over the endstops
;M302 P1 ; allow cold extrusion
;M302 S1 ; deny cold extrusion
;;;;;;;;;;;; Setup Only END
;; Case Cooling
; Temps
M308 S9 P"mcu-temp" Y"mcu-temp" A"Mainboard" ; define sensor 9 to be mcu temperature
; Case Fans
M950 F3 C"!0.out3" Q50 ; Fan on out3 ground on top pin, plus on 3rd pin from top (V_OUTLC1)
M106 P3 C"Base" S120 ; Setup the FAN and slow it down
; Nevermore
m950 F4 C"0.out0" Q50
m106 P4 C"Nevermore" S0
; Define the LED stripe and turn it off
M950 F5 C"0.out1" Q100 ; LED on out1
M106 P5 C"LED" S0 ; Make sure that the LEDs are off
; Trigger on the toolboard
;#M950 J5 C"^121.button0"
;#M581 P4 T5 S0
;########################################
M950 J1 C"^0.io1.in"
M581 P1 T2 S0
;M572 D0 S0.037 ; Set preasure Advance Gemessen
M501 ; Load config-override.g
;; Serial interface
; Duet
M575 P1 S1 B57600
;;;;;; Old Display
;M575 P1 B115200 S1
;; Mini 12864
;M918 P2
;M918 P2 E4 R3 C100
;M150 X2 R255 U255 B255 S3 ; set all 3 LEDs to white
;M150 X2 R0 U255 B0 S3 ; set all 3 LEDs to red
T0 ; Select the tool 0 as default
; Make sure that all heaters are off
M104 S0 ; Extruder temp to 0
M568 P0 A0 ; Extruder heater off
M140 S0 ; Set the bed temp to 0
M140 S-276 ; Bed heater off
; Some variables for later
global tool_temp_initial=0
global bed_temp_initial=0
global debug=false
; AutoZ
global klicky_home=true
global qgl_done=false
global nozzle_cleaned=false
global Zswitch_homed=false
global probetype="euclid"
global clickystatus = "none"
global probe_settingsH=10
global probe_settingsA=1
global autoz_temp2=20
# Stealthburner LEDs:
global sb_leds="n-off"
M98 P"/macros/sb_leds/sb_leds.g"
set global.sb_leds="hot"
set global.sb_leds="n-off"
;set global.sb_logo="red"
;set global.sb_leds="n-off"
;global sb_nozzle="off"
; M307 H0 R0.327 C227.635:227.635 D5.48 S1.00 V24.4 B0 I0 ; R altered for a firmware bug
; EOF
-
@Chriss Hello, I might have a fix for this issue. But as you can guess, this issue seems to be intermittent and highly network dependent. So I'd like your help in order to verify it really works.
Are you able to set up two boards:
- One board has 2.1beta6 and 3.5rc3 from the release https://github.com/Duet3D/RepRapFirmware/releases/tag/3.5.0-rc.3
- The other board also has 3.5rc3 from the release, but has experimental wifi server firmware: https://drive.google.com/file/d/1NgssWNSS3xGL99hWwfgYbY5YVXF-f4jH/view?usp=drive_link.
Please verify the versions are correct for each board; for the first board it should say "2.1beta6" and on the second one it should be "2.1beta7".
The idea is simple: run and use these boards normally and see whether the 2.1beta7 board shows none of the disconnections you previously encountered, compared to the 2.1beta6 one.
Sorry by the way for the delay on this issue. I was sick for the last few days (from last week) and was only able to resume work yesterday.
-
Glad to hear that you are recovered from your sick leave. I was on vacation in the meantime so I was not "waiting" for a reply.
Please give me some days to test with the new WiFi firmware. Do you remember that I had another printer with RC3?
2: RC3 without problem, without PanelDue and not printing (VCast)
That one has been printing since yesterday and has developed the same problem as of this morning. I think that I will upgrade this printer to beta7 first. I will use that printer more frequently in the next days; please let me know if you want me to stick with the other printer, which had the problem first.
I have to admit that I'm more than happy that both printers with beta6 have the same problem now. I was a bit concerned about the observation that one encountered the problem while the other one was fine.
Cheers, Chriss
-
@rechrtb I have no access to the file. I requested it a minute ago.
-
@Chriss Granted you access to the file. Tell me if you still have problems accessing it.
Regarding your question, I would advise to put beta7 on the board on which the issue seems to manifest most often.
I would also advise putting the two boards near each other if you can, so that they roughly get the same wifi signal strength, same wifi devices in proximity, etc. I recommend moving the beta6 board to the beta7 board location (again, because this might be a 'goldilocks' location w/ respect to the access point for the issue to manifest more frequently).
-
Cheers, I have the file. It seems to me that the printer I use is the one getting the error... Let me see... I print on both at the moment. I will wait till tomorrow and I hope that one of them will be in the failed state then. That will be the chosen one.
The printers stand next to each other and the AP is in the same room, about 5m away.
Do you want me to do the drill via the serial interface on the board with the new firmware too? Or do I only need to do it if the new fw has the same problem? (And I hope that this is totally hypothetical because you found the problem and fixed it!)
Cheers, Chriss
-
Do you want me to do the drill via the serial interface on the board with the new firmware too?
For now, not yet. Only when beta7 also displays the same issues.
-
@rechrtb OK, cool for me... My apologies, it took me almost a day to get the printer back into the failed state. Just to make sure that the problem is still present after the very latest reboot of my WiFi infra.
So we have:
now. I will print for a while, but I cannot yet tell you "Yes, it is working now", simply because the problem does not show up frequently. I had the impression that it happens at least once in 24h, but the very last issue came up after more than 30h. So when could we say "Yes, solved" then?
Do you want to tell us how you have fixed it? Is it by doing a full reset of the WiFi module after a lost connection? Or was there a real issue with the board firmware? (I'm just curious because the disconnects are not new; the failure to recover was new.)
I will update the thread as soon as the problem is back or I will ping you on Friday or Monday when I have the feeling that the problem is gone for good.
Cheers, Chriss
-
@Chriss said in Wifi 2.1beta6 from 3.5.0-rc.2/3 still disconnecting:
Do you want to tell us how you have fixed it? Is it by doing a full reset of the WiFi module after a lost connection? Or was there a real issue with the board firmware? (I'm just curious because the disconnects are not new; the failure to recover was new.)
From a conversation we had with @rechrtb
Ok, I think I may have found a fix to the issue. The reason I say 'may' is because as you might imagine, this issue seems to be very intermittent - I have only been able to reproduce it that one time last week.
But in trying to debug this problem, I inserted a bunch of debug printf's, after which I was able to get the same symptoms, namely:
- board seems to disconnect and never reconnect again - unless module is disabled and re-enabled
- module led is still on, but board is unpingable
- repeating "responsebusy" and "bad recv status size"
Ok, so the issue I found is that one of the tasks block indefinitely on https://github.com/Duet3D/WiFiSocketServerRTOS/blob/dev/src/Connection.cpp#L694.
Inserting the printf's must've slowed things down enough to simulate the connectionQueue being backed up. There is supposed to be a task that consumes events from this queue, but since this callback occurs on the lwip task, if that consumer task calls an lwip function, it might also lock up.
Increasing the queue size seems to have alleviated the issue. I have set the size to MaxConnections * 3, since there are three types of connection events that can be enqueued in connectionQueue: Accept, Close and Terminate.
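Why the factor of three helps can be illustrated with a toy model. The following Python sketch is not the firmware code: the value of `MAX_CONNECTIONS`, the function name `enqueue_burst` and the assumption that each connection has at most one pending event of each type are all hypothetical.

```python
import queue

MAX_CONNECTIONS = 8  # hypothetical value, not taken from the firmware
EVENT_TYPES = ("Accept", "Close", "Terminate")


def enqueue_burst(capacity):
    """Enqueue one event of each type per connection while no consumer runs.

    Returns how many events did not fit. In a design where the producer
    blocks on a full queue, each of these would be a point where the
    producing task could stall.
    """
    q = queue.Queue(maxsize=capacity)
    would_block = 0
    for conn in range(MAX_CONNECTIONS):
        for event in EVENT_TYPES:
            try:
                q.put_nowait((conn, event))
            except queue.Full:
                would_block += 1
    return would_block


# A queue sized for one event per connection overflows in the worst case,
# while MAX_CONNECTIONS * 3 holds every possible pending event.
small = enqueue_burst(MAX_CONNECTIONS)
large = enqueue_burst(MAX_CONNECTIONS * len(EVENT_TYPES))
```

Under these assumptions the larger queue can never be full when an event arrives, so the producer never has to wait for the consumer. It does not remove the underlying coupling between the two tasks, which matches the remark about refactoring later.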
That said, it is probably still needed to verify if this is the issue @Chriss encountered. Since they have multiple boards, I'll probably advise them to load this firmware onto one of the boards, while the other retains the current firmware - to see if the 'fixed' version has reduced occurrences of the disconnects.
Though long term, I'll probably think of potentially better ways to refactor this part of the code.
The relevant fix is here: https://github.com/Duet3D/WiFiSocketServerRTOS/commit/0f8bdc18f2968ee357cdb09d1319590abb7cdd08
Ian
-
@Chriss Hello, as in @droftarts' response, I might have managed to re-create the situation in which the WiFi module firmware locks up, which is consistent with the symptoms you described. With the fix, I wasn't able to recreate the lock-up 'artificially' anymore. Now we wait to confirm that it no longer occurs 'naturally' either.
As it is a very intermittent issue, it's hard to say when we can call the issue fixed. It has to be a long-term test, but if the beta6 board encounters the issue and beta7 still does not, I think that is a good sign.
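One rough way to put a number on "long term" is to ask how likely a failure-free run would be by luck alone. The sketch below assumes that failures on an unfixed board arrive randomly at a constant average rate (a Poisson model), which is a simplification and not something measured in this thread; the function names and the 5% threshold are likewise only illustrative.

```python
import math


def chance_of_quiet_run(hours_observed, mean_hours_between_failures):
    """Probability of seeing no failure for `hours_observed` purely by luck,
    assuming failures arrive at a constant average rate (Poisson model)."""
    return math.exp(-hours_observed / mean_hours_between_failures)


def hours_needed(mean_hours_between_failures, residual_chance=0.05):
    """Failure-free hours required before luck alone would explain the
    quiet run with probability at most `residual_chance`."""
    return -math.log(residual_chance) * mean_hours_between_failures
```

With the roughly 24 to 30 hours between failures reported earlier in the thread, `hours_needed(30)` comes out at about 90 hours, so under this model a failure-free run of around four days on beta7 would be fairly unlikely to be a coincidence. Seeing the beta6 board fail during the same period strengthens the comparison further.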
-
Thanks for the information, I appreciate that very much.
I was very busy during the last days so I skipped telling you "Yes, it still works" every day.
What I can tell so far is that the error is gone. I have been printing a lot with the beta7 board since the upgrade (many prints under 30 minutes) and the WiFi connection was very stable. I would vote with a "thumbs up" and say that the problem is gone.
I work a lot with my beta6 board at the moment (IDEX setup is a pain) and I saw the problem there twice since yesterday evening.
I guess you guys want to close the case now and release beta7 officially. Thank you very much for your good support, I felt very comfy during the process. And I'm more than happy that it was not a stupid wrong config on my side this time.
Cheers, Chriss
-
@Chriss I'm not quite sure why this thread is in the STM category. I'll move it to the beta firmware category.
Ian
-
@droftarts Hahaha... I started it in the Beta, somebody moved it to here.
-