Restart DuetControlServer (SBC)
-
Was it working before at any point? Did anything recently change?
Could you try a fresh image of Duet pi and updating again?
https://pkg.duet3d.com/DuetPi-lite.zip
https://pkg.duet3d.com/DuetPi.zip
Can you check what firmware is installed on the Duet via USB?
-
Reseated the ribbon cable...no difference.
No, it's never connected (duet status: Off) when the pi has been running before the Duet has power. Just to be clear: when duet is powered and I reboot the Pi or restart the duetcontrolservice it will connect.
Duet 3 firmware is 3.2.2 (read via usb)
Redid the DuetPi image to SD card (image verified). Had the duet off and booted the pi to go through the first time initialization. Turned the duet on and it connected. Turned the duet off and waited a little bit and turned it back on and it didn't connect. Turned off duet and rebooted Pi, turned on the duet and it connected.
It looks like it will do it the first time the pi is rebooted but not after that. Also did the apt upgrades after the above, also no change.
-
Is it possible that after the first time it waits to connect and then the duet is off that it doesn't check/wait again?
-
DSF does indeed wait for the Duet 3 to up and ready. In your case it looks like the might be dodgy connection between the Duet and the Pi which could cause the warnings from your log (all 0x00 or 0xFF could mean a timing issue on the TfrRdy pin).
Can you post the output of M122 when connected over USB to the Duet (with the Pi attached)?
-
With Pi attached and running and Duet off.
Plugged in the USB cable to the Duet, waited a few minutesSENT: M122 READ: === Diagnostics === READ: RepRapFirmware for Duet 3 MB6HC version 3.2.2 running on Duet 3 MB6HC v1.01 or later (SBC mode) READ: Board ID: 08DJM-956L2-G43S8-6J9D6-3S46L-KU2YD READ: Used output buffers: 1 of 40 (1 max) READ: === RTOS === READ: Static ram: 149788 READ: Dynamic ram: 60988 of which 0 recycled READ: Never used RAM 148056, free system stack 190 words READ: Tasks: Linux(ready,423) HEAT(blocked,368) CanReceiv(blocked,927) CanSender(blocked,371) CanClock(blocked,360) TMC(blocked,65) MAIN(running,1281) IDLE(ready,19) READ: Owned mutexes: USB(MAIN) READ: === Platform === READ: Last reset 00:00:33 ago, cause: power up READ: Last software reset details not available READ: Error status: 0x00 READ: Aux0 errors 0,0,0 READ: Aux1 errors 0,0,0 READ: MCU temperature: min 35.8, current 43.4, max 43.4 READ: Supply voltage: min 0.2, current 0.2, max 0.3, under voltage events: 0, over voltage events: 0, power good: no READ: 12V rail voltage: min 0.1, current 0.2, max 0.2, under voltage events: 0 READ: Driver 0: position 0, ok, reads 0, writes 0 timeouts 0, SG min/max 0/0 READ: Driver 1: position 0, ok, reads 0, writes 0 timeouts 0, SG min/max 0/0 READ: Driver 2: position 0, ok, reads 0, writes 0 timeouts 0, SG min/max 0/0 READ: Driver 3: position 0, ok, reads 0, writes 0 timeouts 0, SG min/max 0/0 READ: Driver 4: position 0, ok, reads 0, writes 0 timeouts 0, SG min/max 0/0 READ: Driver 5: position 0, ok, reads 0, writes 0 timeouts 0, SG min/max 0/0 READ: Date/time: 1970-01-01 00:00:00 READ: Slowest loop: 0.10ms; fastest: 0.00ms READ: === Storage === READ: Free file entries: 10 READ: SD card 0 not detected, interface speed: 37.5MBytes/sec READ: SD card longest read time 0.0ms, write time 0.0ms, max retries 0 READ: === Move === READ: DMs created 125, maxWait 0ms, bed compensation in use: none, comp offset 0.000 READ: === MainDDARing === READ: Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1 READ: === AuxDDARing === READ: Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1 READ: === Heat === READ: Bed heaters = -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1 READ: === GCodes === READ: Segments left: 0 READ: Movement lock held by null READ: HTTP is idle in state(s) 0 READ: Telnet is idle in state(s) 0 READ: File is idle in state(s) 0 READ: USB is ready with "M122" in state(s) 0 READ: Aux is idle in state(s) 0 READ: Trigger is idle in state(s) 0 READ: Queue is idle in state(s) 0 READ: LCD is idle in state(s) 0 READ: SBC is idle in state(s) 0 READ: Daemon is idle in state(s) 0 READ: Aux2 is idle in state(s) 0 READ: Autopause is idle in state(s) 0 READ: Code queue is empty. READ: === CAN === READ: Messages queued 0, send timeouts 165, received 0, lost 0, longest wait 0ms for reply type 0, free buffers 48 READ: === SBC interface === READ: State: 0, failed transfers: 0 READ: Last transfer: 33171ms ago READ: RX/TX seq numbers: 0/1 READ: SPI underruns 0, overruns 0 READ: Number of disconnects: 0, IAP RAM available 0x2c8a8 READ: Buffer RX/TX: 0/0-0 READ: ok
-
Oh, and the above message was with pi dwc saying the Duet status is Off.
And just incase you want to see what it says when the status is Idle:
SENT: m122 READ: === Diagnostics === READ: RepRapFirmware for Duet 3 MB6HC version 3.2.2 running on Duet 3 MB6HC v1.01 or later (SBC mode) READ: Board ID: 08DJM-956L2-G43S8-6J9D6-3S46L-KU2YD READ: Used output buffers: 2 of 40 (17 max) READ: === RTOS === READ: Static ram: 149788 READ: Dynamic ram: 63684 of which 76 recycled READ: Never used RAM 145284, free system stack 146 words READ: Tasks: Linux(ready,81) HEAT(blocked,297) CanReceiv(blocked,834) CanSender(blocked,352) CanClock(blocked,352) TMC(blocked,19) MAIN(running,1147) IDLE(ready,19) READ: Owned mutexes: USB(MAIN) READ: === Platform === READ: Last reset 00:03:50 ago, cause: power up READ: Last software reset details not available READ: Error status: 0x00 READ: Aux0 errors 0,0,0 READ: Aux1 errors 0,0,0 READ: MCU temperature: min 40.3, current 40.5, max 40.7 READ: Supply voltage: min 31.2, current 31.3, max 31.3, under voltage events: 0, over voltage events: 0, power good: yes READ: 12V rail voltage: min 11.9, current 12.0, max 12.0, under voltage events: 0 READ: Driver 0: position 800, standstill, reads 45086, writes 3 timeouts 0, SG min/max 0/298 READ: Driver 1: position 800, standstill, reads 45087, writes 3 timeouts 0, SG min/max 0/263 READ: Driver 2: position 0, standstill, reads 45087, writes 3 timeouts 0, SG min/max 0/160 READ: Driver 3: position 0, standstill, reads 45086, writes 3 timeouts 0, SG min/max 0/169 READ: Driver 4: position 0, standstill, reads 45086, writes 3 timeouts 0, SG min/max 0/207 READ: Driver 5: position 0, standstill, reads 45086, writes 3 timeouts 0, SG min/max 0/208 READ: Date/time: 2021-03-01 09:45:31 READ: Slowest loop: 88.98ms; fastest: 0.04ms READ: === Storage === READ: Free file entries: 10 READ: SD card 0 not detected, interface speed: 37.5MBytes/sec READ: SD card longest read time 0.0ms, write time 0.0ms, max retries 0 READ: === Move === READ: DMs created 125, maxWait 206056ms, bed compensation in use: none, comp offset 0.000 READ: === MainDDARing === READ: Scheduled moves 10, completed moves 10, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1 READ: === AuxDDARing === READ: Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1 READ: === Heat === READ: Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = -1 -1 -1 -1 READ: === GCodes === READ: Segments left: 0 READ: Movement lock held by null READ: HTTP* is idle in state(s) 0 READ: Telnet is idle in state(s) 0 READ: File is idle in state(s) 0 READ: USB is ready with "m122" in state(s) 0 READ: Aux is idle in state(s) 0 READ: Trigger* is idle in state(s) 0 READ: Queue is idle in state(s) 0 READ: LCD is idle in state(s) 0 READ: SBC is idle in state(s) 0 READ: Daemon is idle in state(s) 0 READ: Aux2 is idle in state(s) 0 READ: Autopause is idle in state(s) 0 READ: Code queue is empty. READ: === Filament sensors === READ: Extruder 0 sensor: no filament READ: Extruder 1 sensor: no filament READ: === CAN === READ: Messages queued 360, send timeouts 0, received 2232, lost 0, longest wait 0ms for reply type 0, free buffers 48 READ: === SBC interface === READ: State: 4, failed transfers: 0 READ: Last transfer: 0ms ago READ: RX/TX seq numbers: 7559/7559 READ: SPI underruns 0, overruns 0 READ: Number of disconnects: 0, IAP RAM available 0x2c8a8 READ: Buffer RX/TX: 0/0-0 READ: ok
-
@BeosDoc Can you take a look at the syslog file (/var/log/syslog) and see if you have any lines that look similar to:
systemd[1]: duetcontrolserver.service: Start request repeated too quickly.
-
Yes there is, only 1 occurrence. I have seen it before the other day as well.
Mar 1 09:39:52 duet3 DuetControlServer[30541]: [info] Settings loaded Mar 1 09:39:52 duet3 DuetControlServer[30541]: [info] Environment initialized Mar 1 09:39:53 duet3 DuetControlServer[30541]: [fatal] Could not connect to Duet (Timeout while waiting for transfer ready pin) Mar 1 09:39:53 duet3 systemd[1]: duetcontrolserver.service: Failed with result 'protocol'. Mar 1 09:39:53 duet3 systemd[1]: Failed to start Duet Control Server. Mar 1 09:39:53 duet3 systemd[1]: duetcontrolserver.service: Service RestartSec=100ms expired, scheduling restart. Mar 1 09:39:53 duet3 systemd[1]: duetcontrolserver.service: Scheduled restart job, restart counter is at 5. Mar 1 09:39:53 duet3 systemd[1]: Stopped Duet Control Server. Mar 1 09:39:53 duet3 systemd[1]: duetcontrolserver.service: Start request repeated too quickly. Mar 1 09:39:53 duet3 systemd[1]: duetcontrolserver.service: Failed with result 'protocol'. Mar 1 09:39:53 duet3 systemd[1]: Failed to start Duet Control Server. Mar 1 09:39:53 duet3 DuetWebServer[589]: #033[40m#033[1m#033[33mwarn#033[39m#033[22m#033[49m: DuetWebServer.Services.ModelObserver[0] Mar 1 09:39:53 duet3 DuetWebServer[589]: Failed to synchronize machine model Mar 1 09:39:53 duet3 DuetWebServer[589]: System.Net.Internals.SocketExceptionFactory+ExtendedSocketException (111): Connection refused /var/run/dsf/dcs.sock Mar 1 09:39:53 duet3 DuetWebServer[589]: at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress) Mar 1 09:39:53 duet3 DuetWebServer[589]: at System.Net.Sockets.Socket.Connect(EndPoint remoteEP) Mar 1 09:39:53 duet3 DuetWebServer[589]: at DuetAPIClient.BaseConnection.Connect(ClientInitMessage initMessage, String socketPath, CancellationToken cancel$ Mar 1 09:39:53 duet3 DuetWebServer[589]: at DuetWebServer.Services.ModelObserver.Execute() in /home/christian/Duet3D/DuetSoftwareFramework/src/DuetWebServe$
And when I do a systemctl status duetcontrolserver, duet on and pi duet status of Off, I get:
Mar 01 09:53:30 duet3 DuetControlServer[31177]: [warn] Note: RepRapFirmware didn't receive valid data either (code 0xffffffff) Mar 01 09:53:30 duet3 DuetControlServer[31177]: [warn] Restarting transfer because the number of maximum retries has been exceeded Mar 01 09:53:30 duet3 DuetControlServer[31177]: [warn] Bad header CRC16 (expected 0x0000, got 0x0272) Mar 01 09:53:30 duet3 DuetControlServer[31177]: [warn] Note: RepRapFirmware didn't receive valid data either (code 0xffffffff) Mar 01 09:53:31 duet3 DuetControlServer[31177]: [warn] Bad header CRC16 (expected 0x0000, got 0x0272) Mar 01 09:53:31 duet3 DuetControlServer[31177]: [warn] Note: RepRapFirmware didn't receive valid data either (code 0xffffffff) Mar 01 09:53:31 duet3 DuetControlServer[31177]: [warn] Bad header CRC16 (expected 0x0000, got 0x136c) Mar 01 09:53:31 duet3 DuetControlServer[31177]: [warn] Note: RepRapFirmware didn't receive valid data either (code 0xffffffff) Mar 01 09:53:31 duet3 DuetControlServer[31177]: [warn] Restarting transfer because the number of maximum retries has been exceeded Mar 01 10:00:55 duet3 DuetControlServer[31177]: [warn] RepRapFirmware got a bad header checksum
-
If you get that message it basically means that systemd will no longer restart the duetcontrolserver service.
The systemd documentation is a little tricky to follow but this message is generated if the service needs to be restarted more than a certain number of times over a period of time. In this case I think the defaults are 5 times in 10 seconds. I think this can happen if the Duet is turned off and the transfer ready pin floats high.
Given that the main purpose of the SBC is to talk to the Duet I'm not sure it really makes sense to disable the service in this way. If you are happy making changes to the rPi config files (worse case you just reinstall everything), you can tell systemd not to disable a service in this situation by adding the following line:
StartLimitIntervalSec=0
to the [unit] part of the service control file. In this case that file is:
/etc/systemd/system/multi-user.target.wants/duetcontrolserver.service
if you modify to look like this:
[Unit] Description=Duet Control Server StartLimitIntervalSec=0 [Service] ExecStart=/opt/dsf/bin/DuetControlServer TimeoutStopSec=60 Restart=always Type=notify User=dsf Group=dsf UMask=0002 CapabilityBoundingSet=CAP_SYS_PTRACE CAP_DAC_READ_SEARCH CAP_SYS_TIME AmbientCapabilities=CAP_SYS_PTRACE CAP_DAC_READ_SEARCH CAP_SYS_TIME [Install] WantedBy=sysinit.target
Then the action that disables the service should be turned off. Note that after making the change you should run the following commands:
pi@duet3:~ $ sudo systemctl daemon-reload pi@duet3:~ $ sudo systemctl restart duetcontrolserver
May be a good idea to make a copy of the file before you edit it.
I've seen the service be disabled when updating firmware on the STM32/LPC port of RRF (that I look after). In that case flashing new firmware is different to the Duet and the firmware restart takes longer which can sometimes trigger systemd to disable the control server. Making the above change seems to fix that problem for me. It would be interesting to see if it helps with your problem.
-
@BeosDoc said in Restart DuetControlServer (SBC):
Reseated the ribbon cable...no difference.
Do you have another ribbon cable to test with or possibly make some jumper cables?
@gloomyandy said in Restart DuetControlServer (SBC):
StartLimitIntervalSec=0
I've asked christian to take a look at this. Regardless, the Duet and Pi should be making a connection so the retry shouldn't come into play. Seems like either a bad cable or damaged pins.
-
I don't have that file in /etc/systemd/system/multi-user.target.wants
I did find it in /etc/systemd/system/sysinit.target.wants/
Mine at that location is currently:
[Unit] Description=Duet Control Server [Service] ExecStart=/opt/dsf/bin/DuetControlServer TimeoutStopSec=15 Restart=always Type=notify User=dsf Group=dsf UMask=0002 CapabilityBoundingSet=CAP_SYS_PTRACE CAP_DAC_READ_SEARCH CAP_SYS_TIME AmbientCapabilities=CAP_SYS_PTRACE CAP_DAC_READ_SEARCH CAP_SYS_TIME [Install] WantedBy=sysinit.target
-
@Phaedrux said in Restart DuetControlServer (SBC):
@BeosDoc said in Restart DuetControlServer (SBC):
Reseated the ribbon cable...no difference.
Do you have another ribbon cable to test with or possibly make some jumper cables?
No other ribbon cable, just the one supplied with your board and I have reseated it. I have short jumper wires I can use. I want to verify the required pins: 19,21,22,23,34 and 20 for ground. Is pin 17 (3.3V) and/or Pin 2/4 (5v) required? SBC has it's own power supply and the duet no jumpers on 5V -> SBC and also SBC ->5V
-
@Phaedrux My understanding is that problem occurs when the rPi is started but the Duet board does not have any power. In that situation it is possible that the duetcontrolserver service has been suspended (due to the too many restarts), before the Duet board is even started. If that happens there is no way that the Duet is ever going to connect, because it has nothing to connect to. Certainly if that "duetcontrolserver.service: Start request repeated too quickly" message is in the syslog it would indicate that the service is being suspended. Either way the change is easy to make and then test to see if it helps with the problem.
@BeosDoc That's odd, perhaps it has been moved at some point, my rPi image was created probably a year or so ago (but has been updated continuously since then).
-
To be honest I'm not sure about the 3.3 and 5v pins. Try without if you're short on jumper wires.
-
I'll try the jumpers first. Then I'll try the changes to that file.
-
First part. I replaced the ribbon cable with very short, 9cm (ribbon is 20cm), and brand new jumpers. Initial boot worked. After that it still had the status of Off.
FYI: You do need pin 17 (3.3V) hooked up. Without it, it didn't see it at all.
For future reference Duet 3 > rPi you need pins 17, 19, 21, 22, 23, 24, 26 (not sure about 26) and one of the grounds (like 20 and/or 25
Now for the second part with the file
-
@gloomyandy said in Restart DuetControlServer (SBC):
Given that the main purpose of the SBC is to talk to the Duet I'm not sure it really makes sense to disable the service in this way. If you are happy making changes to the rPi config files (worse case you just reinstall everything), you can tell systemd not to disable a service in this situation by adding the following line:
StartLimitIntervalSec=0to the [unit] part of the service control file. In this case that file is:
/etc/systemd/system/multi-user.target.wants/duetcontrolserver.serviceI modified the file /etc/systemd/system/sysinit.target.wants/duetcontrolserver.service
I tried it several times and it didn't see it
-
@Phaedrux said in Restart DuetControlServer (SBC):
I've asked christian to take a look at this. Regardless, the Duet and Pi should be making a connection so the retry shouldn't come into play. Seems like either a bad cable or damaged pins.
If it was a bad cable or damaged pins, then it probably wouldn't work at all or there would be lots of comm errors. If I reboot the pi or restart DuetControlServer it works find, I've done several prints.
-
@BeosDoc Are you getting any
systemd[1]: duetcontrolserver.service: Start request repeated too quickly.
messages now in your syslog file? Are you rebooting the rPi to test this?
-
@gloomyandy Thanks for pointing out this limitation, I'll check if I can integrate your change in the next software version. The automatic fail detection wasn't an issue in earlier DSF versions because that always had a restart delay of 5 seconds.
@BeosDoc What I find odd about your report is that you do get occasional transfers (albeit with bad checksums) so I'm wondering if there is something wrong with the SPI peripheral either on the Pi or the Duet. Can you run the spidev_test procedure as described here? I'll update the docs with better troubleshooting instructions soon.