Assistance reading diagnostics after failure- 3.3
-
We just experienced a print failure where the printer reset. I grabbed the diagnostics in hopes it contains useful information on what just happened. This is a 6HC running 3.3 stable with an attached SPC. Thanks
=== Diagnostics ===
RepRapFirmware for Duet 3 MB6HC version 3.3 (2021-06-15 21:45:47) running on Duet 3 MB6HC v0.6 or 1.0 (SBC mode)
Board ID: 08DJM-956L2-G43S4-6J1DD-3SJ6S-TB62H
Used output buffers: 3 of 40 (40 max)
=== RTOS ===
Static ram: 150904
Dynamic ram: 62836 of which 140 recycled
Never used RAM 137456, free system stack 114 words
Tasks: SBC(ready,75.9%,314) HEAT(delaying,0.9%,325) Move(notifyWait,10.5%,250) CanReceiv(notifyWait,0.8%,774) CanSender(notifyWait,0.6%,366) CanClock(delaying,0.3%,339) TMC(notifyWait,112.5%,59) MAIN(running,59.1%,922) IDLE(ready,0.0%,29), total 260.6%
Owned mutexes: HTTP(MAIN)
=== Platform ===
Last reset 51:53:37 ago, cause: software
Last software reset at 2021-08-14 07:26, reason: User, none spinning, available RAM 137620, slot 2
Software reset code 0x0012 HFSR 0x00000000 CFSR 0x00000000 ICSR 0x0043c000 BFAR 0x00000000 SP 0x00000000 Task SBC Freestk 0 n/a
Error status: 0x04
Aux0 errors 0,0,0
Step timer max interval 156
MCU temperature: min 41.0, current 47.6, max 48.0
Supply voltage: min 2.4, current 23.5, max 24.0, under voltage events: 0, over voltage events: 0, power good: yes
12V rail voltage: min 0.4, current 12.1, max 12.2, under voltage events: 1
Heap OK, handles allocated/used 99/7, heap memory allocated/used/recyclable 2048/2032/1938, gc cycles 1
Driver 0: position 42141, standstill, reads 5320, writes 68 timeouts 0, SG min/max 0/1023
Driver 1: position 296, standstill, reads 5366, writes 22 timeouts 0, SG min/max 0/0
Driver 2: position 7510, standstill, reads 5320, writes 68 timeouts 0, SG min/max 0/1023
Driver 3: position 0, standstill, reads 5215, writes 173 timeouts 0, SG min/max 0/1023
Driver 4: position 0, standstill, reads 5369, writes 22 timeouts 0, SG min/max 0/0
Driver 5: position 0, standstill, reads 5218, writes 173 timeouts 0, SG min/max 0/1023
Date/time: 2021-08-16 11:20:14
Slowest loop: 174.44ms; fastest: 0.03ms
=== Storage ===
Free file entries: 10
SD card 0 not detected, interface speed: 37.5MBytes/sec
SD card longest read time 0.0ms, write time 0.0ms, max retries 0
=== Move ===
DMs created 125, maxWait 12375997ms, bed compensation in use: mesh, comp offset 0.000
=== MainDDARing ===
Scheduled moves 460228, completed moves 460228, hiccups 0, stepErrors 0, LaErrors 0, Underruns [1, 0, 75], CDDA state -1
=== AuxDDARing ===
Scheduled moves 0, completed moves 0, hiccups 0, stepErrors 0, LaErrors 0, Underruns [0, 0, 0], CDDA state -1
=== Heat ===
Bed heaters = 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1, chamberHeaters = 2 -1 -1 -1
Heater 0 is on, I-accum = 0.3
Heater 1 is on, I-accum = 0.0
Heater 2 is on, I-accum = 0.0
=== GCodes ===
Segments left: 0
Movement lock held by null
HTTP* is doing "M122" in state(s) 0
Telnet is idle in state(s) 0
File* is idle in state(s) 0
USB is idle in state(s) 0
Aux is idle in state(s) 0
Trigger* is idle in state(s) 0
Queue* is idle in state(s) 0
LCD is idle in state(s) 0
SBC is idle in state(s) 0
Daemon is idle in state(s) 0
Aux2 is idle in state(s) 0
Autopause is idle in state(s) 0
Code queue is empty.
=== CAN ===
Messages queued 4576948, received 2249460, lost 0, longest wait 3ms for reply type 6029, peak Tx sync delay 31273, free buffers 49 (min 38), ts 934087/934073/0
Tx timeouts 0,0,13,0,0,0 last cancelled message type 30 dest 127=== SBC interface ===
State: 4, failed transfers: 0, checksum errors: 0
Last transfer: 3ms ago
RX/TX seq numbers: 6237/3836
SPI underruns 0, overruns 0
Disconnects: 1, timeouts: 1, IAP RAM available 0x2c83c
Buffer RX/TX: 0/0-0
=== Duet Control Server ===
Duet Control Server v3.3.0
Code buffer space: 4096
Configured SPI speed: 8000000Hz
Full transfers per second: 41.43, max wait times: 86.5ms/0.0ms
Codes per second: 17.18
Maximum length of RX/TX data transfers: 4200/1368 -
@oozebot you need to get the logs from syslog as well on the pi
-
@jay_s_uk grabbing them now
-
Here is the relevant section. SD Card failure?
Aug 16 11:18:25 oozeBot kernel: [188744.509778] mmc1: Timeout waiting for hardware interrupt. Aug 16 11:18:25 oozeBot kernel: [188744.509850] brcmfmac: mmc_submit_one: CMD53 sg block write failed -110 Aug 16 11:18:25 oozeBot kernel: [188744.509869] brcmfmac: brcmf_sdio_txfail: sdio error, abort command and terminate frame Aug 16 11:18:25 oozeBot kernel: [188744.510081] ieee80211 phy0: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110 Aug 16 11:18:25 oozeBot kernel: [188744.511323] brcmfmac: brcmf_sdio_hdparse: HW header checksum error Aug 16 11:18:25 oozeBot kernel: [188744.511338] brcmfmac: brcmf_sdio_rxfail: terminate frame Aug 16 11:18:25 oozeBot kernel: [188744.512497] brcmfmac: brcmf_sdio_hdparse: HW header checksum error Aug 16 11:18:25 oozeBot kernel: [188744.512512] brcmfmac: brcmf_sdio_rxfail: terminate frame Aug 16 11:18:25 oozeBot kernel: [188744.512686] brcmfmac: brcmf_sdio_hdparse: HW header checksum error Aug 16 11:18:25 oozeBot kernel: [188744.512714] brcmfmac: brcmf_sdio_rxfail: terminate frame Aug 16 11:18:25 oozeBot kernel: [188744.512891] brcmfmac: brcmf_sdio_hdparse: HW header checksum error Aug 16 11:18:25 oozeBot kernel: [188744.512905] brcmfmac: brcmf_sdio_rxfail: terminate frame Aug 16 11:18:27 oozeBot DuetControlServer[1043]: [info] System time has been changed Aug 16 11:18:27 oozeBot kernel: [188747.069866] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout Aug 16 11:18:27 oozeBot kernel: [188747.070224] brcmfmac: brcmf_sdio_readshared: invalid sdpcm_shared address 0x00000000 Aug 16 11:18:27 oozeBot kernel: [188747.070241] brcmfmac: brcmf_sdio_readshared: unable to obtain sdpcm_shared info: rv=-22 (addr=0x0) Aug 16 11:18:27 oozeBot kernel: [188747.070268] ieee80211 phy0: brcmf_cfg80211_get_station: GET STA INFO failed, -110 Aug 16 11:18:27 oozeBot DuetControlServer[1043]: [warn] Controller has been reset Aug 16 11:18:27 oozeBot DuetControlServer[1043]: [info] Aborted job file Aug 16 11:18:27 oozeBot DuetControlServer[1043]: [warn] Trigger: Out-of-order reply: '' Aug 16 11:18:31 oozeBot kernel: [188750.655249] brcmfmac: brcmf_sdio_hdparse: HW header checksum error Aug 16 11:18:31 oozeBot kernel: [188750.655268] brcmfmac: brcmf_sdio_rxfail: terminate frame Aug 16 11:18:34 oozeBot kernel: [188753.630009] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout Aug 16 11:18:34 oozeBot kernel: [188753.630360] brcmfmac: brcmf_sdio_readshared: invalid sdpcm_shared address 0xFFB70048 Aug 16 11:18:34 oozeBot kernel: [188753.630377] brcmfmac: brcmf_sdio_readshared: unable to obtain sdpcm_shared info: rv=-22 (addr=0xffb70048) Aug 16 11:18:36 oozeBot DuetControlServer[1043]: [info] System time has been changed Aug 16 11:18:36 oozeBot kernel: [188756.190060] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout Aug 16 11:18:36 oozeBot kernel: [188756.190424] brcmfmac: brcmf_sdio_readshared: invalid sdpcm_shared address 0xFFB70048 Aug 16 11:18:36 oozeBot kernel: [188756.190441] brcmfmac: brcmf_sdio_readshared: unable to obtain sdpcm_shared info: rv=-22 (addr=0xffb70048) Aug 16 11:18:36 oozeBot kernel: [188756.190468] ieee80211 phy0: brcmf_cfg80211_get_station: GET STA INFO failed, -110 Aug 16 11:18:41 oozeBot kernel: [188760.659861] brcmfmac: brcmf_sdio_hdparse: HW header checksum error Aug 16 11:18:41 oozeBot kernel: [188760.659879] brcmfmac: brcmf_sdio_rxfail: terminate frame Aug 16 11:18:43 oozeBot kernel: [188762.760085] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout Aug 16 11:18:43 oozeBot kernel: [188762.766964] brcmfmac: brcmf_sdiod_ramrw: membytes transfer failed Aug 16 11:18:43 oozeBot kernel: [188762.766974] brcmfmac: brcmf_sdio_readshared: unable to obtain sdpcm_shared info: rv=-84 (addr=0x20000010) Aug 16 11:18:43 oozeBot kernel: [188762.767421] brcmfmac: brcmf_sdio_hdparse: HW header checksum error Aug 16 11:18:43 oozeBot kernel: [188762.767428] brcmfmac: brcmf_sdio_rxfail: terminate frame Aug 16 11:18:43 oozeBot kernel: [188762.775888] ieee80211 phy0: brcmf_cfg80211_get_station: GET STA INFO failed, -52 Aug 16 11:18:43 oozeBot dhcpcd[628]: wlan0: carrier lost
-
It's a bit confusing since brcmfmac is a wifi driver.
-
@oliof agreed. The RPi did not reboot, we just lost connection to it through DWC for ~30 seconds. Once the connection was restored, the print was canceled. I wonder if the change in System time (line 13) had anything to do with it? That's the first thing reported by DuetControlServer.
-
CMD53 is some kind of configuration command for a wifi chip; which is where some error was triggered. This seems to have blocked some subsystem for about 2 seconds, long enough for a realtime system to decide it is in an irrevocable state of de-sync.
Sleuthing a bit, it could be related to an unfixed issue with the wifi chip and/or driver on the Raspberry Pi which has been around since 2018 ...
I'd suggest moving to Ethernet if possible.
-
@oliof I think CMD53 actually related to the SD card interface. It looks to me that there was a timeout waiting for an interrupt from the SD card subsystem.
-
@oozebot said in Assistance reading diagnostics after failure- 3.3:
brcmfmac: mmc_submit_one
Thank you both. Based on what we are finding, we are starting to think we overtaxed the wifi module with several webcam streams going at once.
-
@dc42 I'm confused why the wifi driver does SD card interface management, but that may be the vagaries of integrated electronics. It still seems to be resulting in a failure of the wifi interface twice (
Aug 16 11:18:27 oozeBot kernel: [188747.070268] ieee80211 phy0: brcmf_cfg80211_get_station: GET STA INFO failed, -110
and again at 11:18:43). -
-
on the RPi 3B it seems to be BCM43438 (on RPi4 a BCM4345/6) -- and you're right, they are connected via SDIO interface. I am less confused now.