Pi4 gisb stalls when using genet ethernet #1219

cinaplenrek · 2019-08-05T13:06:20Z

I'm working on plan9 arm64 kernel support for the raspberry pi 4.

I'm observing gisb arbiter errors when operating the ethernet controller on the raspberry pi 4.
In general, ethernet works fine on light traffic but heavy traffic causes sporadic 42 second long
bus stalls. That is, any core accessing mmio registers on the gisb (genet, pcie) hangs and
then continues. Even accesses to the gisb arbiter itself hang.

After such a stall, when I poll the gisb arbiter (as I don't know the INTID for the arbiter), the capture
status register (0x7c4007f4) reads 0x3D and the bus address reported in the capture address
registers ([0x7c4007ec] | [0x7c4007e8]<<32) reads strange 12 bit bus addresses like:
0x2a0, 0xfe0, 0xea0, 0xee0, 0x6a0... (they're all (x-32)%64 == 0)
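
The pattern in those addresses can be checked with a few lines of Python. This is only a sketch: the helper names are made up for illustration, and the register layout (high word at 0x7c4007e8, low word at 0x7c4007ec) is taken from the expression above rather than from any datasheet.

```python
def capture_address(hi: int, lo: int) -> int:
    """Combine the two 32-bit capture address registers into one bus address."""
    return ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)


def is_32_into_64_byte_block(addr: int) -> bool:
    """True when addr sits exactly 32 bytes into a 64-byte-aligned block."""
    return (addr - 32) % 64 == 0


# Addresses reported in this issue (low word only, high word read as 0).
reported = [capture_address(0, lo) for lo in (0x2A0, 0xFE0, 0xEA0, 0xEE0, 0x6A0)]

for addr in reported:
    print(hex(addr), "offset in 64-byte block:", addr % 64,
          "fits in 12 bits:", addr < 0x1000)
```

Every listed address has offset 32 within its 64-byte block and fits in 12 bits, which is what makes them look unlike a real peripheral address above 0x7c000000.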

Normally, when the arm accesses invalid mmio registers on the bus I get an SError interrupt
and the arbiter capture address registers contain a proper bus address above 0x7c000000.
This is not the case here.

Is it possible for the arm to issue bus access to such addresses? And if so, how?
If not, who could initiate such bus transactions?

Can someone tell me the INTID for the gisb arb error interrupt and how the interrupt
can be enabled besides enabling it in the GIC? Maybe polling the arbiter results in
these bogus addresses?

What I could figure out so far:

- stalls happen for both read and write accesses, and the register doesn't matter
- hanging mmio write accesses seem to complete fine after the stall. that is, I tested
  reading back the registers I write in the ethernet driver after the write, and the new value got updated.
- the 42 second stall time is also unrelated to the arbiter timeout value in the arbiter
  timer register 0x7c400008
- serializing all genet register accesses and placing barriers before and after has no effect
- linux works fine, and I made a trace of all mmio register writes to check for differences
  in genet initialization but they match: http://felloff.net/usr/cinap_lenrek/pi4iodump.txt

Speculation:

- the stall time of 42 seconds is about the time a 32 bit counter takes to wrap at 100MHz
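
The arithmetic behind that speculation, as a small sketch. The 100MHz clock and the 32 bit width are the guesses from the line above, not confirmed hardware facts, and the function name is illustrative.

```python
COUNTER_BITS = 32          # assumed counter width
CLOCK_HZ = 100_000_000     # assumed 100MHz tick rate


def wrap_seconds(bits: int, hz: int) -> float:
    """Seconds until a free-running counter of the given width wraps."""
    return (1 << bits) / hz


print(f"{wrap_seconds(COUNTER_BITS, CLOCK_HZ):.2f} s")
```

The result is just under 43 seconds, which is close enough to the observed stall to make a wrapped counter a plausible suspect.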

popcornmix · 2019-08-05T13:08:29Z

@P33M any ideas?

P33M · 2019-08-05T13:16:51Z

What size of accesses are you using to read/write GISB registers?

cinaplenrek · 2019-08-05T13:18:59Z

all 32 bit, naturally aligned.

-- cinap

P33M · 2019-08-05T13:31:31Z

Decoding the error capture status register (0x7c4007f4) - the error was not caused by a slave response timeout, the error was not caused by a slave response error, and the bus cycle was a read. Oddly, none of the 4 byte strobes in [5:2] are asserted (1 => not asserted). How can we have a read cycle with no byte strobes?

Does the status register ever change (i.e. is it the same for both read and write)?

Is Plan 9 using the firmware clock setup or have any modifications been made to any of the clock generators?

Edit: also, can you capture the GISB master source register at 0x7c4007f8? It's a bitmask of who generated the address that generated the fault.

cinaplenrek · 2019-08-05T13:51:26Z

the status register always reads 0x3D for reads. when i deliberately do an mmio write of zero to 0x7dfffff0, the gisb status register changes to 0x3F and the proper bus address is reported. the clock manager registers have not been touched. however, we issue firmware request SetClkSpd (0x00038002) with the value returned by firmware request GetClkMax (0x00030004) with clock id 3 (ClkArm) on boot. in timer initialization, we write 0 to 0x40000000 to switch to the osc clock and set up the prescaler for 1MHz by writing ((1MHz<<32) / 54MHz) & ~1 == 0x4bda12e to register 0x40000008.
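
The prescaler value quoted above can be reproduced with integer arithmetic. A minimal sketch, assuming the 54MHz oscillator and 1MHz target stated in the comment; the constant and function names are made up here.

```python
OSC_HZ = 54_000_000     # oscillator frequency, from the comment above
TARGET_HZ = 1_000_000   # desired timer frequency, from the comment above


def prescaler(target_hz: int, source_hz: int) -> int:
    """Compute ((target << 32) / source) with the low bit cleared."""
    return ((target_hz << 32) // source_hz) & ~1


value = prescaler(TARGET_HZ, OSC_HZ)
print(hex(value))
```

This only checks the arithmetic; how the hardware interprets the register at 0x40000008 is not covered by this thread.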

cinaplenrek · 2019-08-05T14:10:17Z

the gisb master source register 0x7c4007f8 always reads 1. sorry for not mentioning it.

cinaplenrek · 2019-08-06T15:56:16Z

is there anything else i can try to rule out potential problem sources?
the clock generators were mentioned...
i have core_freq=250 in config.txt for the mini uart console to work.
are there any config.txt properties i can try to change to rule out
clock or power issues?

pelwell · 2019-08-06T15:58:18Z

Try with core_freq=500 and core_freq_min=500 - 250 is possibly too low.

cinaplenrek · 2019-08-06T17:28:44Z

with core_freq=500 and core_freq_min=500 in config.txt, the mini uart breaks as expected. so i enabled pcie and xhci to use usb keyboard to type commands into the machine. but the gisb errors persist. i also removed the setclkrate firmware request so the arm now runs at its initial 600MHz. any other ideas?

pelwell · 2019-08-06T18:04:10Z

I suppose suggesting using Linux is not helpful?

cinaplenrek · 2019-08-07T23:43:54Z

done some experiments. it seems the byte strobe bits [5:2] in the status register are just *not* inverted?

- read 8 bit addr=0x7dfffff0 status=0x05 strobe=0b0001
- read 8 bit addr=0x7dfffff1 status=0x09 strobe=0b0010
- read 8 bit addr=0x7dfffff2 status=0x11 strobe=0b0100
- read 8 bit addr=0x7dfffff3 status=0x21 strobe=0b1000
- read 16 bit addr=0x7dfffff0 status=0x0d strobe=0b0011
- read 16 bit addr=0x7dfffff2 status=0x31 strobe=0b1100
- read 32 bit addr=0x7dfffff0 status=0x3d strobe=0b1111

also, interestingly, reading bus address 0x7dfffff0 with the dma controller yields master source 0x40 and status 0x103d. so master source 0x1 is the arm and 0x40 is dma controller?
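
The readings above are consistent with a simple decode of the capture status register. The sketch below is an inference from this thread only: strobes in bits [5:2] (not inverted), and bit 1 taken as the write flag because reads gave 0x3D and the deliberate write gave 0x3F. The names are hypothetical, and the higher bits (such as the extra bit in 0x103d) are left undecoded.

```python
STROBE_SHIFT = 2
STROBE_MASK = 0xF
WRITE_BIT = 1 << 1   # inferred: 0x3D on reads, 0x3F on the test write


def decode_status(status: int) -> dict:
    """Split a capture status value into the fields inferred in this thread."""
    return {
        "write": bool(status & WRITE_BIT),
        "strobes": (status >> STROBE_SHIFT) & STROBE_MASK,
    }


# (status, strobe) pairs as measured in the comment above.
measured = [
    (0x05, 0b0001), (0x09, 0b0010), (0x11, 0b0100), (0x21, 0b1000),
    (0x0D, 0b0011), (0x31, 0b1100), (0x3D, 0b1111),
]

for status, strobe in measured:
    fields = decode_status(status)
    print(f"status={status:#04x} strobe={fields['strobes']:04b} "
          f"write={fields['write']} matches={fields['strobes'] == strobe}")
```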

P33M · 2019-08-14T09:09:14Z

The only other thing I can think of would be the cacheability of the address space in question - what page protection bits are being used?

cinaplenrek · 2019-08-16T16:56:25Z

spot on. the PTEs for the mmio regions were missing the XN bits. apparently the chip was doing speculative instruction fetches from the device mappings... case closed.
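
For anyone hitting the same symptom: in the AArch64 stage 1 page and block descriptor format, execute-never is controlled by PXN (bit 53) and UXN/XN (bit 54). The sketch below shows setting both on a device mapping. The helper names and the sample descriptor value are hypothetical, and the actual Plan 9 page table code is not shown in this thread.

```python
PTE_PXN = 1 << 53   # privileged execute-never
PTE_UXN = 1 << 54   # unprivileged execute-never (XN)
PTE_XN_BOTH = PTE_PXN | PTE_UXN


def make_device_pte(pte: int) -> int:
    """Return the descriptor with instruction fetch forbidden at EL0 and EL1."""
    return pte | PTE_XN_BOTH


def allows_instruction_fetch(pte: int) -> bool:
    """True if at least one of the execute-never bits is clear."""
    return (pte & PTE_XN_BOTH) != PTE_XN_BOTH


# Made-up descriptor: some output address plus the low "valid page" bits.
example = 0x12345000 | 0b11

print(allows_instruction_fetch(example))
print(allows_instruction_fetch(make_device_pte(example)))
```

With both bits set, the core has no permission to fetch instructions from the mapping, which matches the fix described above.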

budius mentioned this issue Jun 18, 2019

Intermittent mmal_vll_load issue on Pi-Zero-W using raspivid #1153

Closed

cinaplenrek closed this as completed Aug 16, 2019

brabl2 mentioned this issue Apr 19, 2020

1-Wire in Parasite Power configuration (1-Wire using 2 wires) does not work in 4.19.42 #1143

Closed
