CentOS 8.3.2011 build is broken #6

Closed
galuha opened this issue Dec 18, 2020 · 8 comments

@galuha

galuha commented Dec 18, 2020

Hello!

System: CentOS 8.3.2011
To reproduce:
1. Install the latest CentOS
2. Clone this repo
3. Run scripts/onload_install

Please see this log:

/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/tc.c: In function ‘efx_tc_indr_setup_cb’:
/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/tc.c:4776:15: error: implicit declaration of function ‘flow_indr_block_cb_alloc’; did you mean ‘flow_indr_block_call’? [-Werror=implicit-function-declaration]
block_cb = flow_indr_block_cb_alloc(efx_tc_block_cb, binding,
^~~~~~~~~~~~~~~~~~~~~~~~
flow_indr_block_call
/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/tc.c:4776:13: warning: assignment to ‘struct flow_block_cb *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
block_cb = flow_indr_block_cb_alloc(efx_tc_block_cb, binding,
^
/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/tc.c:4804:4: error: implicit declaration of function ‘flow_indr_block_cb_remove’; did you mean ‘flow_indr_block_cb_register’? [-Werror=implicit-function-declaration]
flow_indr_block_cb_remove(block_cb, tcb);
^~~~~~~~~~~~~~~~~~~~~~~~~
flow_indr_block_cb_register
CC [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/farch.o
/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/tc.c: In function ‘efx_init_tc’:
/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/tc.c:4951:30: error: passing argument 1 of ‘flow_indr_dev_register’ from incompatible pointer type [-Werror=incompatible-pointer-types]
rc = flow_indr_dev_register(efx_tc_indr_setup_cb, efx);
^~~~~~~~~~~~~~~~~~~~
In file included from ./include/net/sch_generic.h:21,
from ./include/linux/filter.h:25,
from ./include/net/sock.h:64,
from ./include/net/inet_sock.h:26,
from ./include/net/ip.h:31,
from /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/kernel_compat.h:60,
from /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/tc.c:12:
./include/net/flow_offload.h:539:55: note: expected ‘int (*)(struct net_device *, void *, enum tc_setup_type, void *)’ but argument is of type ‘int (*)(struct net_device *, void *, enum tc_setup_type, void *, void *, void (*)(struct flow_block_cb *))’
int flow_indr_dev_register(flow_indr_block_bind_cb_t *cb, void *cb_priv);
~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/tc.c: In function ‘efx_fini_tc’:
/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/tc.c:4972:28: error: passing argument 1 of ‘flow_indr_dev_unregister’ from incompatible pointer type [-Werror=incompatible-pointer-types]
flow_indr_dev_unregister(efx_tc_indr_setup_cb, efx, efx_tc_block_unbind);
^~~~~~~~~~~~~~~~~~~~
In file included from ./include/net/sch_generic.h:21,
from ./include/linux/filter.h:25,
from ./include/net/sock.h:64,
from ./include/net/inet_sock.h:26,
from ./include/net/ip.h:31,
from /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/kernel_compat.h:60,
from /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/tc.c:12:
./include/net/flow_offload.h:540:58: note: expected ‘int (*)(struct net_device *, void *, enum tc_setup_type, void *)’ but argument is of type ‘int (*)(struct net_device *, void *, enum tc_setup_type, void *, void *, void (*)(struct flow_block_cb *))’
void flow_indr_dev_unregister(flow_indr_block_bind_cb_t *cb, void *cb_priv,
~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/tc.c:4972:55: error: passing argument 3 of ‘flow_indr_dev_unregister’ from incompatible pointer type [-Werror=incompatible-pointer-types]
flow_indr_dev_unregister(efx_tc_indr_setup_cb, efx, efx_tc_block_unbind);
^~~~~~~~~~~~~~~~~~~
In file included from ./include/net/sch_generic.h:21,
from ./include/linux/filter.h:25,
from ./include/net/sock.h:64,
from ./include/net/inet_sock.h:26,
from ./include/net/ip.h:31,
from /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/kernel_compat.h:60,
from /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/tc.c:12:
./include/net/flow_offload.h:541:27: note: expected ‘int (*)(enum tc_setup_type, void *, void *)’ but argument is of type ‘void (*)(void *)’
flow_setup_cb_t *setup_cb);
~~~~~~~~~~~~~~~~~^~~~~~~~
CC [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/siena.o
/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/tc.c: In function ‘efx_tc_netdev_event’:
/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/tc.c:5017:10: error: passing argument 3 of ‘__flow_indr_block_cb_register’ from incompatible pointer type [-Werror=incompatible-pointer-types]
efx_tc_indr_setup_cb, efx);
^~~~~~~~~~~~~~~~~~~~
In file included from ./include/net/sch_generic.h:21,
from ./include/linux/filter.h:25,
from ./include/net/sock.h:64,
from ./include/net/inet_sock.h:26,
from ./include/net/ip.h:31,
from /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/kernel_compat.h:60,
from /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/tc.c:12:
./include/net/flow_offload.h:552:34: note: expected ‘int (*)(struct net_device *, void *, enum tc_setup_type, void *)’ but argument is of type ‘int (*)(struct net_device *, void *, enum tc_setup_type, void *, void *, void (*)(struct flow_block_cb *))’
flow_indr_block_bind_cb_t *cb,
~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/tc.c:5027:44: error: passing argument 2 of ‘__flow_indr_block_cb_unregister’ from incompatible pointer type [-Werror=incompatible-pointer-types]
__flow_indr_block_cb_unregister(net_dev, efx_tc_indr_setup_cb,
^~~~~~~~~~~~~~~~~~~~
In file included from ./include/net/sch_generic.h:21,
from ./include/linux/filter.h:25,
from ./include/net/sock.h:64,
from ./include/net/inet_sock.h:26,
from ./include/net/ip.h:31,
from /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/kernel_compat.h:60,
from /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/tc.c:12:
./include/net/flow_offload.h:556:37: note: expected ‘int (*)(struct net_device *, void *, enum tc_setup_type, void *)’ but argument is of type ‘int (*)(struct net_device *, void *, enum tc_setup_type, void *, void *, void (*)(struct flow_block_cb *))’
flow_indr_block_bind_cb_t *cb,
~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
CC [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/sriov.o
CC [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/mcdi_mon.o
CC [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/ptp.o
CC [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/mtd.o
CC [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/ef100_vdpa.o
CC [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/ef100_vdpa_ops.o
CC [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/mcdi_vdpa.o
CC [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/ioctl.o
CC [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/ioctl_common.o
CC [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/kernel_compat.o
CC [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/linux_mdio.o
CC [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/sfctool.o
CC [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/aoe.o
CC [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/debugfs.o
CC [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/dump.o
CC [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/driverlink.o
LD [M] /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/sfc_driverlink.o
cc1: some warnings being treated as errors
make[8]: *** [scripts/Makefile.build:315: /root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc/tc.o] Error 1
make[7]: *** [Makefile:1544: module/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc] Error 2
make[7]: Leaving directory '/usr/src/kernels/4.18.0-240.1.1.el8_3.x86_64'
make[6]: *** [/root/onload/onload/src/driver/linux_net/drivers/net/ethernet/sfc/Makefile:260: modules] Error 2
make[6]: Leaving directory '/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet/sfc'
make[5]: *** [/root/onload/onload/src/driver/linux_net/drivers/net/ethernet/mmake.mk:8: all] Error 2
make[5]: Leaving directory '/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net/ethernet'
make[4]: *** [/root/onload/onload/src/driver/linux_net/drivers/net/mmake.mk:8: all] Error 2
make[4]: Leaving directory '/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers/net'
make[3]: *** [/root/onload/onload/src/driver/linux_net/drivers/mmake.mk:8: all] Error 2
make[3]: Leaving directory '/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net/drivers'
make[2]: *** [/root/onload/onload/src/driver/linux_net/mmake.mk:8: all] Error 2
make[2]: Leaving directory '/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver/linux_net'
make[1]: *** [/root/onload/onload/src/driver/mmake.mk:19: all] Error 2
make[1]: Leaving directory '/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64/driver'
make: *** [/root/onload/onload/src/mmake.mk:25: all] Error 2
make: Leaving directory '/root/onload/onload/build/x86_64_linux-4.18.0-240.1.1.el8_3.x86_64'
onload_build: ERROR: Failed to build driver components.
onload_install: ERROR: Build failed. Not installing.
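
Most of the errors in the log above share one shape: the callback handed to the kernel takes six parameters, while the installed header declares a four-parameter callback type. The sketch below is a minimal userspace illustration of how such a mismatch is commonly handled with a compile-time switch, so that the callback definition always agrees with the header in use. Everything in it is an assumption made for illustration: the macro HYPOTHETICAL_HAVE_CLEANUP_ARG, the type bind_cb_t, and the functions register_cb, setup_cb and register_driver_cb are invented names, and this is not taken from the Onload sources or from any particular fix.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Stand-in for a kernel header (hypothetical). Building with
 * -DHYPOTHETICAL_HAVE_CLEANUP_ARG models a header that declares the
 * six-parameter callback; the default models the four-parameter one.
 */
#ifdef HYPOTHETICAL_HAVE_CLEANUP_ARG
typedef int bind_cb_t(void *dev, void *cb_priv, int type, void *type_data,
                      void *data, void (*cleanup)(void *block_cb));
#else
typedef int bind_cb_t(void *dev, void *cb_priv, int type, void *type_data);
#endif

/* Stand-in for the kernel's registration function: it only stores the pointer. */
bind_cb_t *registered_cb = NULL;

int register_cb(bind_cb_t *cb)
{
    assert(cb != NULL);
    registered_cb = cb;
    return 0;
}

/* Driver logic shared by both callback variants. */
static int setup_common(void *cb_priv, int type)
{
    return cb_priv != NULL ? type : -1;
}

/* The driver defines exactly one variant, matching the header in use. */
#ifdef HYPOTHETICAL_HAVE_CLEANUP_ARG
int setup_cb(void *dev, void *cb_priv, int type, void *type_data,
             void *data, void (*cleanup)(void *block_cb))
{
    (void)dev; (void)type_data; (void)data; (void)cleanup;
    return setup_common(cb_priv, type);
}
#else
int setup_cb(void *dev, void *cb_priv, int type, void *type_data)
{
    (void)dev; (void)type_data;
    return setup_common(cb_priv, type);
}
#endif

/* Compiles without a pointer-type error in either configuration. */
int register_driver_cb(void)
{
    return register_cb(setup_cb);
}
```

If the switch and the header disagree, the compiler reports the same kind of incompatible-pointer-type diagnostic seen in the log, which is why such switches are usually derived from a build-time probe of the headers rather than from the kernel version number alone.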

@shirshen12

You need to install the kernel-devel and kernel-headers packages. Make sure their versions exactly match the kernel you are building against; otherwise it won't compile.

@galuha
Author

galuha commented Dec 18, 2020 via email

@shirshen12

Can you post the details of the NIC and its driver?

@galuha
Author

galuha commented Dec 18, 2020

Shirshendu, how do I actually identify my card model, both by looking at the card and from the console? I have never been able to figure it out with SF cards.
From lspci's perspective they all look mostly the same.
This card shows:
lspci | grep Sol
81:00.0 Ethernet controller: Solarflare Communications SFC9220 10/40G Ethernet Controller (rev 02)
81:00.1 Ethernet controller: Solarflare Communications SFC9220 10/40G Ethernet Controller (rev 02)

The label on this same card, which I have in my hands now, reads: Solarflare 2016 SF10-050020
But the seller states that it is an SFN8522PLUS.

Another SF card, in my other server:
lspci | grep Sol
01:00.0 Ethernet controller: Solarflare Communications SFC9250 10/25/40/50/100G Ethernet Controller (rev 01)
01:00.1 Ethernet controller: Solarflare Communications SFC9250 10/25/40/50/100G Ethernet Controller (rev 01)

How can I be sure which model this is?

I have downgraded CentOS from 8.3.2011 to 8.2.2004 (kernel 4.18.0-193.el8.x86_64), and the build finished successfully.
It now shows this driver version:
ethtool -i enp129s0f0
driver: sfc
version: 5.3.3.1001
firmware-version: 6.5.1.1023 rx1 tx1
expansion-rom-version:
bus-info: 0000:81:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: yes

Although I've managed to install the latest drivers, the question of how to detect the card model is still very important to me.
Thanks in advance

--Andrei

@abower-amd
Collaborator

Hi @galuha, thanks for trying out this development branch! Please contact [email protected] or use resources at https://support-nic.xilinx.com/ if you need support for your Solarflare adapter or to download supported versions of Onload.

@abower-amd
Collaborator

Thanks for raising the build error, which is fixed by e9d90b2.

@abower-amd
Collaborator

Andrei,

lspci | grep Sol
81:00.0 Ethernet controller: Solarflare Communications SFC9220 10/40G Ethernet Controller (rev 02)
81:00.1 Ethernet controller: Solarflare Communications SFC9220 10/40G Ethernet Controller (rev 02)

This output gives the ASIC details, not the NIC model. If you run lspci -v -d1924: you can see the adapter type after 'Subsystem'. You can also see this information in dmesg when the sfc driver initializes. The sfreport.pl tool from https://support-nic.xilinx.com/wp/drivers is useful for extracting more information about the networking subsystem or when reporting issues with supported hardware or software.

@galuha
Author

galuha commented Dec 21, 2020

Thank you very much.
Now I can see that my card is:
Subsystem: Solarflare Communications SFN8522-R2 8000 Series 10G Adapter

cns-ci-onload-xilinx pushed a commit that referenced this issue Feb 1, 2023
Deferring oo_exit_hook() fixes a stuck C++ application:

    #0  0x00007fd2d7afb87b in ioctl () from /lib64/libc.so.6
    #1  0x00007fd2d80c0621 in oo_resource_op (cmd=3221510722, io=0x7ffd15be696c, fp=<optimized out>) at /home/iteterev/lab/onload_internal/src/include/onload/mmap.h:104
    #2  __oo_eplock_lock (timeout=<synthetic pointer>, maybe_wedged=0, ni=0x20c8480) at /home/iteterev/lab/onload_internal/src/lib/transport/ip/eplock_slow.c:35
    #3  __ef_eplock_lock_slow (ni=ni@entry=0x20c8480, timeout=timeout@entry=-1, maybe_wedged=maybe_wedged@entry=0) at /home/iteterev/lab/onload_internal/src/lib/transport/ip/eplock_slow.c:72
    #4  0x00007fd2d80d7dbf in ef_eplock_lock (ni=0x20c8480) at /home/iteterev/lab/onload_internal/src/include/onload/eplock.h:61
    #5  __ci_netif_lock_count (stat=0x7fd2d5c5b62c, ni=0x20c8480) at /home/iteterev/lab/onload_internal/src/include/ci/internal/ip_shared_ops.h:79
    #6  ci_tcp_setsockopt (ep=ep@entry=0x20c8460, fd=6, level=level@entry=1, optname=optname@entry=9, optval=optval@entry=0x7ffd15be6acc, optlen=optlen@entry=4) at /home/iteterev/lab/onload_internal/src/lib/transport/ip/tcp_sockopts.c:580
    #7  0x00007fd2d8010da7 in citp_tcp_setsockopt (fdinfo=0x20c8420, level=1, optname=9, optval=0x7ffd15be6acc, optlen=4) at /home/iteterev/lab/onload_internal/src/lib/transport/unix/tcp_fd.c:1594
    #8  0x00007fd2d7fde088 in onload_setsockopt (fd=6, level=1, optname=9, optval=0x7ffd15be6acc, optlen=4) at /home/iteterev/lab/onload_internal/src/lib/transport/unix/sockcall_intercept.c:737
    #9  0x00007fd2d7dcb7dd in ?? ()
    #10 0x00007fd2d83392e0 in ?? () from /home/iteterev/lab/onload_internal/build/gnu_x86_64/lib/transport/unix/libcitransport0.so
    #11 0x000000000060102c in data_start ()
    #12 0x00007fd2d8339540 in ?? () from /home/iteterev/lab/onload_internal/build/gnu_x86_64/lib/transport/unix/libcitransport0.so
    #13 0x00000001d85426c0 in ?? ()
    #14 0x00007fd2d7fcbe08 in ?? ()
    #15 0x00007fd2d7a433c7 in __cxa_finalize () from /lib64/libc.so.6
    #16 0x00007fd2d7dcb757 in ?? ()
    #17 0x00007ffd15be6be0 in ?? ()
    #18 0x00007fd2d834f2a6 in _dl_fini () from /lib64/ld-linux-x86-64.so.2

Here, _fini() is a function that calls all library destructors. The
problem is that _fini() decides to run the C++ library destructor
*after* Onload and makes it operate on an invalid Onload state.

The patch leverages the fact that Glibc sets up _fini() after running
the last library constructor, so by manually installing the exit handler
(instead of providing a library destructor), Onload wins the race with
_fini().

There's still an issue if the user library sets a custom exit handler
with atexit() or on_exit() and makes intercepted system calls from
there.

Tested:

* RHEL 7.9/glibc 2.17
* RHEL 8.2/glibc 2.28
* RHEL 9.1/glibc 2.34

Thanks-to: Richard Hughes <[email protected]>
Thanks-to: Siân James <[email protected]>
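
The commit message above relies on one documented property of atexit(): handlers run in reverse order of registration, so whichever handler is registered earlier runs later at exit. The sketch below is not Onload's code; it only demonstrates that ordering in isolation. The function names (registered_first, registered_second, exit_handler_order) and the fork-and-pipe arrangement used to observe the order are assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of the pipe; the exit handlers report through it. */
static int report_fd = -1;

static void registered_first(void)
{
    if (write(report_fd, "1", 1) != 1)
        _exit(2);
}

static void registered_second(void)
{
    if (write(report_fd, "2", 1) != 1)
        _exit(2);
}

/*
 * Forks a child that registers two atexit() handlers and then calls exit().
 * The order in which the handlers ran is stored in 'out' as a string.
 * Returns 0 on success, -1 on failure.
 */
int exit_handler_order(char *out, size_t out_len)
{
    int fds[2];
    int status;
    size_t got = 0;
    ssize_t n;
    pid_t pid;

    if (out_len < 3 || pipe(fds) != 0)
        return -1;

    fflush(NULL); /* avoid duplicating buffered output in the child */
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: register the handlers, then exit normally. */
        close(fds[0]);
        report_fd = fds[1];
        if (atexit(registered_first) != 0 || atexit(registered_second) != 0)
            _exit(1);
        exit(0);
    }

    /* Parent: read until the child closes its end of the pipe. */
    close(fds[1]);
    while (got < out_len - 1 &&
           (n = read(fds[0], out + got, out_len - 1 - got)) > 0)
        got += (size_t)n;
    out[got] = '\0';
    close(fds[0]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling exit_handler_order() with a small buffer yields the string "21": the handler registered second runs first. This is the ordering the commit message depends on when it installs the exit handler manually instead of providing a library destructor.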