OpenOnload not able to allocate stacks. #3
Comments
Firstly, thanks for trying this out! Our AF_XDP support is a work in progress, so it's great to have someone else having a go. It doesn't look like you're running as root, which you need to be to create the stack, as this is needed for setting up the AF_XDP resources. If that doesn't resolve the problem, then could you please attach your dmesg output? We try to log a bit more information about what's going on there.
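The root requirement described above can be made explicit with a small guard before launching the application. This is only a sketch: `require_root` is a hypothetical helper name and is not part of Onload, and the uid is passed as an optional argument purely so the check can be exercised without actually being root.

```shell
# Hypothetical helper (not part of Onload): fail early when not running as root.
# With no argument it checks the current user via `id -u`.
require_root() {
    uid="${1:-$(id -u)}"
    if [ "$uid" -ne 0 ]; then
        echo "stack creation needs root (current uid=$uid)" >&2
        return 1
    fi
    return 0
}

# Usage on a live system (the command after && is whatever you accelerate):
#   require_root && onload -p latency <your command>
```

Failing early this way avoids the less obvious "Failed to allocate stack" error that shows up when a non-root user starts an application under onload.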
Hello @sianj-xilinx , I did try running as root, as recommended by you but still no luck.
Output of onload_stackdump:
Hello @sianj-xilinx, I see you edited the README to say that it's a WIP, and you are right. The dmesg error logs point to the following functions not being implemented:
There are around 15 such functions in src/lib/efhw/af_xdp.c. Could you please take a look at them?
Hello @shirshen12, let us know if this resolves the issue for you.
Hello @maciejj-xilinx , I did turn on the flow director for 82599 ixgbe driver
I compiled the latest onload driver:
I see the same error as above:
Also confirmed via onload_stackdump stacks
Also, @maciejj-xilinx / @sianj-xilinx, can you share the details of the environment you are testing in? If it works for you, I can replicate your environment.
I have just checked the memcached command line options. It looks Nonetheless, this is a genuine issue, and we can see it needs to be resolved: that is, to allow unprivileged users to create stacks.
OK, let me try that and get back to you.
Hello @maciejj-xilinx, it worked; please see below for confirmation.
Stacks being created:
I will close this ticket and raise a new one requesting that non-privileged users be able to create stacks. I will also test on some other NICs, such as the MLNX ConnectX, and report back on AF_XDP. For the Intel 82599, it works as of now.
Deferring oo_exit_hook() fixes a stuck C++ application:

#0 0x00007fd2d7afb87b in ioctl () from /lib64/libc.so.6
#1 0x00007fd2d80c0621 in oo_resource_op (cmd=3221510722, io=0x7ffd15be696c, fp=<optimized out>) at /home/iteterev/lab/onload_internal/src/include/onload/mmap.h:104
#2 __oo_eplock_lock (timeout=<synthetic pointer>, maybe_wedged=0, ni=0x20c8480) at /home/iteterev/lab/onload_internal/src/lib/transport/ip/eplock_slow.c:35
#3 __ef_eplock_lock_slow (ni=ni@entry=0x20c8480, timeout=timeout@entry=-1, maybe_wedged=maybe_wedged@entry=0) at /home/iteterev/lab/onload_internal/src/lib/transport/ip/eplock_slow.c:72
#4 0x00007fd2d80d7dbf in ef_eplock_lock (ni=0x20c8480) at /home/iteterev/lab/onload_internal/src/include/onload/eplock.h:61
#5 __ci_netif_lock_count (stat=0x7fd2d5c5b62c, ni=0x20c8480) at /home/iteterev/lab/onload_internal/src/include/ci/internal/ip_shared_ops.h:79
#6 ci_tcp_setsockopt (ep=ep@entry=0x20c8460, fd=6, level=level@entry=1, optname=optname@entry=9, optval=optval@entry=0x7ffd15be6acc, optlen=optlen@entry=4) at /home/iteterev/lab/onload_internal/src/lib/transport/ip/tcp_sockopts.c:580
#7 0x00007fd2d8010da7 in citp_tcp_setsockopt (fdinfo=0x20c8420, level=1, optname=9, optval=0x7ffd15be6acc, optlen=4) at /home/iteterev/lab/onload_internal/src/lib/transport/unix/tcp_fd.c:1594
#8 0x00007fd2d7fde088 in onload_setsockopt (fd=6, level=1, optname=9, optval=0x7ffd15be6acc, optlen=4) at /home/iteterev/lab/onload_internal/src/lib/transport/unix/sockcall_intercept.c:737
#9 0x00007fd2d7dcb7dd in ?? ()
#10 0x00007fd2d83392e0 in ?? () from /home/iteterev/lab/onload_internal/build/gnu_x86_64/lib/transport/unix/libcitransport0.so
#11 0x000000000060102c in data_start ()
#12 0x00007fd2d8339540 in ?? () from /home/iteterev/lab/onload_internal/build/gnu_x86_64/lib/transport/unix/libcitransport0.so
#13 0x00000001d85426c0 in ?? ()
#14 0x00007fd2d7fcbe08 in ?? ()
#15 0x00007fd2d7a433c7 in __cxa_finalize () from /lib64/libc.so.6
#16 0x00007fd2d7dcb757 in ?? ()
#17 0x00007ffd15be6be0 in ?? ()
#18 0x00007fd2d834f2a6 in _dl_fini () from /lib64/ld-linux-x86-64.so.2

Here, _fini() is a function that calls all library destructors. The problem is that _fini() decides to run the C++ library destructor *after* Onload and makes it operate on an invalid Onload state.

The patch leverages the fact that Glibc sets up _fini() after running the last library constructor, so by manually installing the exit handler (instead of providing a library destructor), Onload wins the race with _fini().

There's still an issue if the user library sets a custom exit handler with atexit() or on_exit() and makes intercepted system calls from there.

Tested:
* RHEL 7.9/glibc 2.17
* RHEL 8.2/glibc 2.28
* RHEL 9.1/glibc 2.34

Thanks-to: Richard Hughes <[email protected]>
Thanks-to: Siân James <[email protected]>
I have been trying to get OpenOnload to work on a bare-metal box from Vultr.io.
Environment Details:
CPU: Intel(R) Xeon(R) CPU E3-1270 v6 @ 3.80GHz
NIC: Intel Corporation 82599 10 Gigabit Network Connection
OS: CentOS Linux release 8.2.2004 (Core)
Kernel: 4.18.0-193.28.1.el8_2.x86_64
How to Build and Install Onload:
wget https://rpmfind.net/linux/centos/8.2.2004/BaseOS/x86_64/os/Packages/kernel-headers-4.18.0-193.28.1.el8_2.x86_64.rpm
wget https://rpmfind.net/linux/centos/8.2.2004/BaseOS/x86_64/os/Packages/kernel-devel-4.18.0-193.28.1.el8_2.x86_64.rpm
yum install binutils gettext gawk gcc sed make bash glibc-common libcap-devel libmnl-devel perl-Test-Harness hmaccalc zlib-devel binutils-devel elfutils-libelf-devel libevent-devel
git clone https://github.com/Xilinx-CNS/onload.git
cd onload
scripts/onload_mkdist --release
cd onload-20201211/scripts
./onload_install
./onload_tool reload
Verify Onload driver has been built/inserted
[root@shirbare06 scripts]# lsmod | grep sfc
sfc_char 98304 1 onload
sfc_resource 180224 2 onload,sfc_char
sfc 581632 0
virtual_bus 16384 1 sfc
sfc_driverlink 16384 2 sfc,sfc_resource
mtd 69632 1 sfc
mdio 16384 2 sfc,ixgbe
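The same check can be scripted rather than read by eye. The sketch below reads lsmod-style text on stdin and prints any expected module that is absent. `missing_modules` is a hypothetical helper name, and the module names passed to it are taken from the output in this issue; they are an assumption, not an official list of what Onload requires.

```shell
# Hypothetical helper: print the expected modules that do NOT appear in
# lsmod-style input (module name in the first column), space separated.
missing_modules() {
    input="$(cat)"
    missing=""
    for wanted in "$@"; do
        # Match the first column exactly so "sfc" does not match "sfc_char".
        if ! printf '%s\n' "$input" |
            awk -v m="$wanted" '$1 == m { found = 1 } END { exit !found }'; then
            missing="$missing $wanted"
        fi
    done
    # Strip the leading space; prints an empty line when nothing is missing.
    printf '%s\n' "${missing# }"
}

# Usage on a live system:
#   lsmod | missing_modules sfc sfc_resource sfc_char
```

Matching on the first column exactly matters here, because a plain `grep sfc` also matches lines where sfc only appears in the "used by" column, such as mdio above.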
**Check if the XDP module has been enabled via xdp-tools**
Install XDP tools
yum install clang llvm
dnf --enablerepo=PowerTools install libpcap-devel
git clone https://github.com/xdp-project/xdp-tools.git
cd xdp-tools
./configure
make
sudo make install
Check if xdpsock() is enabled:
`[root@shirbare06 scripts]# xdpdump -D
if_index if_name XDP program entry function
1 lo <No XDP program loaded!>
2 enp1s0 xdpsock()`
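To check one interface from a script, the listing above can be filtered with awk. This is a sketch that assumes the three-part layout shown in this issue (index, interface name, then the program description); other xdp-tools versions may print a different layout, and `xdp_program_for` is a hypothetical helper name.

```shell
# Hypothetical helper: print the XDP program description for one interface,
# reading `xdpdump -D` style text on stdin. Returns non-zero when the
# interface is not listed at all.
xdp_program_for() {
    awk -v ifname="$1" '
        $2 == ifname {
            # Everything after the interface name is the program description.
            out = $3
            for (i = 4; i <= NF; i++) out = out " " $i
            print out
            found = 1
        }
        END { exit !found }
    '
}

# Usage on a live system:
#   xdpdump -D | xdp_program_for enp1s0
```

With the listing above as input, asking for enp1s0 prints `xdpsock()` and asking for lo prints the "No XDP program loaded" placeholder.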
Start a sample memcached
[s.chakrabarti@shirbare06 ~]$ onload -p latency memcached -m 24576 -c 1024 -t 8 -u s.chakrabarti -l 95.179.137.242:11211
oo:memcached[96975]: netif_tcp_helper_alloc_u: ERROR: Failed to allocate stack (rc=-1)
See kernel messages in dmesg or /var/log/syslog for more details of this failure
Also verified stack is not being created:
[root@shirbare06 scripts]# onload_stackdump stacks
Can someone please look at this issue and fix :)