
Ubuntu 20.04.2 LTS, ixgbe 5.1.0-k: not working with EF_AF_XDP_ZEROCOPY=1 #18

Open
density1970 opened this issue Apr 7, 2021 · 1 comment


@density1970

Hello,
I have been testing Onload with AF_XDP support on a 10 GbE Intel 82599 NIC.
My environment details:
CPU: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
OS: Ubuntu 20.04.2 LTS
Kernel: 5.4.0-65-generic

root@xcsc:# ethtool -i enp4s0f1
driver: ixgbe
version: 5.1.0-k
firmware-version: 0x000161ae
expansion-rom-version:
bus-info: 0000:04:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

Before I set EF_AF_XDP_ZEROCOPY to 1 in latency.opf, my application (onload -p ./conf/latency.opf ./zxTrade) works fine:

Recv tickData(15:29:05): { MkDataCounts: 23840 , TradeDataCounts: 0 , OrderDataCounts: 0 }
Recv tickData(15:30:05): { MkDataCounts: 26224 , TradeDataCounts: 0 , OrderDataCounts: 0 }

After I set EF_AF_XDP_ZEROCOPY to 1 in latency.opf:

onload_set EF_AF_XDP_ZEROCOPY 1

The stack is created:
root@xcsc:# onload_stackdump stacks
#stack-id stack-name pids
1 - 9499

EF_AF_XDP_ZEROCOPY is enabled:
root@xcsc:# onload_stackdump lots | grep XDP
EF_XDP_MODE: 0
EF_AF_XDP_ZEROCOPY: 1 (default: 0)
env: EF_AF_XDP_ZEROCOPY=1

But my application (onload -p ./conf/latency.opf ./zxTrade) no longer works; it reports that no data packets are received:

Recv tickData: { MkDataCounts:0, TradeDataCounts:0, OrderDataCounts:0 }
Recv tickData: { MkDataCounts:0, TradeDataCounts:0, OrderDataCounts:0 }

The following drove me crazy:
when I removed the line "onload_set EF_AF_XDP_ZEROCOPY 1" from latency.opf and restarted my application, the system crashed and had to be rebooted.
After the reboot, dmesg shows these messages:
root@xcsc:~# dmesg | grep onload
[ 14.229749] [onload] Onload
[ 14.229750] [onload] Copyright 2019-present Xilinx, 2006-2019 Solarflare Communications, 2002-2005 Level 5 Networks
[ 14.489442] onload_cp_server[461]: Spawned daemon process 481
[ 14.813890] synth uevent: /devices/virtual/onload/onload: failed to send uevent
[ 14.813893] onload onload: uevent: failed to send synthetic uevent
[ 14.832521] synth uevent: /devices/virtual/onload_epoll/onload_epoll: failed to send uevent
[ 14.832523] onload_epoll onload_epoll: uevent: failed to send synthetic uevent

Is it that the ixgbe 5.1.0-k driver (shipped with Ubuntu 20.04.2 LTS) does not support AF_XDP or zero-copy, or is there some other reason?

Thanks.
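For reference when judging whether the driver is at fault: in the kernel's AF_XDP API, zero-copy is requested per socket through the XDP_ZEROCOPY flag in struct sockaddr_xdp at bind() time. With that flag set, bind() fails if the driver cannot provide zero-copy; with no mode flag, the kernel tries zero-copy and silently falls back to copy mode. It is an assumption here, not checked against Onload's source, that EF_AF_XDP_ZEROCOPY=1 ends up requesting this flag, and the helper name make_xdp_bind_addr is hypothetical. A minimal sketch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>

#ifndef AF_XDP
#define AF_XDP 44 /* Linux UAPI value; older libc headers may not define it */
#endif

/* Hypothetical helper: build the address passed to bind() on an AF_XDP
 * socket. zerocopy != 0 sets XDP_ZEROCOPY, so bind() fails instead of
 * falling back to copy mode when the driver cannot do zero-copy.
 * zerocopy == 0 leaves the flags empty: the kernel tries zero-copy first
 * and falls back to copy mode. */
static struct sockaddr_xdp make_xdp_bind_addr(uint32_t ifindex,
                                              uint32_t queue_id,
                                              int zerocopy)
{
    struct sockaddr_xdp addr;

    memset(&addr, 0, sizeof(addr));
    addr.sxdp_family = AF_XDP;
    addr.sxdp_ifindex = ifindex;
    addr.sxdp_queue_id = queue_id;
    addr.sxdp_flags = zerocopy ? XDP_ZEROCOPY : 0;
    return addr;
}
```

In a real program, ifindex would come from if_nametoindex("enp4s0f1") (declared in <net/if.h>), and the socket needs a UMEM registered with the XDP_UMEM_REG socket option plus at least one RX or TX ring before bind() can succeed.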

@ol-alexandra
Contributor

@maciejj-xilinx and I agree that this is probably caused by the internal bugs ON-12643/SWNETLINUX-3906.
I'm working on it.

cns-ci-onload-xilinx pushed a commit that referenced this issue Feb 1, 2023
Deferring oo_exit_hook() fixes a stuck C++ application:

    #0  0x00007fd2d7afb87b in ioctl () from /lib64/libc.so.6
    #1  0x00007fd2d80c0621 in oo_resource_op (cmd=3221510722, io=0x7ffd15be696c, fp=<optimized out>) at /home/iteterev/lab/onload_internal/src/include/onload/mmap.h:104
    #2  __oo_eplock_lock (timeout=<synthetic pointer>, maybe_wedged=0, ni=0x20c8480) at /home/iteterev/lab/onload_internal/src/lib/transport/ip/eplock_slow.c:35
    #3  __ef_eplock_lock_slow (ni=ni@entry=0x20c8480, timeout=timeout@entry=-1, maybe_wedged=maybe_wedged@entry=0) at /home/iteterev/lab/onload_internal/src/lib/transport/ip/eplock_slow.c:72
    #4  0x00007fd2d80d7dbf in ef_eplock_lock (ni=0x20c8480) at /home/iteterev/lab/onload_internal/src/include/onload/eplock.h:61
    #5  __ci_netif_lock_count (stat=0x7fd2d5c5b62c, ni=0x20c8480) at /home/iteterev/lab/onload_internal/src/include/ci/internal/ip_shared_ops.h:79
    #6  ci_tcp_setsockopt (ep=ep@entry=0x20c8460, fd=6, level=level@entry=1, optname=optname@entry=9, optval=optval@entry=0x7ffd15be6acc, optlen=optlen@entry=4) at /home/iteterev/lab/onload_internal/src/lib/transport/ip/tcp_sockopts.c:580
    #7  0x00007fd2d8010da7 in citp_tcp_setsockopt (fdinfo=0x20c8420, level=1, optname=9, optval=0x7ffd15be6acc, optlen=4) at /home/iteterev/lab/onload_internal/src/lib/transport/unix/tcp_fd.c:1594
    #8  0x00007fd2d7fde088 in onload_setsockopt (fd=6, level=1, optname=9, optval=0x7ffd15be6acc, optlen=4) at /home/iteterev/lab/onload_internal/src/lib/transport/unix/sockcall_intercept.c:737
    #9  0x00007fd2d7dcb7dd in ?? ()
    #10 0x00007fd2d83392e0 in ?? () from /home/iteterev/lab/onload_internal/build/gnu_x86_64/lib/transport/unix/libcitransport0.so
    #11 0x000000000060102c in data_start ()
    #12 0x00007fd2d8339540 in ?? () from /home/iteterev/lab/onload_internal/build/gnu_x86_64/lib/transport/unix/libcitransport0.so
    #13 0x00000001d85426c0 in ?? ()
    #14 0x00007fd2d7fcbe08 in ?? ()
    #15 0x00007fd2d7a433c7 in __cxa_finalize () from /lib64/libc.so.6
    #16 0x00007fd2d7dcb757 in ?? ()
    #17 0x00007ffd15be6be0 in ?? ()
    #18 0x00007fd2d834f2a6 in _dl_fini () from /lib64/ld-linux-x86-64.so.2

Here, _fini() is the function that calls all library destructors. The
problem is that _fini() runs the C++ library destructor *after* Onload's,
so that destructor operates on invalid Onload state.

The patch leverages the fact that glibc sets up _fini() after running
the last library constructor, so by manually installing the exit handler
(instead of providing a library destructor), Onload wins the race with
_fini().
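The ordering argument above can be illustrated with a toy model of the exit-handler list. Exit handlers run in reverse order of registration, so a handler installed while library constructors run, before glibc registers _fini() (shown as _dl_fini in the backtrace), runs only after _fini() has called the C++ library's destructors. This is a hypothetical simulation, not glibc's or Onload's code: the enum names are made up, and the assumption that oo_exit_hook() is installed from Onload's constructor is inferred from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers for the two exit-time routines discussed. */
enum exit_routine { ONLOAD_EXIT_HOOK, DL_FINI };

#define MAX_HANDLERS 8

/* Toy model of the process exit-handler list (not glibc's code). */
static enum exit_routine handlers[MAX_HANDLERS];
static size_t n_handlers;

/* Order in which the model "ran" the handlers at exit. */
static enum exit_routine run_order[MAX_HANDLERS];
static size_t n_run;

static void register_exit_routine(enum exit_routine r)
{
    assert(n_handlers < MAX_HANDLERS);
    handlers[n_handlers++] = r;
}

/* Like exit(): pop handlers last-in, first-out; "running" one just
 * records it in run_order. */
static void run_exit_routines(void)
{
    while (n_handlers > 0)
        run_order[n_run++] = handlers[--n_handlers];
}

/* Startup and exit sequence as described in the commit message. */
static void simulate_process_lifetime(void)
{
    /* 1. Library constructors run; Onload installs its exit hook here
     *    (assumed) instead of relying on a library destructor. */
    register_exit_routine(ONLOAD_EXIT_HOOK);
    /* 2. After the last constructor, glibc registers _fini(), which later
     *    calls every library destructor, including the C++ one. */
    register_exit_routine(DL_FINI);
    /* 3. Process exit: DL_FINI is popped first, ONLOAD_EXIT_HOOK last. */
    run_exit_routines();
}
```

In the model, registering earlier is what guarantees that Onload's teardown is popped after DL_FINI, so the C++ destructors still see valid Onload state.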

There's still an issue if the user library sets a custom exit handler
with atexit() or on_exit() and makes intercepted system calls from
there.

Tested:

* RHEL 7.9/glibc 2.17
* RHEL 8.2/glibc 2.28
* RHEL 9.1/glibc 2.34

Thanks-to: Richard Hughes <[email protected]>
Thanks-to: Siân James <[email protected]>