User build failing with "error: ‘__fxstat’ undeclared here (not in a function); did you mean ‘fstat’" #17

Closed
marascio opened this issue Mar 18, 2021 · 6 comments

Comments

@marascio

I am trying to build the user-space tools on Arch Linux with glibc 2.33. I am getting the following errors when building:

nice cc -I. -I./../../../include -I/home/lrm/c/other/onload/src/include -I/home/lrm/c/other/onload/src/lib/transport/ip    -fPIC  -m64  -Werror -Wall -Wundef -Wpointer-arith -Wstrict-prototypes -Wnested-externs -Wno-stringop-truncation -Wno-format-truncation -Wimplicit-fallthrough=2 -Wno-array-bounds -Wno-stringop-overflow -Wno-deprecated-declarations -DTRANSPORT_CONFIG_OPT_HDR='<ci/internal/transport_config_opt_extra.h>' -include /home/lrm/c/other/onload/build/gnu_x86_64/cp_intf_ver.h -O2 -g $cflags $cppflags -c /home/lrm/c/other/onload/src/lib/transport/ip/syscall.c -o ci_ip_syscall.o
/home/lrm/c/other/onload/src/include/onload/declare_syscalls.h.tmpl:115:28: error: ‘__fxstat’ undeclared here (not in a function); did you mean ‘fstat’?
  115 | CI_MK_DECL(int           , __fxstat   , (int, int, struct stat *));
      |                            ^~~~~~~~
/home/lrm/c/other/onload/src/lib/transport/ip/syscall.c:26:62: note: in definition of macro ‘CI_MK_DECL’
   26 | #define CI_MK_DECL(ret, fn, args)  ret (*ci_sys_##fn) args = fn
      |                                                              ^~
/home/lrm/c/other/onload/src/include/onload/declare_syscalls.h.tmpl:120:28: error: ‘__fxstat64’ undeclared here (not in a function); did you mean ‘fstat64’?
  120 | CI_MK_DECL(int           , __fxstat64 , (int, int, struct stat64 *));
      |                            ^~~~~~~~~~
/home/lrm/c/other/onload/src/lib/transport/ip/syscall.c:26:62: note: in definition of macro ‘CI_MK_DECL’
   26 | #define CI_MK_DECL(ret, fn, args)  ret (*ci_sys_##fn) args = fn
      |                                                              ^~
make[3]: *** [/home/lrm/c/other/onload/mk/after.mk:151: ci_ip_syscall.o] Error 1

I wonder if this is related to this commit in GLIBC: bminor/glibc@8ed005d
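
For context, the error follows from how `CI_MK_DECL` is written in `syscall.c`: the macro initializes a function pointer with the symbol named by `fn`, so that symbol must already be declared by a header at the point of expansion. A minimal sketch of the same pattern, using `fstat` (which `<sys/stat.h>` declares) in place of `__fxstat`; this is an illustration only and does not reflect Onload's actual source layout:

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Same shape as the macro quoted in the compiler output above. */
#define CI_MK_DECL(ret, fn, args)  ret (*ci_sys_##fn) args = fn

/* Expands to: int (*ci_sys_fstat)(int, struct stat *) = fstat;
 * This compiles because <sys/stat.h> declares fstat. Substituting a
 * name that no header declares yields the "undeclared here" error. */
CI_MK_DECL(int, fstat, (int, struct stat *));
```

Calling `ci_sys_fstat(fd, &st)` then goes straight to the libc function through the pointer.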

@marascio
Author

marascio commented Mar 18, 2021

In an effort to just get this to compile I made the following changes:

diff --git a/src/include/onload/syscall_unix.h b/src/include/onload/syscall_unix.h
index d9a2324..94b1922 100644
--- a/src/include/onload/syscall_unix.h
+++ b/src/include/onload/syscall_unix.h
@@ -43,6 +43,20 @@ extern int __ppoll_chk (struct pollfd *__fds, nfds_t __nfds,
 extern __sighandler_t bsd_signal(int signum, __sighandler_t handler);
 extern __sighandler_t sysv_signal(int signum, __sighandler_t handler);
 
+#ifndef _STAT_VER
+ #if defined (__aarch64__)
+  #define _STAT_VER 0
+ #elif defined (__x86_64__)
+  #define _STAT_VER 1
+ #else
+  #define _STAT_VER 3
+ #endif
+#endif
+
+struct stat64;
+extern int __fxstat (int vers, int fd, struct stat *);
+extern int __fxstat64 (int vers, int fd, struct stat64 *);
+
 
 /*! Generate declarations of pointers to the system calls */
 #define CI_MK_DECL(ret,fn,args)  extern ret (*ci_sys_##fn) args CI_HV

I've no idea if these are "correct", but it does now build and the unit tests pass.

Running tests
make: Entering directory '/home/lrm/c/other/onload/build/gnu_x86_64/tests/onload/oof'
/usr/bin/timeout 240 prove --exec ' ' \
 "./oof_test sanity" "./oof_test multicast_sanity" "./oof_test namespace_sanity" "./oof_test namespace_macvlan_move" 
./oof_test sanity .................. ok     
./oof_test multicast_sanity ........ ok   
./oof_test namespace_sanity ........ ok     
./oof_test namespace_macvlan_move .. ok     
All tests successful.
Files=4, Tests=158,  0 wallclock secs ( 0.03 usr  0.00 sys +  0.00 cusr  0.01 csys =  0.04 CPU)
Result: PASS
make: Leaving directory '/home/lrm/c/other/onload/build/gnu_x86_64/tests/onload/oof'
make: Entering directory '/home/lrm/c/other/onload/build/gnu_x86_64/tests/onload/cplane_unit'
/usr/bin/timeout 600 prove -j2 --merge --exec '' \
 ./test_route ./test_route_expire ./test_arp_expire ./test_route_stress ./test_teambond ./test_namespace ./test_service_dnat 
./test_route ......... ok                                               
./test_route_expire .. ok                                               
./test_arp_expire .... ok                                               
./test_route_stress .. ok                                               
./test_namespace ..... ok                                               
./test_service_dnat .. ok                                               
./test_teambond ...... ok   
All tests successful.

I would imagine the changes in the diff would need to be conditioned on GLIBC version or something else so as not to always be in effect. I'd be happy to prepare a PR if what I'm doing here isn't totally insane, just let me know.
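
One way such conditioning could look is a compile-time check with glibc's `__GLIBC_PREREQ` macro. A minimal sketch, assuming glibc 2.33 is the first release whose headers no longer declare `__fxstat` (as the error above suggests); the `DEMO_`/`demo_` names are hypothetical, not part of Onload, and this is not necessarily how the issue was eventually fixed:

```c
#include <sys/stat.h>   /* on glibc this also pulls in <features.h> */
#include <sys/types.h>

/* Hypothetical feature macro: 1 when building against glibc >= 2.33. */
#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
# if __GLIBC_PREREQ(2, 33)
#  define DEMO_GLIBC_DROPS_FXSTAT_DECL 1
# endif
#endif
#ifndef DEMO_GLIBC_DROPS_FXSTAT_DECL
# define DEMO_GLIBC_DROPS_FXSTAT_DECL 0
#endif

/* Reports which branch was selected at compile time. */
int demo_glibc_drops_fxstat_decl(void)
{
  return DEMO_GLIBC_DROPS_FXSTAT_DECL;
}

/* Uses the versioned wrapper only where the headers still provide it. */
int demo_fstat(int fd, struct stat *st)
{
#if DEMO_GLIBC_DROPS_FXSTAT_DECL || !defined(_STAT_VER)
  return fstat(fd, st);
#else
  return __fxstat(_STAT_VER, fd, st);
#endif
}
```

Nesting the `__GLIBC_PREREQ` test inside the `defined()` check keeps the file compiling on C libraries that do not provide the macro.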

@ol-alexandra
Contributor

Louis, thank you very much for your report. I've created an internal Xilinx issue for this: ON-13062. I can reproduce it with Ubuntu 21.04 preview, and I'll fix it soon.

@ol-alexandra
Contributor

Fixed by 5e5201b. And thank you again! Your patch was not the right way to handle it, but it explained everything about this libc change to me.

@haelix888

@ol-alexandra you said you tried building against Ubuntu 21.04, which contains Linux kernel 5.11. Does this mean this kernel version is "known to work" all the way? What is the way to track which is the latest supported linux version for the package? Thanks.

@ol-alexandra
Contributor

Yes, the master branch of this repository is properly tested with Ubuntu 21.04, so it works with Linux kernel 5.11.
One of the Onload developers also mentioned that it works with linux-5.12, but I did not try it myself.

I'm afraid there is no reasonable way to track the latest supported kernel. Usually I add such support as soon as the kernel is available from one of the major distros: Fedora, Debian (via Debian testing), Ubuntu. Unfortunately sometimes it takes a lot of effort and time to fix Onload for a new kernel.

Probably I should update README.md when I add new kernel support...

@ol-alexandra
Contributor

@haelix888 I've updated README.md with the Linux kernel version. I'll try to remember to do this in the future.

cns-ci-onload-xilinx pushed a commit that referenced this issue Feb 1, 2023
Deferring oo_exit_hook() fixes a stuck C++ application:

    #0  0x00007fd2d7afb87b in ioctl () from /lib64/libc.so.6
    #1  0x00007fd2d80c0621 in oo_resource_op (cmd=3221510722, io=0x7ffd15be696c, fp=<optimized out>) at /home/iteterev/lab/onload_internal/src/include/onload/mmap.h:104
    #2  __oo_eplock_lock (timeout=<synthetic pointer>, maybe_wedged=0, ni=0x20c8480) at /home/iteterev/lab/onload_internal/src/lib/transport/ip/eplock_slow.c:35
    #3  __ef_eplock_lock_slow (ni=ni@entry=0x20c8480, timeout=timeout@entry=-1, maybe_wedged=maybe_wedged@entry=0) at /home/iteterev/lab/onload_internal/src/lib/transport/ip/eplock_slow.c:72
    #4  0x00007fd2d80d7dbf in ef_eplock_lock (ni=0x20c8480) at /home/iteterev/lab/onload_internal/src/include/onload/eplock.h:61
    #5  __ci_netif_lock_count (stat=0x7fd2d5c5b62c, ni=0x20c8480) at /home/iteterev/lab/onload_internal/src/include/ci/internal/ip_shared_ops.h:79
    #6  ci_tcp_setsockopt (ep=ep@entry=0x20c8460, fd=6, level=level@entry=1, optname=optname@entry=9, optval=optval@entry=0x7ffd15be6acc, optlen=optlen@entry=4) at /home/iteterev/lab/onload_internal/src/lib/transport/ip/tcp_sockopts.c:580
    #7  0x00007fd2d8010da7 in citp_tcp_setsockopt (fdinfo=0x20c8420, level=1, optname=9, optval=0x7ffd15be6acc, optlen=4) at /home/iteterev/lab/onload_internal/src/lib/transport/unix/tcp_fd.c:1594
    #8  0x00007fd2d7fde088 in onload_setsockopt (fd=6, level=1, optname=9, optval=0x7ffd15be6acc, optlen=4) at /home/iteterev/lab/onload_internal/src/lib/transport/unix/sockcall_intercept.c:737
    #9  0x00007fd2d7dcb7dd in ?? ()
    #10 0x00007fd2d83392e0 in ?? () from /home/iteterev/lab/onload_internal/build/gnu_x86_64/lib/transport/unix/libcitransport0.so
    #11 0x000000000060102c in data_start ()
    #12 0x00007fd2d8339540 in ?? () from /home/iteterev/lab/onload_internal/build/gnu_x86_64/lib/transport/unix/libcitransport0.so
    #13 0x00000001d85426c0 in ?? ()
    #14 0x00007fd2d7fcbe08 in ?? ()
    #15 0x00007fd2d7a433c7 in __cxa_finalize () from /lib64/libc.so.6
    #16 0x00007fd2d7dcb757 in ?? ()
    #17 0x00007ffd15be6be0 in ?? ()
    #18 0x00007fd2d834f2a6 in _dl_fini () from /lib64/ld-linux-x86-64.so.2

Here, _fini() is a function that calls all library destructors. The
problem is that _fini() decides to run the C++ library destructor
*after* Onload and makes it operate on an invalid Onload state.

The patch leverages the fact that Glibc sets up _fini() after running
the last library constructor, so by manually installing the exit handler
(instead of providing a library destructor), Onload wins the race with
_fini().

There's still an issue if the user library sets a custom exit handler
with atexit() or on_exit() and makes intercepted system calls from
there.

Tested:

* RHEL 7.9/glibc 2.17
* RHEL 8.2/glibc 2.28
* RHEL 9.1/glibc 2.34

Thanks-to: Richard Hughes <[email protected]>
Thanks-to: Siân James <[email protected]>
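
The approach described in the commit message above can be sketched as follows. This is a minimal illustration, assuming GCC or Clang for `__attribute__((constructor))`; the `demo_` names are hypothetical, and the real `oo_exit_hook()` does considerably more. Handlers registered with `atexit()` run in reverse order of registration, so, going by the commit message, a handler installed from a library constructor is registered before Glibc sets up `_fini()` and therefore runs after the other library destructors:

```c
#include <stdio.h>
#include <stdlib.h>

static int demo_hook_installed;

/* Stand-in for the library's cleanup; runs during exit(). */
static void demo_exit_hook(void)
{
  fputs("demo_exit_hook: releasing library state\n", stderr);
}

/* Runs at load time, before main(). Registering the hook here,
 * instead of marking it __attribute__((destructor)), places it on
 * the atexit() list rather than among the library destructors. */
__attribute__((constructor))
static void demo_init(void)
{
  if (atexit(demo_exit_hook) == 0)
    demo_hook_installed = 1;
}

/* Lets callers check that registration succeeded. */
int demo_exit_hook_installed(void)
{
  return demo_hook_installed;
}
```

As the commit message notes, this ordering does not help when application code registers its own `atexit()` or `on_exit()` handler later and makes intercepted calls from it, because that handler runs before the one installed here.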