binutils 2.43.50: Segmentation fault in test_local_lib #698

Open
hroncok opened this issue Oct 24, 2024 · 14 comments
Labels
dependency-bug A bug experienced by users of meson-python caused by a dependency, rather than in code in this repo

Comments

@hroncok

hroncok commented Oct 24, 2024

Hello.

I am trying to build and test meson-python with Python 3.14 in Fedora.

I see a strange Segmentation fault in test_local_lib. I can reproduce it on Fedora Rawhide (42), but not on Fedora 39.

To reproduce:

$ podman run --rm -ti fedora:rawhide /usr/bin/bash  # or docker
# dnf install uv git-core cmake python3.14-devel gcc patchelf gdb
...
# git clone https://github.com/mesonbuild/meson-python.git
...
# cd meson-python
# uv venv --python=python3.14 venv  # or regular venv
# . venv/bin/activate
# uv pip install ninja .[test]
Using Python 3.14.0a1 environment at venv
   Built meson-python @ file:///meson-python
Resolved 15 packages in 1.90s
   Built coverage==7.6.4
Prepared 11 packages in 1.06s
Installed 15 packages in 41ms
 + build==1.2.2.post1
 + coverage==7.6.4
 + cython==3.0.11
 + iniconfig==2.0.0
 + meson==1.6.0
 + meson-python==0.18.0.dev0 (from file:///meson-python)
 + ninja==1.11.1.1
 + packaging==24.1
 + pluggy==1.5.0
 + pyproject-hooks==1.2.0
 + pyproject-metadata==0.9.0
 + pytest==8.3.3
 + pytest-cov==5.0.0
 + pytest-mock==3.14.0
 + wheel==0.44.0
# python -m pytest -k test_local_lib
...
============================= test session starts ==============================
platform linux -- Python 3.14.0a1, pytest-8.3.3, pluggy-1.5.0
rootdir: /meson-python
configfile: pyproject.toml
testpaths: tests
plugins: cov-5.0.0, mock-3.14.0
collected 123 items / 122 deselected / 1 selected                              

tests/test_wheel.py F                                                    [100%]

=================================== FAILURES ===================================
________________________________ test_local_lib ________________________________

venv = <tests.conftest.VEnv object at 0x7fb577566f90>
wheel_link_against_local_lib = PosixPath('/tmp/pytest-of-root/pytest-5/test0/mesonpy-test-5tupkd1z/link_against_local_lib-1.0.0-cp314-cp314-linux_x86_64.whl')

    @pytest.mark.skipif(sys.platform not in {'linux', 'darwin'}, reason='Not supported on this platform')
    def test_local_lib(venv, wheel_link_against_local_lib):
        venv.pip('install', wheel_link_against_local_lib)
>       output = venv.python('-c', 'import example; print(example.example_sum(1, 2))')

tests/test_wheel.py:160: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/conftest.py:114: in python
    return subprocess.check_output([self.executable, *args]).decode()
/usr/lib64/python3.14/subprocess.py:472: in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

input = None, capture_output = False, timeout = None, check = True
popenargs = (['/tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/bin/python3.14', '-c', 'import example; print(example.example_sum(1, 2))'],)
kwargs = {'stdout': -1}
process = <Popen: returncode: -11 args: ['/tmp/pytest-of-root/pytest-5/mesonpy-test-ve...>
stdout = b'', stderr = None, retcode = -11

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.
    
        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them,
        or pass capture_output=True to capture both.
    
        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.
    
        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.
    
        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.
    
        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.
    
        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE
    
        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE
    
        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '['/tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/bin/python3.14', '-c', 'import example; print(example.example_sum(1, 2))']' died with <Signals.SIGSEGV: 11>.

/usr/lib64/python3.14/subprocess.py:577: CalledProcessError
---------------------------- Captured stdout setup -----------------------------
Initialized empty Git repository in /meson-python/tests/packages/link-against-local-lib/.git/
+ meson setup /meson-python/tests/packages/link-against-local-lib /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7 -Dbuildtype=release -Db_ndebug=if-release -Db_vscrt=md --native-file=/meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7/meson-python-native-file.ini
The Meson build system
Version: 1.6.0
Source dir: /meson-python/tests/packages/link-against-local-lib
Build dir: /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7
Build type: native build
Project name: link-against-local-lib
Project version: 1.0.0
C compiler for the host machine: cc (gcc 14.2.1 "cc (GCC) 14.2.1 20240912 (Red Hat 14.2.1-4)")
C linker for the host machine: cc ld.bfd 2.43.50.20241014
Host machine cpu family: x86_64
Host machine cpu: x86_64
Program python found: YES (/meson-python/venv/bin/python)
Found pkg-config: YES (/usr/bin/pkg-config) 2.3.0
Run-time dependency python found: YES 3.14
WARNING: Please do not define rpath with a linker argument, use install_rpath
or build_rpath properties instead.
This will become a hard error in a future Meson release.

Build targets in project: 2

link-against-local-lib 1.0.0

  User defined options
    Native files: /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7/meson-python-native-file.ini
    b_ndebug    : if-release
    b_vscrt     : md
    buildtype   : release

Found ninja-1.11.1.git.kitware.jobserver-1 at /meson-python/venv/bin/ninja
+ /meson-python/venv/bin/ninja
[1/5] Compiling C object lib/libexample.so.p/examplelib.c.o
[2/5] Linking target lib/libexample.so
[3/5] Compiling C object example.cpython-314-x86_64-linux-gnu.so.p/examplemod.c.o
[4/5] Generating symbol file lib/libexample.so.p/libexample.so.symbols
[5/5] Linking target example.cpython-314-x86_64-linux-gnu.so
[1/2] /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7/lib/libexample.so
[2/2] /meson-python/tests/packages/link-against-local-lib/.mesonpy-gx4cvlf7/example.cpython-314-x86_64-linux-gnu.so
=========================== short test summary info ============================
FAILED tests/test_wheel.py::test_local_lib - subprocess.CalledProcessError: Command '['/tmp/pytest-of-root/pytest-5/meso...
====================== 1 failed, 122 deselected in 3.01s =======================

# /tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/bin/python3.14 -c 'import example; print(example.example_sum(1, 2))'
Segmentation fault (core dumped)

# gdb /tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/bin/python3.14
(gdb) run -c 'import example; print(example.example_sum(1, 2))'
Program received signal SIGSEGV, Segmentation fault.
0x00007f05fbbe1294 in ?? ()
(gdb) bt
#0  0x00007f05fbbe1294 in ?? ()
#1  0x00007f05fbbfc310 in call_init (l=0x56104552bed0, argc=3, 
    argv=0x7fffbdb0b0f8, env=0x7fffbdb0b118) at dl-init.c:60
#2  call_init (l=0x56104552bed0, argc=3, argv=0x7fffbdb0b0f8, 
    env=0x7fffbdb0b118) at dl-init.c:26
#3  0x00007f05fbbfc42d in _dl_init (main_map=0x56104552bed0, argc=3, 
    argv=0x7fffbdb0b0f8, env=0x7fffbdb0b118) at dl-init.c:121
#4  0x00007f05fbbf9562 in __GI__dl_catch_exception (
    exception=exception@entry=0x0, 
    operate=operate@entry=0x7f05fbc030a0 <call_dl_init>, 
    args=args@entry=0x7fffbdb09fc0) at dl-catch.c:215
#5  0x00007f05fbc03039 in dl_open_worker (a=a@entry=0x7fffbdb09fc0)
    at dl-open.c:785
#6  0x00007f05fbbf94c3 in __GI__dl_catch_exception (
    exception=exception@entry=0x7fffbdb09fa0, 
    operate=operate@entry=0x7f05fbc02fb0 <dl_open_worker>, 
    args=args@entry=0x7fffbdb09fc0) at dl-catch.c:241
#7  0x00007f05fbc03424 in _dl_open (
    file=0x7f05fb0d94f0 "/tmp/pytest-of-root/pytest-5/mesonpy-test-venv0/lib64/python3.14/site-packages/example.cpython-314-x86_64-linux-gnu.so", 
    mode=<optimized out>, 
    caller_dlopen=0x7f05fb869e21 <_imp_create_dynamic+929>, 
    nsid=<optimized out>, argc=3, argv=0x7fffbdb0b0f8, env=0x7fffbdb0b118)
    at dl-open.c:860
#8  0x00007f05fb47a9b4 in dlopen_doit () from /lib64/libc.so.6
#9  0x00007f05fbbf94c3 in __GI__dl_catch_exception (
    exception=exception@entry=0x7fffbdb0a1b0, 
    operate=0x7f05fb47a950 <dlopen_doit>, args=0x7fffbdb0a270)
    at dl-catch.c:241
#10 0x00007f05fbbf9619 in _dl_catch_error (objname=0x7fffbdb0a218, 
    errstring=0x7fffbdb0a220, mallocedp=0x7fffbdb0a217, 
    operate=<optimized out>, args=<optimized out>) at dl-catch.c:260
#11 0x00007f05fb47a4a3 in _dlerror_run () from /lib64/libc.so.6
#12 0x00007f05fb47aa6f in dlopen@GLIBC_2.2.5 () from /lib64/libc.so.6
#13 0x00007f05fb869e21 in _imp_create_dynamic ()
   from /lib64/libpython3.14.so.1.0
#14 0x00007f05fb77dccb in cfunction_vectorcall_FASTCALL ()
   from /lib64/libpython3.14.so.1.0
#15 0x00007f05fb75b05a in _PyEval_EvalFrameDefault ()
   from /lib64/libpython3.14.so.1.0
#16 0x00007f05fb77aec2 in object_vacall () from /lib64/libpython3.14.so.1.0
#17 0x00007f05fb7b441e in PyObject_CallMethodObjArgs ()
   from /lib64/libpython3.14.so.1.0
#18 0x00007f05fb7b35bd in PyImport_ImportModuleLevelObject ()
   from /lib64/libpython3.14.so.1.0
#19 0x00007f05fb75d7c9 in _PyEval_EvalFrameDefault ()
   from /lib64/libpython3.14.so.1.0
#20 0x00007f05fb82d3bb in PyEval_EvalCode () from /lib64/libpython3.14.so.1.0
#21 0x00007f05fb852050 in run_eval_code_obj () from /lib64/libpython3.14.so.1.0
#22 0x00007f05fb84af83 in run_mod () from /lib64/libpython3.14.so.1.0
#23 0x00007f05fb83d8ee in _PyRun_StringFlagsWithName.constprop.0 ()
   from /lib64/libpython3.14.so.1.0
#24 0x00007f05fb83d798 in _PyRun_SimpleStringFlagsWithName ()
   from /lib64/libpython3.14.so.1.0
#25 0x00007f05fb8647e4 in Py_RunMain () from /lib64/libpython3.14.so.1.0
#26 0x00007f05fb81c7ec in Py_BytesMain () from /lib64/libpython3.14.so.1.0
#27 0x00007f05fb4120c8 in __libc_start_call_main () from /lib64/libc.so.6
#28 0x00007f05fb41218b in __libc_start_main_impl () from /lib64/libc.so.6
#29 0x0000561011e4f095 in _start ()
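
The `retcode = -11` in the pytest output above follows the `subprocess` convention on POSIX: a negative return code -N means the child process was terminated by signal N. A minimal sketch of decoding such a value follows; the helper name `describe_returncode` is hypothetical and not part of meson-python or its test suite.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Return a readable description of a subprocess return code."""
    if returncode < 0:
        # POSIX convention used by subprocess: -N means "killed by signal N".
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


# The value reported in the traceback above.
print(describe_returncode(-11))
```

On Linux, signal 11 is SIGSEGV, which matches the `died with <Signals.SIGSEGV: 11>` message in the traceback.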

@dnicolodi
Member

Is this the only test that fails? As far as I can tell, Python segfaults when executing the extension module initialization function. The module is extremely simple (https://github.com/mesonbuild/meson-python/blob/main/tests/packages/link-against-local-lib/examplemod.c), so I don't see how this is possible other than because of a bug in CPython. The other possibility is a bug in the packaging, for example if the Python headers and the installed Python have a different opinion about the shape of the PyModuleDef structure, or something like that.

@hroncok
Author

hroncok commented Oct 24, 2024

> Is this the only test that fails?

Yes. Another one fails with pytest-dev/pytest-mock#468

@hroncok
Author

hroncok commented Oct 24, 2024

Huh, I can even reproduce this with Python 3.13.0 and 3.12.7. Possibly this is a problem in binutils etc.

@hroncok hroncok changed the title from "Python 3.14.0a1: Segmentation fault in test_local_lib" to "Segmentation fault in test_local_lib" Oct 24, 2024
@hroncok
Author

hroncok commented Oct 24, 2024

I can reproduce the crash with binutils 2.43.50 but not with binutils 2.43.1.

I'll take that to Fedora's binutils maintainer.

Should I keep this open or close it?

@hroncok hroncok changed the title from "Segmentation fault in test_local_lib" to "binutils 2.43.50: Segmentation fault in test_local_lib" Oct 24, 2024
@hroncok
Author

hroncok commented Oct 24, 2024

@rgommers
Contributor

> Should I keep this open or close it?

Looks unrelated to meson-python, so it'd make sense to close this. If you prefer to keep it open for some days until you receive a reply on the binutils bug report, that seems fine as well.

@hroncok hroncok closed this as not planned Oct 25, 2024
@hroncok
Author

hroncok commented Oct 30, 2024

For the record, the binutils folks say this is a problem in patchelf. They are also quite determined that patchelf cannot be supported and would rather see meson-python utilize the final -Wl,-rpath=… option when building the extension module.

@rgommers
Contributor

rgommers commented Oct 30, 2024

Reopening to keep it visible, since it doesn't sound like a fix in either binutils or patchelf is in the works just yet.

> They are also quite determined that patchelf cannot be supported and would rather see meson-python utilize the final -Wl,-rpath=… option when building the extension module.

Using the final -Wl,-rpath doesn't seem possible, since meson-python isn't actually building the extension module - meson is. And the package author (who could add an rpath argument to the package itself) doesn't know where and how meson-python will vendor the shared library into the wheel.

If the problem is RPATH rewriting though, this isn't just going to show up in this test case (which is a little niche and for a scenario that possibly is unused in the real world so far - not sure). auditwheel is doing the same when it vendors external shared libraries into wheels distributed on PyPI. That isn't going to show up in bug reports very soon only because auditwheel is usually run in a manylinux container that doesn't have a recent binutils. But that's an important use case for patchelf. And Nix will need it as well I'm sure.

It's still a little unclear to me what triggers the bug exactly, but it seems like this has to be fixed either in patchelf or in binutils.

I just read through the whole thread at https://bugzilla.redhat.com/show_bug.cgi?id=2321588. A few comments:

  • Agreed that setting LD_LIBRARY_PATH is useless (that means the wheel is broken by default)
  • Some of the feedback about us not understanding our build system seems misguided. The problem is fundamentally about (a) the way Python wheels are standardized, which does not include a libdir location for shared libraries, and (b) the need to make Python wheels portable and installable into venvs that don't have a predefined absolute install path on the user's OS. This makes relocating shared libraries a necessary step, and it's common to tooling for Python wheels, Nix packages, Conda packages, etc. It looks to me like saying "use -Wl,-rpath=/final/install/location" misses that key point.
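
To illustrate the relocation step discussed above, here is a rough sketch of how a tool could compute a `$ORIGIN`-relative rpath for a vendored library and build the matching `patchelf --set-rpath` invocation. The function names and the wheel layout (including the `.example.mesonpy.libs` directory name) are hypothetical and chosen only for illustration; this is not the actual meson-python or auditwheel implementation.

```python
import os
import shlex


def relative_rpath(module_path: str, libdir: str) -> str:
    """Build a $ORIGIN-relative rpath from the module's directory to libdir."""
    rel = os.path.relpath(libdir, start=os.path.dirname(module_path))
    return "$ORIGIN" if rel == "." else f"$ORIGIN/{rel}"


def patchelf_command(module_path: str, libdir: str) -> list[str]:
    """Return the patchelf argv that would rewrite the module's rpath."""
    return ["patchelf", "--set-rpath", relative_rpath(module_path, libdir), module_path]


# Hypothetical layout inside an installed wheel.
module = "site-packages/example.cpython-314-x86_64-linux-gnu.so"
libdir = "site-packages/.example.mesonpy.libs"

# Only print the command; running it requires patchelf and a real ELF file.
print(shlex.join(patchelf_command(module, libdir)))
```

Because the rpath is expressed relative to `$ORIGIN`, the same wheel contents keep working wherever the venv happens to live, which is the portability requirement in point (b).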

@rgommers rgommers reopened this Oct 30, 2024
@rgommers
Contributor

Also, thanks for trying to sort this out @hroncok! Doesn't look like an easy conversation.

@hroncok
Author

hroncok commented Oct 30, 2024

> -Wl,-rpath=/final/install/location

Technically, this path is relative, so we don't have that exact problem. If meson-python could "tell" meson to use a particular path, that should work, no?

@eli-schwartz
Member

There appears to be a kind of weirdly layered confusion going on here across multiple issue trackers.

"meson-python" happens to use patchelf, which uncovered a bug in the (uncoordinated) interaction between binutils snapshots (?) used distro-wide in fedora, and patchelf, a program widely used in various contexts. As noted in the fedora ticket, binutils has broken the PyPy build as well.

This bug doesn't need pip to replicate it, I'm sure. You could use python -m build, available on PyPI as "build", instead of pip install. It will create a wheel for you. And "build" assumes developer intent already, which means no passing --verbose to pip.

Patchelf is needed by literally anyone building wheels with C libraries for upload to PyPI and usage by basically all Linux users on any distro. Fedora and its derivatives are actually quite popular for this due to GCC Toolset, which allows you to use new GCC versions with older glibc... So having this broken on fedora specifically, seems a bit unfortunate!

Why is it needed, you ask? Well, it's needed because uploading to PyPI is a subcategory of building standalone binaries, so there's a program whose sole purpose is to modify your wheels, copy system libraries into the wheel, use patchelf to retarget everything to use relative rpaths that are part of the wheel layout, and upload the now standalone python modules.

This is a serious use case, and complaining that meson-python should just not do that when using its own libraries is missing the point (even though as a meson maintainer, not a meson-python maintainer, I am sympathetic to this argument). Less blame, more investigating whether patchelf and binutils can get along, please.

@eli-schwartz
Member

> The problem is fundamentally about (a) the way Python wheels are standardized, which does not include a libdir location for shared libraries, and (b) the need to make Python wheels portable and installable into venvs that don't have a predefined absolute install path on the user's OS. This makes relocating shared libraries a necessary step, and it's common to tooling for Python wheels, Nix packages, Conda packages, etc. It looks to me like saying "use -Wl,-rpath=/final/install/location" misses that key point.

@rgommers, note that this actually isn't about being relocatable. It's about changing the install layout at all. Being relocatable just means you need an rpath string using $ORIGIN (this is a dynamic loader variable) and the knowledge at the time of linking, what relocatable layout you need. You can then just inject the string value in LDFLAGS.

It doesn't help because you will potentially still have other unwanted rpath entries, and you can't handle library dependencies that aren't part of meson.build -- that's why auditwheel uses patchelf too, isn't it?

And actually injecting LDFLAGS is difficult to do robustly since if you do it in the environment it will be ignored when the user specified a native file, and if you do it via a native file you scribble all over the user LDFLAGS and the user native files.
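
As a rough sketch of the two injection routes described above, the snippet below builds a `$ORIGIN`-based rpath link argument and shows it once as an `LDFLAGS` environment value and once as a Meson native-file fragment. The helper names and the `lib` subdirectory are hypothetical, and the fragment assumes the `c_link_args` key under `[built-in options]`. Note that the Meson log earlier in this thread already warns against defining rpath through a linker argument, so this illustrates the idea rather than recommending it.

```python
def rpath_link_arg(rel_libdir: str) -> str:
    """Linker argument embedding a $ORIGIN-relative rpath (GNU ld syntax)."""
    return f"-Wl,-rpath=$ORIGIN/{rel_libdir}"


def native_file_fragment(link_args: list[str]) -> str:
    """Render link arguments as a Meson native-file fragment (assumed layout)."""
    quoted = ", ".join(f"'{arg}'" for arg in link_args)
    return f"[built-in options]\nc_link_args = [{quoted}]\n"


arg = rpath_link_arg("lib")  # "lib" is a hypothetical subdirectory

# Route 1: environment variable. Per the comment above, this is ignored
# when the user specifies a native file.
env = {"LDFLAGS": arg}
print(env["LDFLAGS"])

# Route 2: native file. Per the comment above, this overrides the user's
# own LDFLAGS and native-file settings.
print(native_file_fragment([arg]))
```

Either way the layout has to be known at link time, and neither route removes unwanted rpath entries or handles libraries outside meson.build, which is where patchelf comes in.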

@hroncok
Author

hroncok commented Oct 31, 2024

> Less blame, more investigating whether patchelf and binutils can get along, please.

I am not blaming anybody here. I am merely trying to solve this problem.

I am well aware that even if meson-python stops using patchelf, we will have this problem with auditwheel etc.

@rgommers
Contributor

> @rgommers, note that this actually isn't about being relocatable. It's about changing the install layout at all.

You're right. I never encountered any other reasons for changing the install layout, so in my mind the two meant roughly the same thing.

> Being relocatable just means you need an rpath string using $ORIGIN

Yes indeed. I just wrote docs for using shared libraries in gh-700, and for internal ones it starts with explaining how to use $ORIGIN. Being able to do so is relatively rare though, since shared libraries that are only meant for being included in a Python wheel are quite uncommon. The more typical case is something like this:

c-or-cpp-lib/
  meson.build  # contains shared_library() or library()
  python-bindings/
    meson.build  # contains extension module linking against shared library
  other-lang-bindings/
    ...

In such cases, especially if the Python bindings are maintained by other people than the C/C++ core, it may not be acceptable to mess with how the C/C++ code is compiled specifically to make Python wheel builds nicer. The failing test case at hand here is representative of that: the shared library goes to libdir, and meson-python is left to do the "vendoring" work à la auditwheel.

In gh-700 I'm also adding more test cases, including for the $ORIGIN case. One that is still missing is for an external shared library + auditwheel - that may be useful as well.

@rgommers rgommers added the dependency-bug label Oct 31, 2024