Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about rpath patching #21

Closed
danpetry opened this issue Feb 7, 2024 · 10 comments
Closed

Question about rpath patching #21

danpetry opened this issue Feb 7, 2024 · 10 comments
Labels
question Further information is requested

Comments

@danpetry
Copy link

danpetry commented Feb 7, 2024

Comment:

Building this feedstock on Anaconda's CI, we get the following error:

None INFO runpaths check failed
None INFO ["  ERROR (cuda-cudart_linux-64,targets/x86_64-linux/lib/libcudart.so.12.3.101): runpaths ['$ORIGIN'] found in
/croot/cuda-cudart-split_1707255477927/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol
d_placehold_placehold_placehold_placehold/targets/x86_64-linux/lib/libcudart.so.12.3.101"]

I'm not sure whether this comes from conda-build or some other process. I'm guessing conda-build, but I'm wondering why you don't get the same error?
Also, runpath is used instead of rpath AFAIU if they're both present, and they have different behaviours with respect to precedence of their entries over default system lib locations and LD_LIBRARY_PATH. With runpath, system library locations have precedence over runpath location, so if the user has installed their own libraries they'll take precedence? So you may want to erase the runpath entries as well?

@danpetry danpetry added the question Further information is requested label Feb 7, 2024
@danpetry
Copy link
Author

danpetry commented Feb 7, 2024

In the comment above, I misunderstood patchelf's behaviour. patchelf --set-rpath will set the runpatch unless --force-rpath is provided. So I guess you were going for the behaviour above? Or maybe the point is that LD_LIBRARY_PATH has precedence over runpath but not rpath...?

@danpetry
Copy link
Author

danpetry commented Feb 7, 2024

Further:

(base) [ec2-user@ip-172-31-48-167 lib]$ readelf -d libcudart.so.12

Dynamic section at offset 0xafd50 contains 33 entries:
  Tag        Type                         Name/Value
 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x000000000000000e (SONAME)             Library soname: [libcudart.so.12]

Are these libraries that conda-forge packages but defaults doesn't?

@jakirkham
Copy link
Member

For detailed background on the addition of patchelf --set-rpath, please read issue: conda-forge/cuda-feedstock#10

As to the changes themselves, please take a look at PR: #15

Would add that we needed to disable error_overlinking checks in conda-build as part of this change (as noted in the linked PR). That may be relevant to the error encountered

@danpetry
Copy link
Author

danpetry commented Feb 9, 2024

Ok, thanks for the info. I read through the issue and there's something I don't understand: why does setting the rpath to $ORIGIN after creation of the symlink make the linker resolve rpath to ${PREFIX}/lib at runtime? The rpath is being set on the original .sos in the targets/<arch> directory, so I would have though there's no change made? (I'm assuming these .sos already had $ORIGIN set as rpath when they came from Nvidia.)

This is basically for my understanding, I get the general idea and trust that you've all done this properly anyway.

@danpetry
Copy link
Author

danpetry commented Feb 9, 2024

nope, the provided .sos don't have an rpath set. The point is that without the rpath, the linker would look in the normal system locations and the location of the binary (.../targets/<arch>/) but with the rpath, it'll look in the directory of the symlink as well?

@vyasr
Copy link
Contributor

vyasr commented Feb 14, 2024

This looks to be an oversight on our part. When writing up conda-forge/cuda-feedstock#10, we had intended to set the RPATH and not the RUNPATH on all of our binaries, and we correctly did that in the feedstock that originally triggered this inspection (see the use of --force-rpath as you mentioned in https://github.com/conda-forge/cuda-nvtx-feedstock/pull/5/files#diff-d7075654874cb08007a21aaab3ecd4b3453a9087e7505d034d548b8938b599bc). We intended to port that fix over to other feedstocks as well but we appear to have forgotten. We'll get on to fix that.

I'm not sure whether this comes from conda-build or some other process. I'm guessing conda-build, but I'm wondering why you don't get the same error?

Yes, this is coming from conda build. This error only shows up when certain flags are set for conda build. Whether or not those flags are set by default may depend on your environment, so you may be seeing an error that we are not.

Also, runpath is used instead of rpath
...
So I guess you were going for the behaviour above? Or maybe the point is that LD_LIBRARY_PATH has precedence over runpath but not rpath...?

No, I think this is a mistake by us. There's a lot of debate in general about RPATH vs. RUNPATH and in what contexts to use what, but within the conda community at least there is a preference for RPATH since the types of library modification that RUNPATH would support are less appropriate when you are in a self-contained environment that in principle affords the user complete control over all libraries (i.e. there's no need to point away from system-installed libraries via LD_LIBRARY_PATH when everything within the environment should ideally only be within that environment and you can freely install your own versions or blow away your environment to test different cases).

I read through the issue and there's something I don't understand: ...

I would read through the "Problem Statement" section in conda-forge/cuda-feedstock#10 and see if that answers your question. If not, feel free to follow up. I suspect, though, that you are confusing the statement about which libraries are loaded at runtime (the ones in $PREFIX/lib vs the ones in $PREFIX/targets/...) and the statement about the RPATH (which we want to be $ORIGIN precisely because of how the symlink is resolved by the runtime loader). The RPATH is never $PREFIX/lib. The idea is that we want different things to be found by the linker (i.e. during the compilation process) and by the runtime loader (when users have the libraries in their environment).

danpetry added a commit to AnacondaRecipes/cuda-cudart-feedstock that referenced this issue Feb 20, 2024
@danpetry
Copy link
Author

danpetry commented Feb 20, 2024

ok, thanks. I'm going to include --force-rpath in this feedstock, then. This makes more sense.

Re the question about how the loading works: what I was missing was that the loader considers $ORIGIN to be relative to the location of the symlink, not the file to which the symlink points. The loader finds the symlink in $PREFIX/lib, then loads the library at $PREFIX/targets/<arch>/lib, then $ORIGIN is considered to be $PREFIX/lib (not $PREFIX/targets/<arch>/lib), because that's the loader's current working directory (or whatever the term is in this context).

@jakirkham
Copy link
Member

Vyas put together the PRs linked above, which should fix this issue in CUDA 12.3

Please let us know if those work for you and if you spot any other issues

@vyasr
Copy link
Contributor

vyasr commented Mar 26, 2024

Closing as resolved, but feel free to reopen if there are follow-up questions.

@vyasr vyasr closed this as completed Mar 26, 2024
@danpetry
Copy link
Author

Thanks, looking good, will let you know if we see any further issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants