
RPATHs are wrong. #2

Closed
isuruf opened this issue Nov 14, 2023 · 15 comments · Fixed by #5 · May be fixed by #3
Labels
bug Something isn't working

Comments

@isuruf
Member

isuruf commented Nov 14, 2023

Solution to issue cannot be found in the documentation.

  • I checked the documentation.

Issue

Since the actual library is in $PREFIX/targets/x86_64-linux/lib, the RPATH is set to $ORIGIN/../../../lib.

A fix for this is to reverse the links, i.e. make the actual library $PREFIX/lib/libnvToolsExt.so.1.0.0 and make $PREFIX/targets/x86_64-linux/lib/libnvToolsExt.so.1.0.0 a symlink.

Having the wrong RPATH leads to the wrong libraries being loaded.
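
For illustration, a quick check of the shipped library shows the mismatch (the RPATH value is the one reported above; the exact symlink target is an assumption about the current package layout):

readelf -d $PREFIX/lib/libnvToolsExt.so.1 | grep -iE 'rpath|runpath'
#   (RPATH/RUNPATH)   Library runpath: [$ORIGIN/../../../lib]
ls -l $PREFIX/lib/libnvToolsExt.so.1.0.0
#   libnvToolsExt.so.1.0.0 -> ../targets/x86_64-linux/lib/libnvToolsExt.so.1.0.0

Because the loader expands $ORIGIN using the directory the library was found in, finding it via the symlink in $PREFIX/lib makes $ORIGIN/../../../lib point two directories above the environment rather than back at $PREFIX/lib.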

cc @peterbell10

Installed packages

# packages in environment at /home/isuru/miniforge3/envs/nvtx:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
cuda-nvtx                 12.0.76              hcb278e6_0    conda-forge
cuda-version              12.0                 hffde075_2    conda-forge
libgcc-ng                 13.2.0               h807b86a_3    conda-forge
libgomp                   13.2.0               h807b86a_3    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_3    conda-forge

Environment info

active environment : nvtx
    active env location : /home/isuru/miniforge3/envs/nvtx
            shell level : 2
       user config file : /home/isuru/.condarc
 populated config files : /home/isuru/miniforge3/.condarc
                          /home/isuru/.condarc
          conda version : 23.10.0
    conda-build version : 3.27.0
         python version : 3.10.13.final.0
       virtual packages : __archspec=1=zen3
                          __glibc=2.35=0
                          __linux=6.2.0=0
                          __unix=0=0
       base environment : /home/isuru/miniforge3  (writable)
      conda av data dir : /home/isuru/miniforge3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /home/isuru/miniforge3/pkgs
                          /home/isuru/.conda/pkgs
       envs directories : /home/isuru/miniforge3/envs
                          /home/isuru/.conda/envs
               platform : linux-64
             user-agent : conda/23.10.0 requests/2.31.0 CPython/3.10.13 Linux/6.2.0-36-generic ubuntu/22.04.3 glibc/2.35 solver/libmamba conda-libmamba-solver/23.11.0 libmambapy/1.5.3
                UID:GID : 1000:1000
             netrc file : /home/isuru/.netrc
           offline mode : False
@isuruf isuruf added the bug Something isn't working label Nov 14, 2023
@isuruf isuruf mentioned this issue Nov 14, 2023
@jakirkham
Member

Thanks Isuru! 🙏

Is this only an NVTX issue or did we see this elsewhere?

Also, is there some background/context about where this came up? For example, is this related to PyTorch's NVTX search logic (where we had some issues recently: conda-forge/pytorch-cpu-feedstock#203 (comment)), or did this come up somewhere else?

@isuruf
Member Author

isuruf commented Nov 15, 2023

I think this issue will pop up for all CUDA libraries, but this is the first one that I saw.

The issue came up with a local build of pytorch built with the latest compilers from conda-forge. There are two copies of libstdc++ involved:

  • Old: $HOME/miniconda3/lib/libstdc++.so
  • New: $HOME/miniconda3/envs/pytorch/lib/libstdc++.so

Loading pytorch loads $HOME/miniconda3/envs/pytorch/lib/libnvToolsExt.so, which has an RPATH of $ORIGIN/../../../lib and therefore loads $HOME/miniconda3/lib/libstdc++.so. Note that the loading order matters: if libstdc++.so is loaded before libnvToolsExt.so, everything works as expected.
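
Spelled out with the paths above (the $ORIGIN expansion shown is my reading of the behaviour described here, where the loader uses the directory it found the library in rather than the real file's directory):

# found via the symlink in the env's lib/:
#   $ORIGIN              = $HOME/miniconda3/envs/pytorch/lib
#   $ORIGIN/../../../lib = $HOME/miniconda3/lib                   <- old libstdc++
# if the real file's directory were used instead:
#   $ORIGIN              = $HOME/miniconda3/envs/pytorch/targets/x86_64-linux/lib
#   $ORIGIN/../../../lib = $HOME/miniconda3/envs/pytorch/lib      <- intended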

@msarahan
Member

msarahan commented Nov 15, 2023

I think the fundamental issue here is that we're expecting users to use the library in /lib, while the actual library lives in a hierarchy that makes it easier for us (package maintainers/NVIDIA/whatever) to know at a glance that we have the right library, and that matches the layout of other deployments of NVIDIA packages.

In ye olden days (especially observed with lib and lib64), I think Ray would have said to NOT use lib64, but to keep Conda "pure", in that it need not adopt conventions that avoid conflicts Conda does not have. One solution is to move the contents of /targets/x86_64-linux up two levels and cut the targets/${arch} folders out. Do those folders really serve an important purpose for end users? How much would break if that change were made?

I discussed this with @jakirkham and we think this symlink reversal might make things more confusing in the future. Aside from reworking the package content structure, here are some other ideas:

  1. remove the symlink and replace it with a hard link or copy (a copy may be necessary). This seems like the safest, most-likely-to-work option.
  2. set build/binary_relocation to false to disable RPATH monkeying
  3. manually specify RPATHs such that both $ORIGIN/../../lib and $ORIGIN/../../../lib are searched, per https://docs.conda.io/projects/conda-build/en/stable/resources/define-metadata.html#rpaths (a concrete sketch follows this list)
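
To make idea 3 concrete, it would amount to stamping both entries into the library, e.g. with patchelf (illustrative only; the file path is assumed, and in a recipe this would go through the rpaths key linked above):

patchelf --set-rpath '$ORIGIN/../../lib:$ORIGIN/../../../lib' $PREFIX/targets/x86_64-linux/lib/libnvToolsExt.so.1.0.0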

@isuruf
Member Author

isuruf commented Nov 15, 2023

Removing the targets/x86_64-linux directory structure is not possible; that layout was adopted because some headers and libraries have fairly generic names without a cuda or nv prefix that might conflict with other packages' headers.

Option 2 is not a great solution, as libnvToolsExt.so wouldn't know where libstdc++.so is.
Option 3 is dangerous, as it adds paths that are outside the environment.

My PR does:
4. Reverse the symlinks so that lib/libnvToolsExt.so is a real file and targets/x86_64-linux/lib/libnvToolsExt.so is the symlink.

I prefer option 4 because targets/x86_64-linux/lib/ is only ever used for building and linking, and RPATHs don't matter for those.
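
Roughly, the layout after option 4 would look like this (illustrative; exact link targets depend on the recipe):

ls -l $PREFIX/lib/libnvToolsExt.so* $PREFIX/targets/x86_64-linux/lib/libnvToolsExt.so*
#   $PREFIX/lib/libnvToolsExt.so.1.0.0                              (real ELF file)
#   $PREFIX/lib/libnvToolsExt.so.1 -> libnvToolsExt.so.1.0.0
#   $PREFIX/targets/x86_64-linux/lib/libnvToolsExt.so.1.0.0 -> ../../../lib/libnvToolsExt.so.1.0.0

With that layout the real file sits in $PREFIX/lib, so the RPATH computed for it resolves to the environment's lib directory at runtime.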

@isuruf
Member Author

isuruf commented Nov 15, 2023

Option 1 is of course an option, but it would increase the package size.

@bdice
Contributor

bdice commented Nov 15, 2023

Option 2 is not a great solution as libnvToolsExt.so wouldn't know where libstdc++.so is.

Can you help me understand this problem in a bit more detail? I may be missing something. A CUDA Toolkit installed to the system (not with conda) doesn't have an RPATH specified for this library. This gives no output:

readelf -d /usr/local/cuda/targets/x86_64-linux/lib/libnvToolsExt.so | grep RPATH

That libnvToolsExt.so does have a NEEDED entry for libstdc++.so.6, like the package we're shipping here. I expected that we would want to ship conda CTK libraries the same way as other CTK distributions, with no RPATH, by setting build/binary_relocation to false.
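
For comparison, the NEEDED entry mentioned above can be checked the same way (expected output sketched; other entries omitted):

readelf -d /usr/local/cuda/targets/x86_64-linux/lib/libnvToolsExt.so | grep NEEDED
#   0x0000000000000001 (NEEDED)   Shared library: [libstdc++.so.6]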

@isuruf
Member Author

isuruf commented Nov 15, 2023

@bdice, that's because a system installation assumes that libstdc++.so comes from the default paths of the dynamic loader. If there's no RPATH, loading pytorch in my local install would load /usr/lib/libstdc++.so.6 instead of the one from the conda env, leading to the same problem.
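
A rough way to see that fallback (sketch; the exact resolution depends on ld.so.cache and any LD_LIBRARY_PATH in effect):

ldd $PREFIX/lib/libnvToolsExt.so.1 | grep libstdc++
#   with no RPATH/RUNPATH on the library, libstdc++.so.6 would resolve from the
#   system's default search path rather than from $PREFIX/lib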

@jakirkham
Member

One thing we have been discussing is using stub libraries at build time instead (conda-forge/cuda-feedstock#8).

If we changed nvcc to search for the CTK stub libraries first, would that solve the issue?

@isuruf
Member Author

isuruf commented Nov 15, 2023

That would mean having the real library in $PREFIX/lib, which is what PR #3 does anyway.

@jakirkham
Member

Maybe that issue description is a bit misleading. I think we are interested in the other changes there, but that may be further off.

At the moment I just mean swapping the order of flags like these,

so that ${PREFIX}/${targetsDir}/lib/stubs is searched before ${PREFIX}/${targetsDir}/lib.
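
i.e. roughly this change in ordering (illustrative only; the real flags live in the scripts referenced above):

# before:  ... -L${PREFIX}/${targetsDir}/lib        -L${PREFIX}/${targetsDir}/lib/stubs ...
# after:   ... -L${PREFIX}/${targetsDir}/lib/stubs  -L${PREFIX}/${targetsDir}/lib ...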

@isuruf
Member Author

isuruf commented Nov 15, 2023

I think you are misunderstanding. This issue is not about building or linking. It's about runtime loading.

@msarahan
Member

It's about both, because the symlinks conflate build-time usage with runtime usage. Any approach that avoids the symlink should be fine, and conceptually cleaner.

Reversing the symlink is fine for making the runtime work, but I think it muddies up the model of where "real" content exists. I think it is important that arch-specific packages continue to contain the "real" stuff (or perhaps the stubs that @jakirkham mentions) in /targets/.../lib, but that a runtime package should contain native libraries in /lib.

Symlinks are something done to save space here, and perhaps avoid a separate -dev and runtime package. I don't think the symlink is a helpful thing in any case, whether or not you reverse it.

@isuruf
Member Author

isuruf commented Nov 15, 2023

OK, so we all agree that:

  1. $PREFIX/lib/libnvToolsExt.so.1 should resolve to $PREFIX/lib/libnvToolsExt.so.1.0.0, which should be a real file (sketched below).
  2. We are not sure what $PREFIX/targets/x86_64-linux/lib/libnvToolsExt.so should be.

Currently, 1 is not true for this feedstock, and that's what I'm trying to fix.
For 2, yes, a stub library would be a good option to have, but not strictly necessary IMO.
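
For point 1, the desired state would look something like this (illustrative):

ls -l $PREFIX/lib/libnvToolsExt.so.1*
#   libnvToolsExt.so.1 -> libnvToolsExt.so.1.0.0
#   libnvToolsExt.so.1.0.0          (a regular ELF file, not a symlink)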

@msarahan
Member

We've been discussing this extensively internally and we definitely think it's a big issue, as it could affect CUDA libraries other than NVTX. The team is working on a document that details possible solutions in the context of CUDA library packages that exist in other forms. We'll file an issue in https://github.com/conda-forge/cuda-feedstock with more info when we've dug in more.

@jakirkham
Member

After extensive internal discussion and evaluating the potential options, we have devised a planned course of action and have laid this out in detail in issue: conda-forge/cuda-feedstock#10

Happy to answer any questions that may come up
