RPATHs are wrong. #2
Comments
Thanks Isuru! 🙏 Is this only an NVTX issue or did we see this elsewhere? Also, is there some background/context about where this came up? For example, is this related to PyTorch's NVTX search logic (where we had some issues recently: conda-forge/pytorch-cpu-feedstock#203 (comment)) or did this come up somewhere else?
I think this issue will pop up for all cuda libraries, but this is the first that I saw. The issue came up with a local build of pytorch built with latest compilers from conda-forge.
Loading pytorch loads |
I think the fundamental issue here is that we're expecting users to use the library in /lib, while the actual library lives in a hierarchy that makes it easier for us (package maintainers/nvidia/whatever) to know we have the right library at a glance, and so that we match the package behavior relative to other deployments of NVIDIA stuff. In ye olden days (especially observed with lib and lib64), I think Ray would have said to NOT use lib64, but to keep Conda "pure" in that it need not adopt these conventions that avoid conflicts that Conda does not have. One solution is to move the contents of /targets/x86_64-linux up two levels, and cut the targets/${arch} folders out. Do those folders really serve an important purpose to end users? How much would break if that change is made? I discussed this with @jakirkham and we think this symlink reversal might make things more confusing in the future. Aside from reworking the package contents structure, here are some other ideas:
Removing Option 2 is not a great solution as My PR does: I prefer option 4 because |
Option 1 is of course an option, but it would increase the package size/ |
Can you help me understand this problem in a bit more detail? I may be missing something. A CUDA Toolkit installed to the system (not with conda) doesn't have an RPATH specified for this library. This gives no output:
That |
@bdice, that's because a system installation assumes that |
One thing we have been discussing is using stub libraries at build time instead (conda-forge/cuda-feedstock#8). If we changed |
That would mean having the real library in |
Maybe that issue description is a bit misleading. I think we are interested in the other changes there, but that may be further off. Atm I just mean swapping the order of flags like these So |
I think you are misunderstanding. This issue is not about building or linking. It's about runtime loading.
It's about both, because the symlinks are conflating the build time usage with the runtime usage. Any approach that avoids the symlink should be fine, and conceptually cleaner. Reversing the symlink is fine for making the runtime work, but I think it muddies up the model of where "real" content exists. I think it is important that arch-specific packages continue to contain the "real" stuff (or perhaps the stubs that @jakirkham mentions) in /targets/.../lib, but that a runtime package should contain native libraries in /lib. Symlinks are something done to save space here, and perhaps avoid a separate |
Ok, so we all agree that
Currently, |
We've been discussing this extensively internally and we definitely think it's a big issue, as it could affect CUDA libraries other than NVTX. The team is working on a document that details possible solutions in the context of CUDA library packages that exist in other forms. We'll file an issue in https://github.com/conda-forge/cuda-feedstock with more info when we've dug in more.
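To get a rough idea of which other libraries might be affected, one could list the entries in lib that are symlinks into the targets tree. The following is only a sketch under the layout described in this issue; the helper name `find_target_symlinks` and the throwaway prefix it runs against are hypothetical, not part of any conda or CUDA tooling.

```python
import os
import tempfile


def find_target_symlinks(prefix):
    """Return (name, real path relative to prefix) for every entry in
    prefix/lib that is a symlink resolving into prefix/targets."""
    prefix = os.path.realpath(prefix)
    lib_dir = os.path.join(prefix, "lib")
    targets_dir = os.path.join(prefix, "targets")
    hits = []
    for name in sorted(os.listdir(lib_dir)):
        path = os.path.join(lib_dir, name)
        if not os.path.islink(path):
            continue
        resolved = os.path.realpath(path)
        # The symlink is of interest if its real file lives under targets/.
        if os.path.commonpath([resolved, targets_dir]) == targets_dir:
            hits.append((name, os.path.relpath(resolved, prefix)))
    return hits


# Demo against a temporary directory that mimics the layout from the issue.
with tempfile.TemporaryDirectory() as prefix:
    real_dir = os.path.join(prefix, "targets", "x86_64-linux", "lib")
    os.makedirs(real_dir)
    os.makedirs(os.path.join(prefix, "lib"))
    # "Real" library in the targets tree, plus an unrelated regular file in lib.
    open(os.path.join(real_dir, "libnvToolsExt.so.1.0.0"), "w").close()
    open(os.path.join(prefix, "lib", "libother.so"), "w").close()
    # Relative symlink in lib pointing at the real library.
    os.symlink(
        os.path.join("..", "targets", "x86_64-linux", "lib", "libnvToolsExt.so.1.0.0"),
        os.path.join(prefix, "lib", "libnvToolsExt.so.1.0.0"),
    )
    affected = find_target_symlinks(prefix)
    print(affected)
```

Running the same helper against a real environment prefix would only show which files follow this symlink pattern; whether each one actually carries a problematic RPATH would still need to be checked separately.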
After extensive internal discussion and evaluating the potential options, we have devised a planned course of action and have laid this out in detail in issue conda-forge/cuda-feedstock#10. Happy to answer any questions that may come up.
Solution to issue cannot be found in the documentation.
Issue
Since the actual library is in $PREFIX/targets/x86_64-linux/lib, the RPATH is set to $ORIGIN/../../../lib. A fix for this is to reverse the links, i.e. make the actual library $PREFIX/lib/libnvToolsExt.so.1.0.0 and make $PREFIX/targets/x86_64-linux/lib/libnvToolsExt.so.1.0.0 a symlink.

Having the wrong RPATH will lead to loading wrong libraries.
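To illustrate why the RPATH goes wrong, here is a small sketch of how $ORIGIN/../../../lib expands depending on which path the library is loaded through. It assumes the loader takes $ORIGIN from the directory of the path it opened, without resolving the symlink first, which is consistent with the behavior described above. The prefix `/opt/env` and the helper `expand_origin` are made up for illustration.

```python
import posixpath


def expand_origin(rpath_entry, loaded_from):
    """Expand $ORIGIN in an RPATH entry, taking $ORIGIN to be the directory
    of the path the library was loaded through (symlinks NOT resolved;
    this is an assumption about the loader, see the text above)."""
    origin = posixpath.dirname(loaded_from)
    # normpath is purely lexical: it collapses ".." without touching the disk.
    return posixpath.normpath(rpath_entry.replace("$ORIGIN", origin))


PREFIX = "/opt/env"  # hypothetical environment prefix
RPATH = "$ORIGIN/../../../lib"

real = PREFIX + "/targets/x86_64-linux/lib/libnvToolsExt.so.1.0.0"
link = PREFIX + "/lib/libnvToolsExt.so.1.0.0"

via_real = expand_origin(RPATH, real)
via_link = expand_origin(RPATH, link)

print(via_real)  # /opt/env/lib
print(via_link)  # /lib
```

Loaded through the real file, the entry lands on the environment's own lib directory. Loaded through the symlink in lib, the same entry climbs out of the prefix entirely, which is how libraries from outside the environment could end up being picked up.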
cc @peterbell10
Installed packages
Environment info