Build for GPU fails due to nccl error #16711
Update: I managed to continue the compilation by running all the docker commands with sudo. Rootless docker runs might be possible with a correctly configured NVIDIA Container Toolkit (https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html), but that did not work for me. However, I still get an error at a later stage of the compilation:
|
Compiling or running that test should not be a blocker. I suggest skipping it in your compile command with --build_tag_filters. As far as I can tell, this test is supposed to work on these architectures: https://github.com/openxla/xla/blob/main/xla/tests/local_client_aot_test_helper.cc#L66
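A rough sketch of the tag-filter suggestion: Bazel's `--build_tag_filters` takes a comma-separated list of tags, and a leading `-` excludes targets that carry that tag. The tag name `some_test_tag` and the `//xla/...` target pattern are assumptions for illustration only; the thread does not say which tag the failing test carries, so check the test's BUILD rule first.

```shell
# Hypothetical tag name: replace with a tag the failing test target actually has.
SKIP_TAG="${SKIP_TAG:-some_test_tag}"

# A leading '-' tells Bazel to exclude targets carrying this tag from the build.
# The //xla/... target pattern is an assumption; use your own build targets.
BUILD_CMD="bazel build --build_tag_filters=-${SKIP_TAG} //xla/..."

# Print the command instead of running it, so it can be reviewed first.
echo "${BUILD_CMD}"
```

Note that `--build_tag_filters` only affects which targets are built; when running `bazel test`, `--test_tag_filters` is the corresponding filter for which tests are executed.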
@Tixxx, do you run into this, too?
Can you try running configure.py with the nccl option?
@Tixxx It still fails with the same error even though I run it with --nccl.
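For reference, the configure step with the NCCL option might look like the sketch below. It assumes the command is run from the root of an XLA checkout and that `--backend=CUDA` is the backend being configured; the exact invocation used in this thread is not shown, so treat this as an illustration rather than the reporter's actual command.

```shell
# Assumption: run from the root of an XLA checkout, configuring the CUDA backend.
# --nccl is the option mentioned in the thread.
CONFIGURE_CMD="./configure.py --backend=CUDA --nccl"

# Print the command for review; run it manually inside the checkout or container.
echo "${CONFIGURE_CMD}"
```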
I have the same error...
Maybe you can try this:
|
This command should work too: |
I'm trying to build XLA for GPU following this guide: https://openxla.org/xla/developer_guide. Configuration goes just fine:
But then when I try:
So it seems that building the nccl library fails. If I try:
I get slightly more verbose output. I've seen a couple of related unresolved issues, #11604 and #10616, but nothing in them has worked for me yet.