Release for CUDA 11.2 #5668

Closed
noashin opened this issue Feb 7, 2021 · 9 comments

Comments

noashin commented Feb 7, 2021

The available CUDA version I have is 11.2 and I would like to install jax on it.
Currently there is no release for jaxlib-0.1.60+cuda112.
Is it possible to build a release that supports CUDA 11.2?

Is there another way to install jax with CUDA 11.2, other than running the command below?
pip install --upgrade jax jaxlib==0.1.60+cuda112 -f https://storage.googleapis.com/jax-releases/jax_releases.html

Thanks!

fcossio commented Feb 8, 2021

I am trying to get it working with CUDA 11.2 as well. I tried to build from source, but could not get the build to complete.

OS: Ubuntu 20.04.1

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0

cudnn version: 8.1.0.77

python version: 3.8.5

After following the instructions for building from source, I got the following error.

$ python build/build.py --enable_cuda

     _   _  __  __
    | | / \ \ \/ /
 _  | |/ _ \ \  /
| |_| / ___ \/  \
 \___/_/   \/_/\_\


Another command (pid=8235) is running.  Waiting for it to complete on the server...
Bazel binary path: ./bazel-3.1.0-linux-x86_64
Python binary path: /home/fer/miniconda3/envs/jax-gpu/bin/python
Python version: 3.8
MKL-DNN enabled: yes
-march=native: no
CUDA enabled: yes
CUDA compute capabilities: 3.5,5.2,6.0,6.1,7.0
ROCm enabled: no

Building XLA and installing it in the jaxlib source tree...
./bazel-3.1.0-linux-x86_64 run --verbose_failures=true --config=short_logs --config=mkl_open_source_only --config=cuda --define=xla_python_enable_gpu=true :build_wheel -- --output_path=/home/fer/jax/dist
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'run' from /home/fer/jax/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'run' from /home/fer/jax/.bazelrc:
  Inherited 'build' options: --repo_env PYTHON_BIN_PATH=/home/fer/miniconda3/envs/jax-gpu/bin/python --action_env=PYENV_ROOT --python_path=/home/fer/miniconda3/envs/jax-gpu/bin/python --repo_env TF_NEED_CUDA=1 --action_env TF_CUDA_COMPUTE_CAPABILITIES=3.5,5.2,6.0,6.1,7.0 --repo_env TF_NEED_ROCM=0 --action_env TF_ROCM_AMDGPU_TARGETS=gfx803,gfx900,gfx906,gfx1010 --distinct_host_configuration=false -c opt --apple_platform_type=macos --macos_minimum_os=10.9 --announce_rc --define open_source_build=true --define=no_kafka_support=true --define=no_ignite_support=true --define=grpc_no_ares=true --spawn_strategy=standalone --strategy=Genrule=standalone --enable_platform_specific_config
INFO: Found applicable config definition build:short_logs in file /home/fer/jax/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:mkl_open_source_only in file /home/fer/jax/.bazelrc: --define=tensorflow_mkldnn_contraction_kernel=1
INFO: Found applicable config definition build:cuda in file /home/fer/jax/.bazelrc: --crosstool_top=@local_config_cuda//crosstool:toolchain --define=using_cuda=true --define=using_cuda_nvcc=true
INFO: Found applicable config definition build:linux in file /home/fer/jax/.bazelrc: --copt=-Wno-sign-compare --define=no_aws_support=true --define=no_gcp_support=true --define=no_hdfs_support=true --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --copt=-Wno-stringop-truncation
Loading: 
Loading: 0 packages loaded
DEBUG: /home/fer/.cache/bazel/_bazel_fer/762940ba9c9dd4b483569ce1647fce9c/external/bazel_tools/tools/cpp/lib_cc_configure.bzl:118:5: 
Auto-Configuration Warning: 'TMP' environment variable is not set, using 'C:\Windows\Temp' as default
Analyzing: target //build:build_wheel (0 packages loaded, 0 targets configured)
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/llvm/llvm-project/archive/e635feb15a91e6eeb77876031be2811e63d542f3.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found
DEBUG: Rule 'io_bazel_rules_docker' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1556410077 -0400"
DEBUG: Repository io_bazel_rules_docker instantiated at:
  no stack (--record_rule_instantiation_callstack not enabled)
Repository rule git_repository defined at:
  /home/fer/.cache/bazel/_bazel_fer/762940ba9c9dd4b483569ce1647fce9c/external/bazel_tools/tools/build_defs/repo/git.bzl:195:18: in <toplevel>
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/nvidia/nccl/archive/v2.8.3-1.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found
WARNING: Download from https://gitlab.mpcdf.mpg.de/mtr/pocketfft/-/archive/53e9dd4d12f986207c96d97c5183f5a72239c76e/pocketfft-53e9dd4d12f986207c96d97c5183f5a72239c76e.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 406 Not Acceptable
INFO: Analyzed target //build:build_wheel (14 packages loaded, 6829 targets configured).
INFO: Found 1 target...
[0 / 199] [Prepa] Expanding template build/build_wheel ... (3 actions, 0 running)
[1,613 / 2,315] Compiling external/com_google_protobuf/src/google/protobuf/descriptor.pb.cc; 3s local ... (8 actions, 7 running)
[1,830 / 2,315] Compiling jaxlib/cuda_prng_kernels.cu.cc; 5s local ... (8 actions, 7 running)
[1,942 / 2,315] Compiling external/com_google_protobuf/src/google/protobuf/extension_set_heavy.cc; 2s local ... (8 actions, 7 running)
[2,121 / 2,349] Compiling external/com_google_protobuf/src/google/protobuf/generated_message_table_driven_lite.cc; 5s local ... (8 actions, 7 running)
[2,285 / 2,363] Compiling external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_stub.cc; 8s local ... (8 actions, 7 running)
[2,645 / 2,877] Compiling external/org_tensorflow/tensorflow/core/framework/function.pb.cc; 4s local ... (8 actions, 7 running)
[3,199 / 3,375] Compiling external/org_tensorflow/tensorflow/core/framework/common_shape_fns.cc; 5s local ... (8 actions, 7 running)
[3,622 / 3,735] Compiling external/org_tensorflow/tensorflow/compiler/xla/service/hlo_instruction.cc; 19s local ... (8 actions, 7 running)
[3,789 / 3,878] Compiling external/org_tensorflow/tensorflow/core/util/batch_util.cc; 45s local ... (8 actions, 7 running)
[5,569 / 6,664] Compiling external/flatbuffers/src/idl_parser.cpp; 16s local ... (8 actions, 7 running)
[5,868 / 6,664] Compiling external/llvm-project/llvm/utils/TableGen/GlobalISelEmitter.cpp; 12s local ... (8 actions, 7 running)
[6,657 / 7,380] Compiling external/llvm-project/mlir/lib/IR/BuiltinDialect.cpp; 6s local ... (8 actions, 7 running)
[6,826 / 7,551] Compiling external/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp; 10s local ... (8 actions, 7 running)
[6,870 / 7,570] Compiling external/org_tensorflow/tensorflow/compiler/xla/service/hlo_evaluator.cc; 100s local ... (8 actions running)
[6,870 / 7,570] Compiling external/org_tensorflow/tensorflow/compiler/xla/service/hlo_evaluator.cc; 280s local ... (8 actions running)
[6,870 / 7,570] Compiling external/org_tensorflow/tensorflow/compiler/xla/service/hlo_evaluator.cc; 466s local ... (8 actions running)
[6,870 / 7,570] Compiling external/org_tensorflow/tensorflow/compiler/xla/service/hlo_evaluator.cc; 680s local ... (8 actions running)
[6,870 / 7,570] Compiling external/org_tensorflow/tensorflow/compiler/xla/service/hlo_evaluator.cc; 927s local ... (8 actions running)
[6,870 / 7,570] Compiling external/org_tensorflow/tensorflow/compiler/xla/service/hlo_evaluator.cc; 1210s local ... (8 actions running)
[6,870 / 7,570] Compiling external/org_tensorflow/tensorflow/compiler/xla/service/hlo_evaluator.cc; 1535s local ... (8 actions running)
[6,870 / 7,570] Compiling external/org_tensorflow/tensorflow/compiler/xla/service/hlo_evaluator.cc; 1915s local ... (8 actions running)
ERROR: /home/fer/.cache/bazel/_bazel_fer/762940ba9c9dd4b483569ce1647fce9c/external/org_tensorflow/tensorflow/compiler/xla/service/BUILD:266:1: C++ compilation of rule '@org_tensorflow//tensorflow/compiler/xla/service:hlo_evaluator' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /home/fer/.cache/bazel/_bazel_fer/762940ba9c9dd4b483569ce1647fce9c/execroot/__main__ && \
  exec env - \
    LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64 \
    PATH=/usr/local/cuda-11.2/bin:/home/fer/miniconda3/envs/jax-gpu/bin:/home/fer/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin \
    PWD=/proc/self/cwd \
    TF_CUDA_COMPUTE_CAPABILITIES=3.5,5.2,6.0,6.1,7.0 \
    TF_ROCM_AMDGPU_TARGETS=gfx803,gfx900,gfx906,gfx1010 \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/k8-opt/bin/external/org_tensorflow/tensorflow/compiler/xla/service/_objs/hlo_evaluator/hlo_evaluator.pic.d '-frandom-seed=bazel-out/k8-opt/bin/external/org_tensorflow/tensorflow/compiler/xla/service/_objs/hlo_evaluator/hlo_evaluator.pic.o' -DTF_USE_SNAPPY -DHAVE_SYS_UIO_H -DTENSORFLOW_USE_CUSTOM_CONTRACTION_KERNEL -DTENSORFLOW_USE_MKLDNN_CONTRACTION_KERNEL -DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' '-DEIGEN_HAS_TYPE_TRAITS=0' -D__CLANG_SUPPORT_DYN_ANNOTATION__ -iquote external/org_tensorflow -iquote bazel-out/k8-opt/bin/external/org_tensorflow -iquote external/com_google_absl -iquote bazel-out/k8-opt/bin/external/com_google_absl -iquote external/nsync -iquote bazel-out/k8-opt/bin/external/nsync -iquote external/eigen_archive -iquote bazel-out/k8-opt/bin/external/eigen_archive -iquote external/gif -iquote bazel-out/k8-opt/bin/external/gif -iquote external/libjpeg_turbo -iquote bazel-out/k8-opt/bin/external/libjpeg_turbo -iquote external/com_google_protobuf -iquote bazel-out/k8-opt/bin/external/com_google_protobuf -iquote external/zlib -iquote bazel-out/k8-opt/bin/external/zlib -iquote external/com_googlesource_code_re2 -iquote bazel-out/k8-opt/bin/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/k8-opt/bin/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/k8-opt/bin/external/fft2d -iquote external/highwayhash -iquote bazel-out/k8-opt/bin/external/highwayhash -iquote external/double_conversion -iquote bazel-out/k8-opt/bin/external/double_conversion -iquote external/snappy -iquote bazel-out/k8-opt/bin/external/snappy -iquote external/local_config_cuda -iquote bazel-out/k8-opt/bin/external/local_config_cuda -iquote external/local_config_tensorrt -iquote bazel-out/k8-opt/bin/external/local_config_tensorrt -iquote external/mkl_dnn -iquote bazel-out/k8-opt/bin/external/mkl_dnn 
-Ibazel-out/k8-opt/bin/external/local_config_cuda/cuda/_virtual_includes/cuda_headers_virtual -Ibazel-out/k8-opt/bin/external/local_config_tensorrt/_virtual_includes/tensorrt_headers -isystem external/nsync/public -isystem bazel-out/k8-opt/bin/external/nsync/public -isystem external/eigen_archive -isystem bazel-out/k8-opt/bin/external/eigen_archive -isystem external/gif -isystem bazel-out/k8-opt/bin/external/gif -isystem external/com_google_protobuf/src -isystem bazel-out/k8-opt/bin/external/com_google_protobuf/src -isystem external/zlib -isystem bazel-out/k8-opt/bin/external/zlib -isystem external/farmhash_archive/src -isystem bazel-out/k8-opt/bin/external/farmhash_archive/src -isystem external/double_conversion -isystem bazel-out/k8-opt/bin/external/double_conversion -isystem external/local_config_cuda/cuda -isystem bazel-out/k8-opt/bin/external/local_config_cuda/cuda -isystem external/local_config_cuda/cuda/cuda/include -isystem bazel-out/k8-opt/bin/external/local_config_cuda/cuda/cuda/include -isystem external/mkl_dnn/include -isystem bazel-out/k8-opt/bin/external/mkl_dnn/include -isystem external/mkl_dnn/src -isystem bazel-out/k8-opt/bin/external/mkl_dnn/src -isystem external/mkl_dnn/src/common -isystem bazel-out/k8-opt/bin/external/mkl_dnn/src/common -isystem external/mkl_dnn/src/cpu -isystem bazel-out/k8-opt/bin/external/mkl_dnn/src/cpu -isystem external/mkl_dnn/src/cpu/gemm -isystem bazel-out/k8-opt/bin/external/mkl_dnn/src/cpu/gemm -isystem external/mkl_dnn/src/cpu/xbyak -isystem bazel-out/k8-opt/bin/external/mkl_dnn/src/cpu/xbyak -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIC -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections -Wno-sign-compare -Wno-stringop-truncation '-std=c++14' -c 
external/org_tensorflow/tensorflow/compiler/xla/service/hlo_evaluator.cc -o bazel-out/k8-opt/bin/external/org_tensorflow/tensorflow/compiler/xla/service/_objs/hlo_evaluator/hlo_evaluator.pic.o)
Execution platform: @local_execution_config_platform//:platform
In file included from external/org_tensorflow/tensorflow/compiler/xla/service/hlo_evaluator.cc:15:
external/org_tensorflow/tensorflow/compiler/xla/service/hlo_evaluator.h:77:3: warning: multi-line comment [-Wcomment]
   77 |   //            /       \
      |   ^
external/org_tensorflow/tensorflow/compiler/xla/service/hlo_evaluator.h:79:3: warning: multi-line comment [-Wcomment]
   79 |   //        /      \
      |   ^
In file included from external/org_tensorflow/tensorflow/compiler/xla/service/hlo_evaluator.cc:39:
external/org_tensorflow/tensorflow/compiler/xla/service/hlo_evaluator_typed_visitor.h: In member function ‘double xla::HloEvaluatorTypedVisitor<ReturnT, ElementwiseT>::GetAsDouble(const xla::Literal&, absl::lts_2020_02_25::Span<const long int>)’:
external/org_tensorflow/tensorflow/compiler/xla/service/hlo_evaluator_typed_visitor.h:149:3: warning: no return statement in function returning non-void [-Wreturn-type]
  149 |   }
      |   ^
gcc: fatal error: Killed signal terminated program cc1plus
compilation terminated.
Target //build:build_wheel failed to build
INFO: Elapsed time: 3087.628s, Critical Path: 2160.17s
INFO: 3380 processes: 3380 local.
FAILED: Build did NOT complete successfully
ERROR: Build failed. Not running target
FAILED: Build did NOT complete successfully
Traceback (most recent call last):
  File "build/build.py", line 506, in <module>
    main()
  File "build/build.py", line 501, in main
    shell(command)
  File "build/build.py", line 51, in shell
    output = subprocess.check_output(cmd)
  File "/home/fer/miniconda3/envs/jax-gpu/lib/python3.8/subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/fer/miniconda3/envs/jax-gpu/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['./bazel-3.1.0-linux-x86_64', 'run', '--verbose_failures=true', '--config=short_logs', '--config=mkl_open_source_only', '--config=cuda', '--define=xla_python_enable_gpu=true', ':build_wheel', '--', '--output_path=/home/fer/jax/dist']' returned non-zero exit status 1.

Collaborator

hawkinsp commented Feb 8, 2021

We can add CUDA 11.2 to our next wheel release, which will probably be in the next couple of weeks.

@fcossio That looks like your copy of gcc or clang crashed. Try a different (newer?) version.

Collaborator

skye commented Feb 13, 2021

The just-released jaxlib 0.1.61 now includes CUDA 11.2 wheels!
pip install --upgrade jax jaxlib==0.1.61+cuda112 -f https://storage.googleapis.com/jax-releases/jax_releases.html
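After installing, one way to confirm that the CUDA wheel is actually being used is to ask JAX which devices it sees. This is only a minimal sketch: the helper name `describe_jax_devices` is made up for illustration, and the exact platform string reported for a GPU may differ between JAX versions.

```python
import importlib


def describe_jax_devices():
    """Return the platform names JAX reports, or None if jax cannot be imported.

    Hypothetical helper for illustration; it is not part of JAX.
    """
    try:
        jax = importlib.import_module("jax")
    except ImportError:
        return None
    # jax.devices() lists the devices of the default backend. With a working
    # CUDA install this is expected to name a GPU platform rather than "cpu".
    return [device.platform for device in jax.devices()]
```

Calling `describe_jax_devices()` in the environment where the wheel was installed and seeing only `"cpu"` entries would suggest that the CPU-only jaxlib is still the one being picked up.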

skye closed this as completed Feb 13, 2021

puraminy commented Oct 8, 2021

It says

ERROR: No matching distribution found for jaxlib==0.1.61+cuda112

Collaborator

jakevdp commented Oct 8, 2021

ERROR: No matching distribution found for jaxlib==0.1.61+cuda112

I suspect you are using a non-supported Python version such as 3.10 (see issue #8097) or a non-supported GPU architecture (see #2012), or a non-supported operating system such as Windows (#438) or MacOS ARM (#5501).

You can see exactly which wheels are currently available at this link: https://storage.googleapis.com/jax-releases/jax_releases.html
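To make the "No matching distribution" case more concrete, here is a rough sketch of how a wheel filename encodes the jaxlib version, the Python tag, and the platform, and how a mismatch in any of them leaves pip with nothing to install. The filenames in `SAMPLE_WHEELS` are illustrative and were not copied from the release index, and the helpers `parse_wheel` and `matching_wheels` are hypothetical names. The pattern assumes the common five-field wheel name with no build tag.

```python
import re
import sys

# name-version-python_tag-abi_tag-platform_tag.whl (no optional build tag)
WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)-(?P<python>[^-]+)"
    r"-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)

# Illustrative filenames only; check the release index for the real list.
SAMPLE_WHEELS = [
    "jaxlib-0.1.61+cuda112-cp38-none-manylinux2010_x86_64.whl",
    "jaxlib-0.1.61+cuda112-cp39-none-manylinux2010_x86_64.whl",
    "jaxlib-0.1.61+cuda111-cp38-none-manylinux2010_x86_64.whl",
]


def parse_wheel(filename):
    """Split a wheel filename into its tags, or return None if it does not fit."""
    match = WHEEL_RE.match(filename)
    return match.groupdict() if match else None


def matching_wheels(filenames, version, python_tag):
    """Keep the wheels whose version and Python tag both match exactly."""
    selected = []
    for filename in filenames:
        info = parse_wheel(filename)
        if info and info["version"] == version and info["python"] == python_tag:
            selected.append(filename)
    return selected


def current_python_tag():
    """CPython tag of the running interpreter, e.g. 'cp38' for Python 3.8."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"
```

With the sample list, asking for `"0.1.61+cuda112"` with tag `"cp310"` returns an empty list, which mirrors the unsupported-Python-version case described above. pip's real resolution also checks the ABI and platform tags, which this sketch ignores.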

puraminy commented Oct 8, 2021

I manually downloaded and installed it. However, I think the following command also works:

pip install --upgrade pip
pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html  # Note: wheels only available on linux.

Collaborator

jakevdp commented Oct 8, 2021

Great! I'm not sure what might have gone wrong initially (the command from #5668 (comment) should work on supported systems) but I'm glad you found an approach that worked for you.

homerjed commented Aug 4, 2022

@fcossio did you attempt to update your gcc version? I have the same error!

fcossio commented Aug 4, 2022

This is a very old issue for me and I don't remember well. I think I solved this issue by expanding the swap memory and then restarting the computer after the build was successful. Sorry that I have no more details.
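For anyone hitting the same `Killed signal terminated program cc1plus` message: on Linux that message often means the kernel killed the compiler because the machine ran out of memory, which would fit the swap fix recalled above. That is an assumption, not something confirmed in this thread. A small sketch for checking how much RAM plus swap is free before starting the build follows; `parse_meminfo` and `build_headroom_gib` are hypothetical helper names, and the code assumes the usual `/proc/meminfo` layout with values in kB.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only "Key: <number> [unit]" lines and ignore anything else.
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def build_headroom_gib(meminfo):
    """Available RAM plus free swap, in GiB, from a parsed meminfo dict."""
    kib = meminfo.get("MemAvailable", 0) + meminfo.get("SwapFree", 0)
    return kib / (1024 * 1024)


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live meminfo file (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

On a Linux machine, `build_headroom_gib(read_meminfo())` gives a single number to compare before and after adding swap. How much headroom the XLA build needs is not stated in this thread, so no threshold is suggested here.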
