GPU build fails #10555

Closed
sbushmanov opened this issue Jul 6, 2024 · 3 comments

Comments

@sbushmanov

sbushmanov commented Jul 6, 2024

Ubuntu 22.04
Python 3.11 (Anaconda distribution)
NVIDIA-SMI 555.42.06 Driver Version: 555.42.06 CUDA Version: 12.5
cmake version 3.29.6

The build fails with the following configure command:

cmake .. -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DUSE_CUDA=ON

Build output:
[ 76%] Building CUDA object src/CMakeFiles/objxgboost.dir/context.cu.o
[ 76%] Building CUDA object src/CMakeFiles/objxgboost.dir/data/array_interface.cu.o
/usr/local/cuda-12.5/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/__utility/pair.h(577): error: no operator "==" matches these operands
            operand types are: const xgboost::common::WQSummary<float, float>::Entry == const xgboost::common::WQSummary<float, float>::Entry
    return __x.first == __y.first && __x.second == __y.second;
                                                ^
/usr/local/cuda/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/__iterator/wrap_iter.h(212): note #3327-D: candidate function template "cuda::std::__4::operator==(const cuda::std::__4::__wrap_iter<_Iter1> &, const cuda::std::__4::__wrap_iter<_Iter2> &) noexcept" failed deduction
  operator==(const __wrap_iter<_Iter1>& __x, const __wrap_iter<_Iter2>& __y) noexcept
  ^
/usr/local/cuda/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/__iterator/wrap_iter.h(205): note #3327-D: candidate function template "cuda::std::__4::operator==(const cuda::std::__4::__wrap_iter<_Iter1> &, const cuda::std::__4::__wrap_iter<_Iter1> &) noexcept" failed deduction
  operator==(const __wrap_iter<_Iter1>& __x, const __wrap_iter<_Iter1>& __y) noexcept
  ^
/usr/local/cuda/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/__iterator/move_iterator.h(423): note #3327-D: candidate function template "cuda::std::__4::operator==(const cuda::std::__4::move_iterator<_Iter1> &, const cuda::std::__4::move_iterator<_Iter2> &)" failed deduction
  operator==(const move_iterator<_Iter1>& __x, const move_iterator<_Iter2>& __y)
  ^
/usr/local/cuda/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/__iterator/istreambuf_iterator.h(124): note #3327-D: candidate function template "cuda::std::__4::operator==(const cuda::std::__4::istreambuf_iterator<_CharT, _Traits> &, const cuda::std::__4::istreambuf_iterator<_CharT, _Traits> &)" failed deduction
  bool operator==(const istreambuf_iterator<_CharT,_Traits>& __a,
       ^
/usr/local/cuda/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/__iterator/istream_iterator.h(105): note #3327-D: candidate function template "cuda::std::__4::operator==(const cuda::std::__4::istream_iterator<_Tp, _CharT, _Traits, _Distance> &, const cuda::std::__4::istream_iterator<_Tp, _CharT, _Traits, _Distance> &)" failed deduction
  operator==(const istream_iterator<_Tp, _CharT, _Traits, _Distance>& __x,
  ^
/usr/local/cuda-12.5/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/__iterator/reverse_iterator.h(307): note #3327-D: candidate function template "cuda::std::__4::operator==(const cuda::std::__4::reverse_iterator<_Iter1> &, const cuda::std::__4::reverse_iterator<_Iter2> &)" failed deduction
  operator==(const reverse_iterator<_Iter1>& __x, const reverse_iterator<_Iter2>& __y)
  ^
/usr/local/cuda-12.5/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/tuple(1146): note #3327-D: candidate function template "cuda::std::__4::operator==(const cuda::std::__4::tuple<_Tp...> &, const cuda::std::__4::tuple<_Up...> &)" failed deduction
  operator==(const tuple<_Tp...> &__x, const tuple<_Up...> &__y) {
  ^
/usr/local/cuda-12.5/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/__utility/pair.h(575): note #3327-D: candidate function template "cuda::std::__4::operator==(const cuda::std::__4::pair<_T1, _T2> &, const cuda::std::__4::pair<_T1, _T2> &)" failed deduction
  operator==(const pair<_T1, _T2>& __x, const pair<_T1, _T2>& __y)
  ^
/home/sergey/xgboost/include/xgboost/span.h(617): note #3327-D: candidate function template "xgboost::common::operator==" failed deduction
                bool operator==(Span<T, X> l, Span<U, Y> r) {
                     ^
/usr/local/cuda-12.5/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/__utility/pair.h(577): note #3328-D: built-in operator==(<promoted arithmetic>, <promoted arithmetic>) does not match because argument #1 does not match parameter
    return __x.first == __y.first && __x.second == __y.second;
                                                ^
/usr/local/cuda-12.5/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/__utility/pair.h(577): note #3328-D: built-in operator==(<nullptr>, <nullptr>) does not match because argument #1 does not match parameter
    return __x.first == __y.first && __x.second == __y.second;
                                                ^
          detected during:
            instantiation of "__nv_bool cuda::std::__4::operator==(const cuda::std::__4::pair<_T1, _T2> &, const cuda::std::__4::pair<_T1, _T2> &) [with _T1=size_t, _T2=xgboost::common::WQSummary<float, float>::Entry]" at line 369 of /usr/local/cuda/targets/x86_64-linux/include/cuda/std/detail/libcxx/include/__functional/operations.h
            instantiation of "auto cuda::std::__4::equal_to<void>::operator()(_T1 &&, _T2 &&) const->decltype((<expression>)) [with _T1=const cuda::std::__4::pair<size_t, xgboost::common::WQSummary<float, float>::Entry> &, _T2=const cuda::std::__4::pair<size_t, xgboost::common::WQSummary<float, float>::Entry> &]" at line 77 of /usr/local/cuda/targets/x86_64-linux/include/cub/thread/thread_operators.cuh
            instantiation of "__nv_bool cub::CUB_200400_860_NS::InequalityWrapper<EqualityOp>::operator()(T &&, U &&) [with EqualityOp=cub::CUB_200400_860_NS::Equality, T=const cuda::std::__4::pair<size_t, xgboost::common::WQSummary<float, float>::Entry> &, U=const cuda::std::__4::pair<size_t, xgboost::common::WQSummary<float, float>::Entry> &]" at line 176 of /usr/local/cuda/targets/x86_64-linux/include/cub/block/block_discontinuity.cuh
            instantiation of "__nv_bool cub::CUB_200400_860_NS::BlockDiscontinuity<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyOp<FlagOp, false>::FlagT(FlagOp, const T &, const T &, int) [with T=cuda::std::__4::pair<size_t, xgboost::common::WQSummary<float, float>::Entry>, BLOCK_DIM_X=64, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FlagOp=cub::CUB_200400_860_NS::InequalityWrapper<cub::CUB_200400_860_NS::Equality>]" at line 326 of /usr/local/cuda/targets/x86_64-linux/include/cub/block/block_discontinuity.cuh
            instantiation of "void cub::CUB_200400_860_NS::BlockDiscontinuity<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::FlagHeads(FlagT (&)[ITEMS_PER_THREAD], T (&)[ITEMS_PER_THREAD], T (&)[ITEMS_PER_THREAD], FlagOp) [with T=cuda::std::__4::pair<size_t, xgboost::common::WQSummary<float, float>::Entry>, BLOCK_DIM_X=64, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, ITEMS_PER_THREAD=1, FlagT=uint32_t, FlagOp=cub::CUB_200400_860_NS::InequalityWrapper<cub::CUB_200400_860_NS::Equality>]" at line 447 of /usr/local/cuda/targets/x86_64-linux/include/cub/block/block_discontinuity.cuh
            [ 15 instantiation contexts not shown ]
            instantiation of "thrust::THRUST_200400_860_NS::pair<KeyOutputIt, ValOutputIt> thrust::THRUST_200400_860_NS::cuda_cub::detail::unique_by_key(thrust::THRUST_200400_860_NS::cuda_cub::execution_policy<Derived> &, KeyInputIt, KeyInputIt, ValInputIt, KeyOutputIt, ValOutputIt, BinaryPred) [with Derived=thrust::THRUST_200400_860_NS::detail::execute_with_allocator<dh::detail::XGBCachingDeviceAllocatorImpl<char> &, thrust::THRUST_200400_860_NS::cuda_cub::execute_on_stream_base>, KeyInputIt=thrust::THRUST_200400_860_NS::transform_iterator<lambda [](size_t)->cuda::std::__4::pair<size_t, xgboost::common::WQSummary<float, float>::Entry>, thrust::THRUST_200400_860_NS::counting_iterator<unsigned long, thrust::THRUST_200400_860_NS::use_default, thrust::THRUST_200400_860_NS::use_default, thrust::THRUST_200400_860_NS::use_default>, cuda::std::__4::pair<size_t, xgboost::common::WQSummary<float, float>::Entry>, thrust::THRUST_200400_860_NS::use_default>, ValInputIt=xgboost::common::SketchEntry *, KeyOutputIt=thrust::THRUST_200400_860_NS::transform_output_iterator<dh::detail::SegmentedUniqueReduceOp<cuda::std::__4::pair<size_t, xgboost::common::WQSummary<float, float>::Entry>, xgboost::bst_idx_t *>, thrust::THRUST_200400_860_NS::discard_iterator<thrust::THRUST_200400_860_NS::use_default>>, ValOutputIt=xgboost::common::SketchEntry *, BinaryPred=lambda [](const Key &, const Key &)->__nv_bool]" at line 258 of /usr/local/cuda/targets/x86_64-linux/include/thrust/system/cuda/detail/unique_by_key.h
            instantiation of "thrust::THRUST_200400_860_NS::pair<KeyOutputIt, ValOutputIt> thrust::THRUST_200400_860_NS::cuda_cub::unique_by_key_copy(thrust::THRUST_200400_860_NS::cuda_cub::execution_policy<Derived> &, KeyInputIt, KeyInputIt, ValInputIt, KeyOutputIt, ValOutputIt, BinaryPred) [with Derived=thrust::THRUST_200400_860_NS::detail::execute_with_allocator<dh::detail::XGBCachingDeviceAllocatorImpl<char> &, thrust::THRUST_200400_860_NS::cuda_cub::execute_on_stream_base>, KeyInputIt=thrust::THRUST_200400_860_NS::transform_iterator<lambda [](size_t)->cuda::std::__4::pair<size_t, xgboost::common::WQSummary<float, float>::Entry>, thrust::THRUST_200400_860_NS::counting_iterator<unsigned long, thrust::THRUST_200400_860_NS::use_default, thrust::THRUST_200400_860_NS::use_default, thrust::THRUST_200400_860_NS::use_default>, cuda::std::__4::pair<size_t, xgboost::common::WQSummary<float, float>::Entry>, thrust::THRUST_200400_860_NS::use_default>, ValInputIt=xgboost::common::SketchEntry *, KeyOutputIt=thrust::THRUST_200400_860_NS::transform_output_iterator<dh::detail::SegmentedUniqueReduceOp<cuda::std::__4::pair<size_t, xgboost::common::WQSummary<float, float>::Entry>, xgboost::bst_idx_t *>, thrust::THRUST_200400_860_NS::discard_iterator<thrust::THRUST_200400_860_NS::use_default>>, ValOutputIt=xgboost::common::SketchEntry *, BinaryPred=lambda [](const Key &, const Key &)->__nv_bool]" at line 171 of /usr/local/cuda/targets/x86_64-linux/include/thrust/detail/unique.inl
            instantiation of "thrust::THRUST_200400_860_NS::pair<OutputIterator1, OutputIterator2> thrust::THRUST_200400_860_NS::unique_by_key_copy(const thrust::THRUST_200400_860_NS::detail::execution_policy_base<DerivedPolicy> &, InputIterator1, InputIterator1, InputIterator2, OutputIterator1, OutputIterator2, BinaryPredicate) [with DerivedPolicy=thrust::THRUST_200400_860_NS::detail::execute_with_allocator<dh::detail::XGBCachingDeviceAllocatorImpl<char> &, thrust::THRUST_200400_860_NS::cuda_cub::execute_on_stream_base>, InputIterator1=thrust::THRUST_200400_860_NS::transform_iterator<lambda [](size_t)->cuda::std::__4::pair<size_t, xgboost::common::WQSummary<float, float>::Entry>, thrust::THRUST_200400_860_NS::counting_iterator<unsigned long, thrust::THRUST_200400_860_NS::use_default, thrust::THRUST_200400_860_NS::use_default, thrust::THRUST_200400_860_NS::use_default>, cuda::std::__4::pair<size_t, xgboost::common::WQSummary<float, float>::Entry>, thrust::THRUST_200400_860_NS::use_default>, InputIterator2=xgboost::common::SketchEntry *, OutputIterator1=thrust::THRUST_200400_860_NS::transform_output_iterator<dh::detail::SegmentedUniqueReduceOp<cuda::std::__4::pair<size_t, xgboost::common::WQSummary<float, float>::Entry>, xgboost::bst_idx_t *>, thrust::THRUST_200400_860_NS::discard_iterator<thrust::THRUST_200400_860_NS::use_default>>, OutputIterator2=xgboost::common::SketchEntry *, BinaryPredicate=lambda [](const Key &, const Key &)->__nv_bool]" at line 897 of /home/sergey/xgboost/src/common/../collective/../data/../common/device_helpers.cuh
            instantiation of "size_t dh::SegmentedUnique(const thrust::THRUST_200400_860_NS::detail::execution_policy_base<DerivedPolicy> &, KeyInIt, KeyInIt, ValInIt, ValInIt, KeyOutIt, ValOutIt, CompValue, CompKey) [with DerivedPolicy=thrust::THRUST_200400_860_NS::detail::execute_with_allocator<dh::detail::XGBCachingDeviceAllocatorImpl<char> &, thrust::THRUST_200400_860_NS::cuda_cub::execute_on_stream_base>, KeyInIt=xgboost::bst_idx_t *, KeyOutIt=xgboost::bst_idx_t *, ValInIt=xgboost::common::SketchEntry *, ValOutIt=xgboost::common::SketchEntry *, CompValue=xgboost::common::detail::SketchUnique, CompKey=thrust::THRUST_200400_860_NS::equal_to<size_t>]" at line 912 of /home/sergey/xgboost/src/common/../collective/../data/../common/device_helpers.cuh
            instantiation of "size_t dh::SegmentedUnique(Inputs &&...) [with Inputs=<xgboost::bst_idx_t *, xgboost::bst_idx_t *, xgboost::common::SketchEntry *, xgboost::common::SketchEntry *, xgboost::bst_idx_t *, xgboost::common::SketchEntry *, xgboost::common::detail::SketchUnique>, <unnamed>=(void *)nullptr]" at line 396 of /home/sergey/xgboost/src/common/quantile.cu

1 error detected in the compilation of "/home/sergey/xgboost/src/common/quantile.cu".
make[2]: *** [src/CMakeFiles/objxgboost.dir/build.make:1470: src/CMakeFiles/objxgboost.dir/common/quantile.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:260: src/CMakeFiles/objxgboost.dir/all] Error 2
make: *** [Makefile:156: all] Error 2

Any suggestions are appreciated!
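Reading the error from the inside out: the instantiation chain shows XGBoost calling `thrust::unique_by_key_copy` with its own predicate (`BinaryPred=lambda [](const Key &, const Key &)`), but the chain still reaches CUB's `BlockDiscontinuity`, which applies `InequalityWrapper<Equality>` and compares keys with `==`. The key type is `cuda::std::pair<size_t, WQSummary<float, float>::Entry>`, and pair's `operator==` needs `Entry == Entry`, which is not defined. The sketch below reproduces the same shape in standard C++ with `std::pair`. `Entry` and `UniqueByValue` are hypothetical stand-ins, not XGBoost's actual types; the snippet only illustrates why a predicate-based path compiles while a plain `==` on the pair does not.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a summary entry type that, like the Entry in the
// error message, defines no operator==.
struct Entry {
  float rmin;
  float rmax;
  float wmin;
  float value;
};

using Key = std::pair<std::size_t, Entry>;

// Uncommenting this would fail to compile: std::pair's operator== compares
// .second with ==, and Entry has no such operator. This mirrors the
// cuda::std::pair failure in the log.
// bool Same(const Key& a, const Key& b) { return a == b; }

// Deduplicating adjacent keys with an explicit predicate compiles, because
// the algorithm never needs Entry::operator==. std::unique keeps the first
// element of each run that the predicate treats as equal.
std::size_t UniqueByValue(std::vector<Key>& keys) {
  auto last = std::unique(keys.begin(), keys.end(),
                          [](const Key& a, const Key& b) {
                            return a.first == b.first &&
                                   a.second.value == b.second.value;
                          });
  keys.erase(last, keys.end());
  return keys.size();
}
```

In the real build the predicate is supplied too, so the failure suggests that some code path inside the bundled CUB/Thrust still instantiates the default equality on the key type.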

@trivialfis
Member

We haven't tested the build with CUDA toolkit 12.5 yet. I use 12.4 for daily development, and the CI doesn't use the latest release either. I will try 12.5 next week. Thank you for opening the issue.
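The toolkit version matters here because CUB and Thrust ship inside the CUDA toolkit, and the bundled version is visible in the log's versioned namespaces (`CUB_200400_860_NS`, `THRUST_200400_860_NS`). Below is a small sketch for decoding that number, assuming the encoding described in `cub/version.cuh` (major * 100000 + minor * 100 + patch). The trailing `860` appears to reflect the target architecture list (sm_86) and is ignored here. `DecodeCubVersion` is a hypothetical helper, not part of CUB.

```cpp
#include <string>

// Hypothetical helper: splits a CUB/Thrust version number such as 200400
// into "major.minor.patch", assuming
//   version = major * 100000 + minor * 100 + patch
std::string DecodeCubVersion(int version) {
  int major = version / 100000;
  int minor = version / 100 % 1000;
  int patch = version % 100;
  return std::to_string(major) + "." + std::to_string(minor) + "." +
         std::to_string(patch);
}
```

With the value from the log, `DecodeCubVersion(200400)` yields `2.4.0`.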

@trivialfis
Member

Opened an issue in CCCL: NVIDIA/cccl#1956.

@trivialfis
Member

Closing, as this is fixed in CCCL.
