CUDA vector_add sample project #2160
Merged
48 commits
6634ffb add the CUDA vector addition sample (ericniebler)
2d9f364 Remove unnecessary sample helpers (pciolkosz)
e872ca2 Merge remote-tracking branch 'origin/main' into cudax-samples (ericniebler)
93a82ce use a specific cuda architecture instead of `native` (ericniebler)
59ea51d use `cuda::launch` instead of launching the kernel directly (ericniebler)
e62220a use thrust's host_ and device_vector types in the cudax sample for now (ericniebler)
adb634d use a temporary `launch_ex` fn that applies an arg transform (ericniebler)
29732bf minor cleanup (ericniebler)
6f13b40 Merge remote-tracking branch 'origin/main' into cudax-samples (ericniebler)
dbd7a68 use `__launch_transform` in the `vector_add` sample (ericniebler)
9ed5532 mock up a cudax::vector and the in/out annotations (ericniebler)
52e6c7e a working example with vector, in/out, and launch (ericniebler)
82db01d insert a sync stream at the right place (ericniebler)
f587bc9 add missing include directory (ericniebler)
4732a80 i do not like cmake (ericniebler)
c73f856 add missing header (ericniebler)
ef0b399 add explicit device selection (ericniebler)
4d1ad50 try to fix msvc build break (ericniebler)
fad5e66 try again (ericniebler)
e817699 Merge remote-tracking branch 'origin/main' into cudax-samples (ericniebler)
2500b30 cmake is evil (ericniebler)
56247ca once more with feeling (ericniebler)
388d57c again (ericniebler)
1d74986 again (ericniebler)
791d13c ah, enable language CXX (ericniebler)
1ff49c6 again (ericniebler)
fc27771 try c++ 20 (ericniebler)
2fc597d better? (ericniebler)
285333e maybe this? (ericniebler)
8a200a2 will it ever end? (ericniebler)
600dde1 wassup? (ericniebler)
0f2494e work around msvc non-conformance (ericniebler)
87e67f3 very close now i think (ericniebler)
b43b90b use msvc with conforming preprocessor (ericniebler)
ab29482 cmake string strangeness (ericniebler)
9850ef5 here i go again (ericniebler)
da3120d try c++20 (ericniebler)
505545b only require c++20 when using msvc (ericniebler)
ca9d544 Replace the mdspan concept emulation with libcu++ one (miscco)
6ffa2ae Fix formatting (miscco)
ac8e6d8 Fix issues with concept emulation (miscco)
9a13c77 Try and work around issue with nvcc deduction failure (miscco)
e2e7354 Drop the whole macro (miscco)
4be1ee9 drop more concept emulation (miscco)
bdbd29e Fix one more issue with `is_always_strided` (miscco)
457e0d9 Merge branch 'main' into pr/ericniebler/2160 (miscco)
6656965 Merge remote-tracking branch 'origin/main' into cudax-samples (ericniebler)
f9580c8 Merge branch 'cudax-samples' of github.com:ericniebler/cccl into cuda… (ericniebler)
@@ -0,0 +1,85 @@
//===----------------------------------------------------------------------===//
//
// Part of CUDA Experimental in CUDA C++ Core Libraries,
// under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//

#ifndef _CUDAX__LAUNCH_PARAM_KIND
#define _CUDAX__LAUNCH_PARAM_KIND

#include <cuda/__cccl_config>

#if defined(_CCCL_IMPLICIT_SYSTEM_HEADER_GCC)
# pragma GCC system_header
#elif defined(_CCCL_IMPLICIT_SYSTEM_HEADER_CLANG)
# pragma clang system_header
#elif defined(_CCCL_IMPLICIT_SYSTEM_HEADER_MSVC)
# pragma system_header
#endif // no system header

#include <cuda/std/__type_traits/maybe_const.h>

#include <cuda/experimental/__detail/utility.cuh>

namespace cuda::experimental
{
namespace detail
{
enum class __param_kind : unsigned
{
  _in = 1,
  _out = 2,
  _inout = 3
};

_CCCL_NODISCARD _CCCL_HOST_DEVICE inline constexpr __param_kind operator&(__param_kind __a, __param_kind __b) noexcept
{
  return __param_kind(unsigned(__a) & unsigned(__b));
}

template <typename _Ty, __param_kind _Kind>
struct _CCCL_NODISCARD __box
{
  ::cuda::std::__maybe_const<_Kind == __param_kind::_in, _Ty>& __val;
};

struct __in_t
{
  template <class _Ty>
  __box<_Ty, __param_kind::_in> operator()(const _Ty& __v) const noexcept
  {
    return {__v};
  }
};

struct __out_t
{
  template <class _Ty>
  __box<_Ty, __param_kind::_out> operator()(_Ty& __v) const noexcept
  {
    return {__v};
  }
};

struct __inout_t
{
  template <class _Ty>
  __box<_Ty, __param_kind::_inout> operator()(_Ty& __v) const noexcept
  {
    return {__v};
  }
};

} // namespace detail

_CCCL_GLOBAL_CONSTANT detail::__in_t in{};
_CCCL_GLOBAL_CONSTANT detail::__out_t out{};
_CCCL_GLOBAL_CONSTANT detail::__inout_t inout{};

} // namespace cuda::experimental

#endif // _CUDAX__LAUNCH_PARAM_KIND
@@ -0,0 +1,76 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

cmake_minimum_required(VERSION 3.14 FATAL_ERROR)

project(CUDAX_SAMPLES CUDA CXX)

# This example uses the CMake Package Manager (CPM) to simplify fetching CCCL from GitHub
# For more information, see https://github.com/cpm-cmake/CPM.cmake
include(cmake/CPM.cmake)

# We define these as variables so they can be overridden in CI to pull from a PR instead of CCCL `main`
# In your project, these variables are unnecessary and you can just use the values directly
set(CCCL_REPOSITORY "nvidia/cccl" CACHE STRING "GitHub repository to fetch CCCL from")
set(CCCL_TAG "main" CACHE STRING "Git tag/branch to fetch from CCCL repository")

# This will automatically clone CCCL from GitHub and make the exported cmake targets available
CPMAddPackage(
  NAME CCCL
  GITHUB_REPOSITORY ${CCCL_REPOSITORY}
  GIT_TAG ${CCCL_TAG}
  GIT_SHALLOW ON
  OPTIONS "CCCL_ENABLE_UNSTABLE ON"
)

# Default to a fixed GPU architecture (sm_86) unless the caller specifies one
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  set(CMAKE_CUDA_ARCHITECTURES 86)
endif()

# Creates a cmake executable target for the main program
add_executable(vector_add vector_add/vector_add.cu)

# "Links" the CCCL::cudax CMake target to the `vector_add` executable. This
# configures everything needed to use CCCL's headers, including setting up
# include paths, compiler flags, etc.
target_link_libraries(vector_add
  PUBLIC
    CCCL::cudax
    CCCL::CCCL
    CCCL::Thrust
    CCCL::libcudacxx
  INTERFACE cudax.compiler_interface
)

# TODO: These are temporary until the main branch catches up with the latest changes
target_compile_definitions(vector_add PUBLIC LIBCUDACXX_ENABLE_EXCEPTIONS)

if ("MSVC" STREQUAL "${CMAKE_CXX_COMPILER_ID}")
  # mdspan on windows only works in C++20 mode
  target_compile_features(vector_add PUBLIC cxx_std_20)

  # cudax requires dim3 to be usable from a constexpr context, and the CUDART headers require
  # __cplusplus to be defined for this to work:
  target_compile_options(vector_add PRIVATE
    $<$<COMPILE_LANGUAGE:CXX>:/Zc:__cplusplus /Zc:preprocessor>
    $<$<COMPILE_LANG_AND_ID:CUDA,NVIDIA>:-Xcompiler=/Zc:__cplusplus -Xcompiler=/Zc:preprocessor>
  )
endif()

# This is only relevant for internal testing and not needed by end users.
include(CTest)
enable_testing()
add_test(NAME vector_add COMMAND vector_add)
@@ -0,0 +1,33 @@
set(CPM_DOWNLOAD_VERSION 0.38.1)

if(CPM_SOURCE_CACHE)
  set(CPM_DOWNLOAD_LOCATION "${CPM_SOURCE_CACHE}/cpm/CPM_${CPM_DOWNLOAD_VERSION}.cmake")
elseif(DEFINED ENV{CPM_SOURCE_CACHE})
  set(CPM_DOWNLOAD_LOCATION "$ENV{CPM_SOURCE_CACHE}/cpm/CPM_${CPM_DOWNLOAD_VERSION}.cmake")
else()
  set(CPM_DOWNLOAD_LOCATION "${CMAKE_BINARY_DIR}/cmake/CPM_${CPM_DOWNLOAD_VERSION}.cmake")
endif()

# Expand relative path. This is important if the provided path contains a tilde (~)
get_filename_component(CPM_DOWNLOAD_LOCATION ${CPM_DOWNLOAD_LOCATION} ABSOLUTE)

function(download_cpm)
  message(STATUS "Downloading CPM.cmake to ${CPM_DOWNLOAD_LOCATION}")
  file(DOWNLOAD
    https://github.com/cpm-cmake/CPM.cmake/releases/download/v${CPM_DOWNLOAD_VERSION}/CPM.cmake
    ${CPM_DOWNLOAD_LOCATION}
  )
endfunction()

if(NOT (EXISTS ${CPM_DOWNLOAD_LOCATION}))
  download_cpm()
else()
  # resume download if it previously failed
  file(READ ${CPM_DOWNLOAD_LOCATION} check)
  if("${check}" STREQUAL "")
    download_cpm()
  endif()
  unset(check)
endif()

include(${CPM_DOWNLOAD_LOCATION})
The problem isn't the architecture value. The way this test is set up, it requires a GPU runner, but it is ending up on a CPU runner.
@alliepiper can help you get it sorted.