CUDA vector_add sample project (#2160)
Status: Merged. 48 commits; changes shown from 11 commits.
Commits:

- 6634ffb add the CUDA vector addition sample (ericniebler)
- 2d9f364 Remove unnecessary sample helpers (pciolkosz)
- e872ca2 Merge remote-tracking branch 'origin/main' into cudax-samples (ericniebler)
- 93a82ce use a specific cuda architecture instead of `native` (ericniebler)
- 59ea51d use `cuda::launch` instead of launching the kernel directly (ericniebler)
- e62220a use thrust's host_ and device_vector types in the cudax sample for now (ericniebler)
- adb634d use a temporary `launch_ex` fn that applies an arg transform (ericniebler)
- 29732bf minor cleanup (ericniebler)
- 6f13b40 Merge remote-tracking branch 'origin/main' into cudax-samples (ericniebler)
- dbd7a68 use `__launch_transform` in the `vector_add` sample (ericniebler)
- 9ed5532 mock up a cudax::vector and the in/out annotations (ericniebler)
- 52e6c7e a working example with vector, in/out, and launch (ericniebler)
- 82db01d insert a sync stream at the right place (ericniebler)
- f587bc9 add missing include directory (ericniebler)
- 4732a80 i do not like cmake (ericniebler)
- c73f856 add missing header (ericniebler)
- ef0b399 add explicit device selection (ericniebler)
- 4d1ad50 try to fix msvc build break (ericniebler)
- fad5e66 try again (ericniebler)
- e817699 Merge remote-tracking branch 'origin/main' into cudax-samples (ericniebler)
- 2500b30 cmake is evil (ericniebler)
- 56247ca once more with feeling (ericniebler)
- 388d57c again (ericniebler)
- 1d74986 again (ericniebler)
- 791d13c ah, enable language CXX (ericniebler)
- 1ff49c6 again (ericniebler)
- fc27771 try c++ 20 (ericniebler)
- 2fc597d better? (ericniebler)
- 285333e maybe this? (ericniebler)
- 8a200a2 will it ever end? (ericniebler)
- 600dde1 wassup? (ericniebler)
- 0f2494e work around msvc non-conformance (ericniebler)
- 87e67f3 very close now i think (ericniebler)
- b43b90b use msvc with conforming preprocessor (ericniebler)
- ab29482 cmake string strangeness (ericniebler)
- 9850ef5 here i go again (ericniebler)
- da3120d try c++20 (ericniebler)
- 505545b only require c++20 when using msvc (ericniebler)
- ca9d544 Replace the mdspan concept emulation with libcu++ one (miscco)
- 6ffa2ae Fix formatting (miscco)
- ac8e6d8 Fix issues with concept emulation (miscco)
- 9a13c77 Try and work around issue with nvcc deduction failure (miscco)
- e2e7354 Drop the whole macro (miscco)
- 4be1ee9 drop more concept emulation (miscco)
- bdbd29e Fix one more issue with `is_always_strided` (miscco)
- 457e0d9 Merge branch 'main' into pr/ericniebler/2160 (miscco)
- 6656965 Merge remote-tracking branch 'origin/main' into cudax-samples (ericniebler)
- f9580c8 Merge branch 'cudax-samples' of github.com:ericniebler/cccl into cuda… (ericniebler)
File 1 of 3: the sample's top-level CMake script (new file, +53 lines)

```cmake
# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

cmake_minimum_required(VERSION 3.14 FATAL_ERROR)

project(CUDAX_SAMPLES CUDA)

# This example uses the CMake Package Manager (CPM) to simplify fetching CCCL from GitHub.
# For more information, see https://github.com/cpm-cmake/CPM.cmake
include(cmake/CPM.cmake)

# We define these as variables so they can be overridden in CI to pull from a PR instead of CCCL `main`.
# In your project, these variables are unnecessary and you can just use the values directly.
set(CCCL_REPOSITORY "nvidia/cccl" CACHE STRING "GitHub repository to fetch CCCL from")
set(CCCL_TAG "main" CACHE STRING "Git tag/branch to fetch from CCCL repository")

# This will automatically clone CCCL from GitHub and make the exported cmake targets available.
CPMAddPackage(
  NAME CCCL
  GITHUB_REPOSITORY ${CCCL_REPOSITORY}
  GIT_TAG ${CCCL_TAG}
)

# Default to building for the GPU on the current system.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  set(CMAKE_CUDA_ARCHITECTURES 86)
endif()

# Creates a cmake executable target for the main program.
add_executable(vector_add vector_add/vector_add.cu)
set_property(TARGET vector_add PROPERTY CXX_STANDARD 17)
target_include_directories(vector_add PRIVATE ${CMAKE_SOURCE_DIR}/../include)

# "Links" the CCCL CMake target to the `vector_add` executable. This configures everything needed to use
# CCCL headers, including setting up include paths, compiler flags, etc.
target_link_libraries(vector_add PRIVATE CCCL::CCCL)

# This is only relevant for internal testing and not needed by end users.
include(CTest)
enable_testing()
add_test(NAME vector_add COMMAND vector_add)
```
File 2 of 3: the CPM bootstrap script, `cmake/CPM.cmake` (new file, +33 lines)

```cmake
set(CPM_DOWNLOAD_VERSION 0.38.1)

if(CPM_SOURCE_CACHE)
  set(CPM_DOWNLOAD_LOCATION "${CPM_SOURCE_CACHE}/cpm/CPM_${CPM_DOWNLOAD_VERSION}.cmake")
elseif(DEFINED ENV{CPM_SOURCE_CACHE})
  set(CPM_DOWNLOAD_LOCATION "$ENV{CPM_SOURCE_CACHE}/cpm/CPM_${CPM_DOWNLOAD_VERSION}.cmake")
else()
  set(CPM_DOWNLOAD_LOCATION "${CMAKE_BINARY_DIR}/cmake/CPM_${CPM_DOWNLOAD_VERSION}.cmake")
endif()

# Expand relative path. This is important if the provided path contains a tilde (~).
get_filename_component(CPM_DOWNLOAD_LOCATION ${CPM_DOWNLOAD_LOCATION} ABSOLUTE)

function(download_cpm)
  message(STATUS "Downloading CPM.cmake to ${CPM_DOWNLOAD_LOCATION}")
  file(DOWNLOAD
       https://github.com/cpm-cmake/CPM.cmake/releases/download/v${CPM_DOWNLOAD_VERSION}/CPM.cmake
       ${CPM_DOWNLOAD_LOCATION}
  )
endfunction()

if(NOT (EXISTS ${CPM_DOWNLOAD_LOCATION}))
  download_cpm()
else()
  # Resume the download if it previously failed and left an empty file behind.
  file(READ ${CPM_DOWNLOAD_LOCATION} check)
  if("${check}" STREQUAL "")
    download_cpm()
  endif()
  unset(check)
endif()

include(${CPM_DOWNLOAD_LOCATION})
```
File 3 of 3: the mock `cuda::experimental::vector` container header (new file, +185 lines)

```cpp
//===----------------------------------------------------------------------===//
//
// Part of CUDA Experimental in CUDA C++ Core Libraries,
// under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES.
//
//===----------------------------------------------------------------------===//

#ifndef _CUDAX__CONTAINER_VECTOR
#define _CUDAX__CONTAINER_VECTOR

#include <cuda/__cccl_config>

#if defined(_CCCL_IMPLICIT_SYSTEM_HEADER_GCC)
#  pragma GCC system_header
#elif defined(_CCCL_IMPLICIT_SYSTEM_HEADER_CLANG)
#  pragma clang system_header
#elif defined(_CCCL_IMPLICIT_SYSTEM_HEADER_MSVC)
#  pragma system_header
#endif // no system header

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

#include <cuda/std/__type_traits/maybe_const.h>
#include <cuda/std/span>
#include <cuda/stream_ref>

#include <cuda/experimental/__detail/utility.cuh>

#if 1 //_CCCL_STD_VER >= 2017
namespace cuda::experimental
{
using ::cuda::std::span;
using ::thrust::device_vector;
using ::thrust::host_vector;

namespace detail
{
template <typename _Ty>
struct __in_box
{
  const _Ty& __val;
};

template <typename _Ty>
struct __out_box
{
  _Ty& __val;
};
} // namespace detail

template <typename _Ty>
class vector
{
public:
  vector() = default;
  explicit vector(size_t n)
      : __h_(n)
  {}

  _Ty& operator[](size_t i) noexcept
  {
    __dirty_ = true;
    return __h_[i];
  }

  const _Ty& operator[](size_t i) const noexcept
  {
    return __h_[i];
  }

private:
  enum class __param : unsigned
  {
    _in    = 1,
    _out   = 2,
    _inout = 3
  };

  _CCCL_NODISCARD_FRIEND _CCCL_HOST_DEVICE constexpr __param operator&(__param __a, __param __b) noexcept
  {
    return __param(unsigned(__a) & unsigned(__b));
  }

  void sync_host_to_device() const
  {
    if (__dirty_)
    {
      printf("sync_host_to_device\n");
      __d_     = __h_;
      __dirty_ = false;
    }
  }

  void sync_device_to_host()
  {
    printf("sync_device_to_host\n");
    __h_ = __d_;
  }

  template <__param _Param>
  struct __action : detail::__immovable
  {
    static constexpr bool __mut = ((_Param & __param::_out) == __param::_out);
    using __cv_vector           = ::cuda::std::__maybe_const<!__mut, vector>;

    explicit __action(stream_ref __str, __cv_vector& __v) noexcept
        : __str_(__str)
        , __v_(__v)
    {
      printf("action()\n");
      if constexpr ((_Param & __param::_in) == __param::_in)
      {
        __v_.sync_host_to_device();
      }
    }

    ~__action()
    {
      printf("~action()\n");
      if constexpr ((_Param & __param::_out) == __param::_out)
      {
        printf("about to synchronize the stream\n");
        fflush(stdout);
        __str_.wait(); // wait for the kernel to finish
        printf("done synchronizing the stream\n");
        fflush(stdout);
        __v_.sync_device_to_host();
      }
    }

    using __as_kernel_arg = ::cuda::std::span<_Ty>;

    operator ::cuda::std::span<_Ty>()
    {
      printf("to span\n");
      return {__v_.__d_.data().get(), __v_.__d_.size()};
    }

  public:
    stream_ref __str_;
    __cv_vector& __v_;
  };

  _CCCL_NODISCARD_FRIEND __action<__param::_inout>
  __cudax_launch_transform(stream_ref __str, const vector& __v) noexcept
  {
    return __action<__param::_inout>{__str, __v};
  }

  _CCCL_NODISCARD_FRIEND __action<__param::_in>
  __cudax_launch_transform(stream_ref __str, detail::__in_box<vector> __b) noexcept
  {
    return __action<__param::_in>{__str, __b.__val};
  }

  _CCCL_NODISCARD_FRIEND __action<__param::_out>
  __cudax_launch_transform(stream_ref __str, detail::__out_box<vector> __b) noexcept
  {
    return __action<__param::_out>{__str, __b.__val};
  }

  host_vector<_Ty> __h_;
  mutable device_vector<_Ty> __d_{};
  mutable bool __dirty_ = true;
};

template <class _Ty>
detail::__in_box<_Ty> in(const _Ty& __v) noexcept
{
  return {__v};
}

template <class _Ty>
detail::__out_box<_Ty> out(_Ty& __v) noexcept
{
  return {__v};
}

} // namespace cuda::experimental

#endif
#endif
```
Review comment: The problem isn't the architecture value; it's that the way this test is set up, it requires a GPU runner but is ending up on a CPU runner.
@alliepiper can help you get it sorted.