Thrust: providing the error messages about the lack of GPU or a GPU with an incompatible architecture #1848

zkhatami · 2023-01-13T21:32:18Z

This change gives users some clue about what is happening when the program runs on a node with no GPU, or when it was compiled for a different compute capability than the one it runs on. Previously, neither scenario produced a useful error message. The proposed changes improve the user experience and make these problems easier to troubleshoot.

This fix addresses issue #1785 reported on Thrust (NVIDIA/cccl#818).

From issue #1785 on Thrust (NVIDIA/cccl#818), consider this small test case:

```cpp
#include <thrust/device_vector.h>
#include <thrust/sort.h>

int main() {
  thrust::device_vector<int> dv;
  thrust::sort(dv.begin(), dv.end());
}
```

When it is compiled with -gpu=cc60 and then run on a system with a cc80 GPU, the error message is:

```
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  radix_sort: failed on 1st step: cudaErrorUnsupportedPtxVersion: the provided PTX was compiled with an unsupported toolchain.
Aborted
```

This doesn't help the user understand what's happening. This change addresses that, so that a clearer message is shown instead:

```
Incompatible GPU: you are trying to run this program on sm_80, different from the one that it was compiled for.
```
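The check described above can be sketched in plain C++. This is a minimal illustration under stated assumptions, not the code merged in this PR: the function check_gpu_compatibility, the device_info struct, and the "no GPU" message text are hypothetical, and the device count and compute capability are passed in as parameters instead of being obtained from CUDA runtime queries, so the logic can be followed without a GPU. The sketch uses strict equality between the device architecture and the compiled one, mirroring the wording of the new message; real CUDA compatibility rules are more permissive (for example, PTX can be JIT-compiled for newer GPUs).

```cpp
#include <stdexcept>
#include <string>

// Hypothetical inputs: real code would obtain these from CUDA runtime
// queries; here they are plain values so the logic runs without a GPU.
struct device_info {
  int device_count;  // number of visible GPUs
  int cc_major;      // compute capability of the current device, e.g. 8
  int cc_minor;      // e.g. 0
};

// Throws std::runtime_error with a readable message up front, instead of
// letting a later kernel launch fail with cudaErrorUnsupportedPtxVersion.
// `compiled_arch` is the (assumed) architecture the binary targets, e.g. 60.
void check_gpu_compatibility(const device_info& info, int compiled_arch) {
  if (info.device_count == 0) {
    // Hypothetical wording for the "no GPU" case.
    throw std::runtime_error("No GPU was found on this system.");
  }
  const int device_arch = info.cc_major * 10 + info.cc_minor;
  // Simplification: any mismatch is treated as incompatible.
  if (device_arch != compiled_arch) {
    throw std::runtime_error(
        "Incompatible GPU: you are trying to run this program on sm_" +
        std::to_string(device_arch) +
        ", different from the one that it was compiled for.");
  }
}
```

For the scenario in the issue, check_gpu_compatibility({1, 8, 0}, 60) throws with a message naming sm_80 before any kernel is launched.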


GPUtester · 2023-01-13T21:32:20Z

Can one of the admins verify this patch?

Admins can comment "ok to test" to allow this one PR to run, or "add to allowlist" to allow all future PRs from the same author to run.

dkolsen-pgi · 2023-01-17T05:53:18Z

This is essentially what I was thinking for how to solve this problem. I don't know if the details are correct or if the code is as efficient as it should be, but having Thrust detect a bad GPU before invoking any kernels and throwing an exception with a good message is the correct approach.


gevtushenko

@zkhatami, @dkolsen-pgi I've made a few adjustments to make it work on both the CPU and GPU sides. I've also changed the signature, since a function that returns an optional but always either throws or returns a value has confusing semantics. Please take a look and let me know if you agree with the changes.
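To illustrate the point about the signature, here is a hypothetical pair of functions; the names and bodies are invented for this sketch and are not Thrust's actual code. The first returns an optional that can never be empty, because every failure path throws; the second states the real contract, "returns a value or throws", directly in its return type.

```cpp
#include <optional>
#include <stdexcept>

// Hypothetical "before" shape: the return type suggests the value may be
// absent, yet every path either returns a value or throws, so callers must
// unwrap an optional that is never empty.
std::optional<int> query_arch_optional(int raw_arch) {
  if (raw_arch <= 0) {
    throw std::runtime_error("unable to determine the architecture");
  }
  return raw_arch;
}

// Hypothetical "after" shape: the signature matches the behavior, and
// callers use the value directly.
int query_arch(int raw_arch) {
  if (raw_arch <= 0) {
    throw std::runtime_error("unable to determine the architecture");
  }
  return raw_arch;
}
```

With the second form the contract lives in the signature, and there is no never-empty optional for callers to dereference.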

gevtushenko · 2023-01-26T11:21:54Z

run tests

miscco

Looks good to me with a minor request for simplification

thrust/system/cuda/detail/core/util.h

gevtushenko · 2023-01-26T12:54:41Z

run tests

thrust/system/cuda/detail/core/util.h

gevtushenko · 2023-01-27T10:54:12Z

run tests

zkhatami · 2023-01-27T22:50:26Z

Looks good to me as well. Thanks!

gevtushenko · 2023-01-29T04:03:51Z

run tests

Thrust: providing the error messages about the lack of GPU or a GPU with an incompatible architecture

ae0c1e7

renaming device to dev_id

f2a1d5f

zkhatami mentioned this pull request Jan 13, 2023

Better error message for no GPU or incompatible GPU NVIDIA/cub#577

Closed

dkolsen-pgi requested review from gevtushenko, alliepiper and jrhemstad January 17, 2023 05:47

Adjust error message about lack of GPU

78b17fe

Co-authored-by: Michael Schellenberger Costa <[email protected]>

gevtushenko approved these changes Jan 26, 2023

miscco approved these changes Jan 26, 2023

thrust/system/cuda/detail/core/util.h (outdated, resolved)

thrust/system/cuda/detail/core/util.h (resolved)

thrust/system/cuda/detail/core/util.h (outdated, resolved)

Remove optional usage around ptx version

22ed101

dkolsen-pgi reviewed Jan 27, 2023

thrust/system/cuda/detail/core/util.h (outdated, resolved)

Reduce number of API calls in PTX check

65dff88
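The commit title does not say how the number of API calls was reduced. One generic way to avoid repeating a runtime query is to memoize the result per device; the sketch below shows that technique only and is not necessarily what 65dff88 does. query_device_arch is a hypothetical stand-in for the real CUDA call and returns fake data, and the counter exists only to make the effect of caching visible. The static cache is not thread-safe; concurrent callers would need a mutex or similar.

```cpp
#include <map>

// Counts how many times the (hypothetical) expensive query actually runs.
int g_query_count = 0;

// Hypothetical stand-in for a CUDA runtime query; returns fake data.
int query_device_arch(int dev_id) {
  ++g_query_count;
  return dev_id == 0 ? 80 : 70;
}

// Memoizes the per-device result so repeated checks (e.g. one per algorithm
// invocation) issue at most one query per device. Not thread-safe.
int cached_device_arch(int dev_id) {
  static std::map<int, int> cache;
  auto it = cache.find(dev_id);
  if (it != cache.end()) {
    return it->second;
  }
  const int arch = query_device_arch(dev_id);
  cache.emplace(dev_id, arch);
  return arch;
}
```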

Silence MSVC int/char conversion warning

e636580
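The commit does not show which line triggered the warning. As an assumed illustration only: building a name such as "sm_80" from digits with name += '0' + major passes an int where std::string expects a char, which MSVC typically flags as a possible loss of data (C4244). The hypothetical helper arch_name below makes the narrowing explicit with static_cast.

```cpp
#include <string>

// Hypothetical helper: builds an architecture name such as "sm_80".
// '0' + major has type int; the explicit static_cast<char> states that the
// narrowing is intended, which keeps MSVC's conversion warning quiet.
// Assumes 0 <= major <= 9 and 0 <= minor <= 9.
std::string arch_name(int major, int minor) {
  std::string name = "sm_";
  name += static_cast<char>('0' + major);
  name += static_cast<char>('0' + minor);
  return name;
}
```

An alternative is "sm_" + std::to_string(major * 10 + minor), which avoids character arithmetic altogether at the cost of a temporary string.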

gevtushenko merged commit bf941ec into NVIDIA:main Jan 30, 2023

alliepiper added this to the 2.1.0 milestone Mar 8, 2023
