Intentionally leak thread_local CUDA resources to avoid crash (part 1) #16787

kingcrimsontianyu · 2024-09-10T20:50:02Z

Description

The NVbench application PARQUET_READER_NVBENCH in libcudf currently crashes with a segmentation fault. To reproduce:

./PARQUET_READER_NVBENCH -d 0 -b 1 --run-once -a io_type=FILEPATH -a compression_type=SNAPPY -a cardinality=0 -a run_length=1

The root cause is that (1) some thread_local objects on the main thread in libcudf and (2) some static objects in kvikio are destroyed during program termination, after NVbench has already called cudaDeviceReset(). Their destructors make CUDA calls, and calling CUDA APIs during program termination is undefined behavior in CUDA, so these objects should simply be leaked.
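Here is a minimal, CUDA-free sketch of the "intentionally leak" idiom. It assumes a hypothetical `device_resource` type that stands in for any object whose destructor would call the CUDA runtime (for example `cudaEventDestroy`). In this sketch the destructor only counts how often it runs. In the leaking variant only the pointer is `thread_local`, so nothing is destroyed when the thread exits, or, for the main thread, when the program terminates.

```cpp
#include <cassert>

// Hypothetical stand-in for an object whose destructor would call the CUDA
// runtime (e.g. cudaEventDestroy). Here it only counts destructor calls.
struct device_resource {
  static inline int destroyed = 0;
  ~device_resource() { ++destroyed; }
};

// Problematic pattern: the object itself has thread storage duration, so its
// destructor runs at thread exit. For the main thread that happens during
// program termination, possibly after cudaDeviceReset() has been called.
device_resource& raii_resource()
{
  thread_local device_resource r;
  return r;
}

// Leaking pattern: only the pointer has thread storage duration. Destroying a
// raw pointer does nothing, so ~device_resource() never runs and no CUDA call
// is made during termination.
device_resource& leaked_resource()
{
  thread_local device_resource* r = new device_resource();
  return *r;
}
```

The trade-off is that the resource is never released explicitly. The operating system and driver typically reclaim it when the process exits.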

This simple PR is the cuDF side of the fix. The other part is done here rapidsai/kvikio#462.

closes #13229

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

wence- · 2024-09-11T14:28:00Z

cpp/src/utilities/stream_pool.cpp

@@ -130,7 +130,6 @@ rmm::cuda_device_id get_current_cuda_device()
 */
 struct cuda_event {
  cuda_event() { CUDF_CUDA_TRY(cudaEventCreateWithFlags(&e_, cudaEventDisableTiming)); }
-  virtual ~cuda_event() { CUDF_ASSERT_CUDA_SUCCESS(cudaEventDestroy(e_)); }


issue (maybe): This means that every usage of cuda_event in the code base will leak the underlying event. Is this really what we want? Perhaps we only want to leak the thread_local ones below.

That is, should we change event_for_thread to do (approximately) the following?

`thread_local std::vector<cuda_event *> thread_events(get_num_cuda_devices()); ... thread_events[device_id.value()] = new cuda_event();`

Alternatively, if the only usage of cuda_event is in event_for_thread, we could just delete this class completely and implement event_for_thread as:

`thread_local std::vector<cudaEvent_t> thread_events(...); if (!thread_events[device_id.value()]) { CUDF_CUDA_TRY(cudaEventCreateWithFlags(&thread_events[device_id.value()], ...)); } return thread_events[device_id.value()];`

WDYT?
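The handle-only alternative can be sketched without the CUDA toolkit. The sketch makes several assumptions. `event_handle`, `create_event`, and `num_devices` are hypothetical stand-ins for `cudaEvent_t`, `cudaEventCreateWithFlags(..., cudaEventDisableTiming)`, and `get_num_cuda_devices()`. The device id is also passed in as an argument rather than queried. Because `cudaEvent_t` is a pointer type, a value-initialized vector starts out full of null handles. The lazy-creation check relies on that.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for cudaEvent_t and the CUDA runtime calls used in
// stream_pool.cpp; a real build would call the CUDA runtime instead.
struct event_impl {};
using event_handle = event_impl*;  // cudaEvent_t is also a pointer type

int num_devices() { return 2; }  // assumption: two visible devices

event_handle create_event() { return new event_impl{}; }

// One event per (thread, device), created lazily on first use. The vector
// stores raw handles, so destroying it at thread exit releases nothing: the
// events are intentionally leaked and no destroy call runs at termination.
event_handle event_for_thread(int device_id)
{
  thread_local std::vector<event_handle> thread_events(num_devices());
  event_handle& e = thread_events[device_id];
  if (e == nullptr) { e = create_event(); }
  return e;
}
```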

cuda_event is used only as an RAII wrapper inside event_for_thread, so deleting the user-defined destructor and allocating cuda_event on the heap happen to be equivalent. I agree that the leak makes the cuda_event wrapper entirely superfluous, so I will use the alternative approach you suggested to make the code cleaner. Thank you, @wence-!

vuule · 2024-09-11T19:42:16Z

cpp/src/utilities/stream_pool.cpp

@@ -147,12 +136,13 @@ struct cuda_event {
 */
 cudaEvent_t event_for_thread()
 {
-  thread_local std::vector<std::unique_ptr<cuda_event>> thread_events(get_num_cuda_devices());
+  thread_local std::vector<cudaEvent_t> thread_events(get_num_cuda_devices());


IMO we should keep cuda_event and only change thread_events to std::vector<cuda_event*>. cuda_event is a useful class and there are other places in libcudf where we can utilize it. I'd rather just disable RAII in this specific place.
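The following sketches the approach suggested here: the wrapper is kept, and RAII is disabled only inside event_for_thread. As before, `event_impl`, `create_event`, `destroy_event`, and `num_devices` are hypothetical stand-ins for the CUDA runtime and `get_num_cuda_devices()`. The wrapper below is illustrative and is not libcudf's actual class.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the CUDA event API so the sketch builds anywhere.
struct event_impl {};
using event_handle = event_impl*;
int num_devices() { return 2; }  // assumption: two visible devices
event_handle create_event() { return new event_impl{}; }
void destroy_event(event_handle e) { delete e; }

// RAII wrapper modeled loosely on libcudf's cuda_event (illustrative only).
struct cuda_event {
  cuda_event() : e_{create_event()} {}
  ~cuda_event() { destroy_event(e_); }

  // The wrapper owns a raw handle; copying it would cause a double destroy.
  cuda_event(cuda_event const&)            = delete;
  cuda_event& operator=(cuda_event const&) = delete;

  operator event_handle() const { return e_; }

 private:
  event_handle e_;
};

// RAII is disabled only here: the thread_local vector owns raw pointers, so
// its destruction at thread exit never runs ~cuda_event().
event_handle event_for_thread(int device_id)
{
  thread_local std::vector<cuda_event*> thread_events(num_devices());
  cuda_event*& slot = thread_events[device_id];
  if (slot == nullptr) { slot = new cuda_event(); }
  return *slot;
}
```

Elsewhere, a scoped cuda_event still releases its event when it goes out of scope. Only the per-thread cache leaks.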

I see. Should cuda_event be moved to the header file include/cudf/detail/utilities/stream_pool.hpp? It's currently defined in the .cpp file.

Yeah, we should move it at some point.

On second thought, if this event RAII wrapper proves useful, it should be moved to a separate header file. After a brief search of the cudf repo, I have some doubts about its necessity. Anyway, for this PR I'm keeping its location unchanged, and I have applied the change you suggested. @vuule

copy-pr-bot · 2024-09-12T04:07:34Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

kingcrimsontianyu · 2024-09-12T04:13:24Z

/ok to test

The NVbench application `PARQUET_READER_NVBENCH` in libcudf currently crashes with the segmentation fault. To reproduce:
```
./PARQUET_READER_NVBENCH -d 0 -b 1 --run-once -a io_type=FILEPATH -a compression_type=SNAPPY -a cardinality=0 -a run_length=1
```
The root cause is that some (1) `thread_local` objects on the main thread in `libcudf` and (2) `static` objects in `kvikio` are destroyed after `cudaDeviceReset()` in NVbench and upon program termination. These objects should simply be leaked, since their destructors making CUDA calls upon program termination constitutes UB in CUDA.

This simple PR is the kvikIO side of the fix. The other part is done here rapidsai/cudf#16787.

Authors:
- Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
- Mads R. B. Kristensen (https://github.com/madsbk)

URL: #462

kingcrimsontianyu · 2024-09-17T14:42:19Z

/ok to test


ttnghia · 2024-09-18T17:23:13Z

/ok to test

vuule · 2024-09-19T18:31:54Z

/ok to test

vuule · 2024-09-19T22:08:37Z

/merge

rapidsai#16787)

The NVbench application `PARQUET_READER_NVBENCH` in libcudf currently crashes with the segmentation fault. To reproduce:
```
./PARQUET_READER_NVBENCH -d 0 -b 1 --run-once -a io_type=FILEPATH -a compression_type=SNAPPY -a cardinality=0 -a run_length=1
```
The root cause is that some (1) `thread_local` objects on the main thread in `libcudf` and (2) `static` objects in `kvikio` are destroyed after `cudaDeviceReset()` in NVbench and upon program termination. These objects should simply be leaked, since their destructors making CUDA calls upon program termination constitutes UB in CUDA.

This simple PR is the cuDF side of the fix. The other part is done here rapidsai/kvikio#462.

closes rapidsai#13229

Authors:
- Tianyu Liu (https://github.com/kingcrimsontianyu)
- Vukasin Milovanovic (https://github.com/vuule)

Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- Nghia Truong (https://github.com/ttnghia)

URL: rapidsai#16787

github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Sep 10, 2024

kingcrimsontianyu changed the title ~~Intentionally leak thread_local CUDA resources to avoid crash~~ Intentionally leak thread_local CUDA resources to avoid crash (part 1) Sep 11, 2024

kingcrimsontianyu mentioned this pull request Sep 11, 2024

Intentionally leak static CUDA resources to avoid crash (part 2) rapidsai/kvikio#462

Merged

kingcrimsontianyu marked this pull request as ready for review September 11, 2024 03:12

kingcrimsontianyu requested a review from a team as a code owner September 11, 2024 03:12

kingcrimsontianyu requested review from hyperbolic2346 and lamarrr September 11, 2024 03:12

kingcrimsontianyu added bug Something isn't working non-breaking Non-breaking change labels Sep 11, 2024

wence- reviewed Sep 11, 2024


vuule reviewed Sep 11, 2024


kingcrimsontianyu force-pushed the tianyu.liu/leak-intentionally branch from 00b0549 to 442db74 September 12, 2024 04:07

kingcrimsontianyu requested review from vuule and wence- September 16, 2024 19:15

vuule approved these changes Sep 16, 2024


kingcrimsontianyu force-pushed the tianyu.liu/leak-intentionally branch from 442db74 to eb8bcdf September 17, 2024 14:43

lamarrr reviewed Sep 17, 2024


cpp/src/utilities/stream_pool.cpp

kingcrimsontianyu added 5 commits September 18, 2024 11:27

Intentionally leak thread_local CUDA resources

efb7f80

Improve implementation

f3fac15

Address reviewers comments

601e83d

Cleanup

f323335

Add copy and move restrictions to the event wrapper

5c28023
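The commit title does not show the exact change. Assuming the restriction simply deletes the copy and move operations, the reasoning is as follows. A type that owns a raw handle and destroys it in its destructor should not use the compiler-generated copy, which would duplicate the handle so that two objects destroy the same event. Declaring the copy constructor deleted already suppresses the implicit move operations; deleting them explicitly documents the intent. The sketch below uses hypothetical stand-ins for the CUDA calls.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for cudaEvent_t, cudaEventCreateWithFlags and
// cudaEventDestroy so the sketch compiles without the CUDA toolkit.
struct event_impl {};
using event_handle = event_impl*;
event_handle create_event() { return new event_impl{}; }
void destroy_event(event_handle e) { delete e; }

// Illustrative event wrapper: it owns exactly one handle for its lifetime.
struct cuda_event {
  cuda_event() : e_{create_event()} {}
  ~cuda_event() { destroy_event(e_); }

  // Non-copyable and non-movable, so the handle can never be destroyed twice.
  cuda_event(cuda_event const&)            = delete;
  cuda_event& operator=(cuda_event const&) = delete;
  cuda_event(cuda_event&&)                 = delete;
  cuda_event& operator=(cuda_event&&)      = delete;

  operator event_handle() const { return e_; }

 private:
  event_handle e_;
};

// Compile-time checks that the restrictions are in effect.
static_assert(!std::is_copy_constructible_v<cuda_event>);
static_assert(!std::is_copy_assignable_v<cuda_event>);
static_assert(!std::is_move_constructible_v<cuda_event>);
static_assert(!std::is_move_assignable_v<cuda_event>);
```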

kingcrimsontianyu force-pushed the tianyu.liu/leak-intentionally branch from 1a1731b to 5c28023 September 18, 2024 15:27

vuule requested a review from lamarrr September 18, 2024 17:17

GregoryKimball requested a review from ttnghia September 18, 2024 17:18

ttnghia approved these changes Sep 18, 2024


Merge branch 'branch-24.10' into tianyu.liu/leak-intentionally

3eef454

vuule added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Sep 18, 2024

Merge branch 'branch-24.10' into tianyu.liu/leak-intentionally

5c19051

rapids-bot bot merged commit 8e1345f into rapidsai:branch-24.10 Sep 19, 2024
97 checks passed

kingcrimsontianyu self-assigned this Sep 20, 2024
