Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make async_pool immune to handle reuse. #5348

Merged
merged 3 commits into from
Mar 5, 2024

Conversation

mzient
Copy link
Contributor

@mzient mzient commented Mar 1, 2024

Category:

Bug fix .... actually, previous behavior was accepted and documented, but nonetheless limiting and error prone.

Description:

Prior to this change, stream-ordered allocations had to use stream handles with care - specifically, it was illegal to delete a stream which still had a pending deallocation. This might have caused problems with streams over which we have no control.
This PR removes this limitation. Also, handling of per-thread default stream was broken.

Instead of using a stream handle, this PR tries to obtain a unique stream ID. If possible, we proceed as before, but with extra guarantees - we don't have to care about the stream being destroyed, because the ID is unique.
When a stream ID is not available (old drivers), this code uses a handle-derived non-unique ID and scans the per-stream free blocks linearly when returning to upstream. When getting a per-stream free block, we're not sure it really comes from the same stream. Therefore, if the block isn't ready yet, we record an event on the requesting stream, so that in the worst case, the stream would wait for the allocation to really happen.

Additional information:

The functionality borrows heavily on the implementation of stream id hints in cv-cuda.

Affected modules and functionalities:

Memory resources.

Key points relevant for the review:

N/A

Tests:

NOTE: It's impossible to trigger the fallback behavior manually (well... perhaps we could have a separate test target with some env vars which are only used for testing - I don't think it's worth the extra overhead). It triggers automatically on old (pre-cuda12) drivers.

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is almost copied from cv-cuda.

@@ -549,7 +577,7 @@ class async_pool_resource : public async_memory_resource<Kind>,

detail::pooled_map<char *, padded_block, true> padded_;

std::unordered_map<cudaStream_t, PerStreamFreeBlocks> stream_free_;
std::unordered_map<uint64_t, PerStreamFreeBlocks> stream_free_;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The per-stream resources are keyed not with a handle, but with an ID hint.

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [13203053]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [13203053]: BUILD PASSED

Comment on lines +10 to +11
"cuGetProcAddress": {},
"cuGetProcAddress_v2": {},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depending on CUDA header versions, we need either the former or the latter (starting with CUDA 12.0).

return fn;
}

bool _hasPreciseHint() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why the _ at the beginning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I don't know :)

@jantonguirao jantonguirao self-assigned this Mar 5, 2024
Signed-off-by: Michal Zientkiewicz <[email protected]>
Signed-off-by: Michal Zientkiewicz <[email protected]>
@mzient mzient force-pushed the fix_stream_handle_aliasing branch from 9c08600 to 8a4fab1 Compare March 5, 2024 09:04
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [13281756]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [13281883]: BUILD STARTED

Remove underscore from a function name.

Signed-off-by: Michal Zientkiewicz <[email protected]>
@mzient mzient force-pushed the fix_stream_handle_aliasing branch from c9828d8 to ddaf0a7 Compare March 5, 2024 09:09
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [13281980]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [13281980]: BUILD PASSED

@mzient mzient merged commit 3c263df into NVIDIA:main Mar 5, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants