pool_memory_resource optimization: disable tracking allocated blocks by default #702
Merged
rapids-bot
merged 1 commit into
rapidsai:branch-0.19
from
harrism:fea-pool-dont-track-allocated-blocks
Feb 16, 2021
harrism
added
3 - Ready for review
Ready for review by team
non-breaking
Non-breaking change
improvement
Improvement / enhancement to an existing function
labels
Feb 15, 2021
harrism
changed the title
Tracking allocated blocks in pool disabled by default
pool_memory_resource optimization: disable tracking allocated blocks by default
Feb 15, 2021
harrism
changed the title
pool_memory_resource optimization: disable tracking allocated blocks by default
Optimize pool_memory_resource to disable tracking allocated blocks by default
Feb 15, 2021
harrism
changed the title
Optimize pool_memory_resource to disable tracking allocated blocks by default
pool_memory_resource optimization: disable tracking allocated blocks by default
Feb 16, 2021
jrhemstad
approved these changes
Feb 16, 2021
rongou
approved these changes
Feb 16, 2021
@gpucibot merge
rapids-bot bot
pushed a commit
that referenced
this pull request
Mar 17, 2021
… by default (#732)

This is done similarly to #702. Previously `arena_memory_resource` maintained a set of allocated blocks, but this was only used for reporting/debugging purposes. Maintaining this set requires a `set::find` at every deallocation, which can get expensive when there are many allocated blocks. This PR moves the tracking behind a default-undefined preprocessor flag. This results in some speedup in the random allocations benchmark for `arena_memory_resource`. Tracking can be enabled by defining `RMM_POOL_TRACK_ALLOCATIONS`.

This should also fix the Spark small shuffle buffer issue: NVIDIA/spark-rapids#1711

Before:

```console
------------------------------------------------------------------------------
Benchmark                                       Time           CPU  Iterations
------------------------------------------------------------------------------
BM_RandomAllocations/arena_mr/1000/1         1.36 ms       1.36 ms         457
BM_RandomAllocations/arena_mr/1000/4         1.21 ms       1.21 ms         517
BM_RandomAllocations/arena_mr/1000/64        1.22 ms       1.22 ms         496
BM_RandomAllocations/arena_mr/1000/256       1.08 ms       1.07 ms         535
BM_RandomAllocations/arena_mr/1000/1024     0.949 ms      0.948 ms         583
BM_RandomAllocations/arena_mr/1000/4096     0.853 ms      0.848 ms         680
BM_RandomAllocations/arena_mr/10000/1        98.7 ms       98.3 ms           8
BM_RandomAllocations/arena_mr/10000/4        65.4 ms       65.4 ms           9
BM_RandomAllocations/arena_mr/10000/64       16.6 ms       16.5 ms          38
BM_RandomAllocations/arena_mr/10000/256      11.2 ms       11.2 ms          48
BM_RandomAllocations/arena_mr/10000/1024     9.45 ms       9.44 ms          62
BM_RandomAllocations/arena_mr/10000/4096     9.24 ms       9.20 ms          59
BM_RandomAllocations/arena_mr/100000/1       7536 ms       7536 ms           1
BM_RandomAllocations/arena_mr/100000/4       3002 ms       3002 ms           1
BM_RandomAllocations/arena_mr/100000/64       170 ms        170 ms           3
BM_RandomAllocations/arena_mr/100000/256      107 ms        107 ms           7
BM_RandomAllocations/arena_mr/100000/1024    96.0 ms       95.7 ms           6
BM_RandomAllocations/arena_mr/100000/4096    86.7 ms       86.7 ms           6
```

After:

```console
------------------------------------------------------------------------------
Benchmark                                       Time           CPU  Iterations
------------------------------------------------------------------------------
BM_RandomAllocations/arena_mr/1000/1         1.20 ms       1.20 ms         519
BM_RandomAllocations/arena_mr/1000/4         1.08 ms       1.08 ms         588
BM_RandomAllocations/arena_mr/1000/64        1.11 ms       1.11 ms         552
BM_RandomAllocations/arena_mr/1000/256      0.957 ms      0.957 ms         611
BM_RandomAllocations/arena_mr/1000/1024     0.857 ms      0.857 ms         687
BM_RandomAllocations/arena_mr/1000/4096     0.795 ms      0.793 ms         724
BM_RandomAllocations/arena_mr/10000/1        73.0 ms       73.0 ms          10
BM_RandomAllocations/arena_mr/10000/4        45.7 ms       45.7 ms          14
BM_RandomAllocations/arena_mr/10000/64       14.4 ms       14.4 ms          40
BM_RandomAllocations/arena_mr/10000/256      9.87 ms       9.82 ms          60
BM_RandomAllocations/arena_mr/10000/1024     8.72 ms       8.72 ms          69
BM_RandomAllocations/arena_mr/10000/4096     7.32 ms       7.30 ms          85
BM_RandomAllocations/arena_mr/100000/1       6384 ms       6384 ms           1
BM_RandomAllocations/arena_mr/100000/4       2480 ms       2480 ms           1
BM_RandomAllocations/arena_mr/100000/64       147 ms        147 ms           5
BM_RandomAllocations/arena_mr/100000/256      103 ms        103 ms           7
BM_RandomAllocations/arena_mr/100000/1024    78.1 ms       78.1 ms           9
BM_RandomAllocations/arena_mr/100000/4096    72.3 ms       72.3 ms           9
```

@abellina

Authors:
- Rong Ou (@rongou)

Approvers:
- Mark Harris (@harrism)
- Conor Hoekstra (@codereport)

URL: #732
Labels
3 - Ready for review
Ready for review by team
cpp
Pertains to C++ code
improvement
Improvement / enhancement to an existing function
non-breaking
Non-breaking change
Previously `pool_memory_resource` maintained a set of allocated blocks, but this was only used for reporting/debugging purposes. Maintaining this set requires a `set::find` at every deallocation, which can get expensive when there are many allocated blocks. This PR moves the tracking behind a default-undefined preprocessor flag. This results in up to 40% speedup in the random allocations benchmark for `pool_memory_resource`. Tracking can be enabled by defining `RMM_POOL_TRACK_ALLOCATIONS`.

Here are the results.
Before:
After:
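For readers unfamiliar with the pattern, here is a minimal sketch of what "moving the tracking behind a default-undefined preprocessor flag" can look like. This is not the RMM implementation: `toy_pool`, `block`, `free_block_count`, and `tracked_block_count` are hypothetical names invented for illustration, and the bump-pointer allocation is a deliberate simplification. Only the flag name `RMM_POOL_TRACK_ALLOCATIONS` comes from the PR description. The sketch assumes the caller passes the block size to `deallocate`, so the allocated-block set is not needed to return a block to the pool.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical block descriptor: a pointer and a size, ordered by address.
struct block {
  char* ptr;
  std::size_t size;
  bool operator<(block const& other) const { return ptr < other.ptr; }
};

// Hypothetical toy pool (NOT RMM's pool_memory_resource). It carves blocks
// out of a fixed buffer and only records live allocations when
// RMM_POOL_TRACK_ALLOCATIONS is defined at compile time.
class toy_pool {
 public:
  explicit toy_pool(std::size_t capacity) : buffer_(capacity), offset_(0) {}

  void* allocate(std::size_t bytes) {
    if (bytes == 0 || bytes > buffer_.size() - offset_) { return nullptr; }
    char* p = buffer_.data() + offset_;
    offset_ += bytes;
#ifdef RMM_POOL_TRACK_ALLOCATIONS
    allocated_.insert(block{p, bytes});  // bookkeeping for debugging only
#endif
    return p;
  }

  void deallocate(void* p, std::size_t bytes) {
    block const b{static_cast<char*>(p), bytes};
#ifdef RMM_POOL_TRACK_ALLOCATIONS
    // This lookup is the per-deallocation cost the flag removes by default.
    auto const it = allocated_.find(b);
    assert(it != allocated_.end());
    allocated_.erase(it);
#endif
    // The caller supplied the size, so the block can go straight back
    // to the free list without consulting the tracking set.
    free_blocks_.push_back(b);
  }

  std::size_t free_block_count() const { return free_blocks_.size(); }

  // Number of live allocations known to the tracker (always 0 when disabled).
  std::size_t tracked_block_count() const {
#ifdef RMM_POOL_TRACK_ALLOCATIONS
    return allocated_.size();
#else
    return 0;
#endif
  }

 private:
  std::vector<char> buffer_;
  std::size_t offset_;
  std::vector<block> free_blocks_;
#ifdef RMM_POOL_TRACK_ALLOCATIONS
  std::set<block> allocated_;
#endif
};
```

Compiling with `-DRMM_POOL_TRACK_ALLOCATIONS` turns the bookkeeping back on; without it, the set and its lookup are compiled out entirely, so the default build pays nothing for the debugging feature.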