pool_memory_resource optimization: disable tracking allocated blocks by default #702

Merged

@harrism harrism commented Feb 15, 2021

Previously pool_memory_resource maintained a set of allocated blocks, but this set was only used for reporting and debugging. Maintaining it requires a set::find at every deallocation, which gets expensive when there are many allocated blocks. This PR moves the tracking behind a preprocessor flag that is undefined by default. This cuts run time by up to about 39% in the random allocations benchmark for pool_memory_resource (75.8 ms to 46.2 ms in the 10000/1 case below). Tracking can be enabled by defining RMM_POOL_TRACK_ALLOCATIONS.
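
To illustrate the idea, here is a minimal sketch of gating allocation tracking behind a default-undefined preprocessor flag. It is not the RMM implementation: `toy_pool`, its integer block IDs, and `tracking_enabled()` are hypothetical names invented for this example, and only the flag name RMM_POOL_TRACK_ALLOCATIONS comes from the PR. When the flag is undefined, the set and its per-deallocation lookup are compiled out entirely.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <vector>

// Hypothetical toy pool that hands out integer block IDs.
// This is an illustration only, not RMM's pool_memory_resource.
class toy_pool {
 public:
  explicit toy_pool(std::size_t num_blocks) {
    for (std::size_t i = 0; i < num_blocks; ++i) { free_ids_.push_back(i); }
  }

  std::size_t allocate() {
    if (free_ids_.empty()) { throw std::runtime_error("toy_pool exhausted"); }
    std::size_t const id = free_ids_.back();
    free_ids_.pop_back();
#ifdef RMM_POOL_TRACK_ALLOCATIONS
    allocated_.insert(id);  // bookkeeping used only for reporting/debugging
#endif
    return id;
  }

  void deallocate(std::size_t id) {
#ifdef RMM_POOL_TRACK_ALLOCATIONS
    // This O(log n) lookup runs on every deallocation when tracking is on.
    auto const it = allocated_.find(id);
    if (it == allocated_.end()) { throw std::invalid_argument("unknown block"); }
    allocated_.erase(it);
#endif
    free_ids_.push_back(id);
  }

  std::size_t free_count() const { return free_ids_.size(); }

  // Reports at compile time whether tracking was built in.
  static constexpr bool tracking_enabled() {
#ifdef RMM_POOL_TRACK_ALLOCATIONS
    return true;
#else
    return false;
#endif
  }

 private:
  std::vector<std::size_t> free_ids_;
#ifdef RMM_POOL_TRACK_ALLOCATIONS
  std::set<std::size_t> allocated_;  // exists only when tracking is enabled
#endif
};
```

With a typical compiler, a macro like this is defined on the command line (for example `-DRMM_POOL_TRACK_ALLOCATIONS`) or with a `#define` before the header is included. Note that in this sketch, disabling tracking also removes the check for deallocating an unknown block.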

Here are the results.

Before:

-----------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations
-----------------------------------------------------------------------------------
BM_RandomAllocations/pool_mr/1000/1           0.787 ms        0.787 ms          887
BM_RandomAllocations/pool_mr/1000/4           0.811 ms        0.811 ms          764
BM_RandomAllocations/pool_mr/1000/64          0.840 ms        0.839 ms          728
BM_RandomAllocations/pool_mr/1000/256         0.712 ms        0.712 ms          919
BM_RandomAllocations/pool_mr/1000/1024        0.594 ms        0.594 ms         1106
BM_RandomAllocations/pool_mr/1000/4096        0.542 ms        0.542 ms         1076
BM_RandomAllocations/pool_mr/10000/1           75.8 ms         75.8 ms            7
BM_RandomAllocations/pool_mr/10000/4           80.0 ms         80.0 ms            8
BM_RandomAllocations/pool_mr/10000/64          15.4 ms         15.4 ms           44
BM_RandomAllocations/pool_mr/10000/256         7.67 ms         7.66 ms           78
BM_RandomAllocations/pool_mr/10000/1024        6.04 ms         6.04 ms          109
BM_RandomAllocations/pool_mr/10000/4096        5.51 ms         5.51 ms          106
BM_RandomAllocations/pool_mr/100000/1         10648 ms        10645 ms            1
BM_RandomAllocations/pool_mr/100000/4          3520 ms         3519 ms            1
BM_RandomAllocations/pool_mr/100000/64          170 ms          170 ms            4
BM_RandomAllocations/pool_mr/100000/256        76.5 ms         76.5 ms            9
BM_RandomAllocations/pool_mr/100000/1024       59.0 ms         59.0 ms           11
BM_RandomAllocations/pool_mr/100000/4096       54.6 ms         54.5 ms           10

After:

-----------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations
-----------------------------------------------------------------------------------
BM_RandomAllocations/pool_mr/1000/1           0.621 ms        0.621 ms         1116
BM_RandomAllocations/pool_mr/1000/4           0.642 ms        0.642 ms         1079
BM_RandomAllocations/pool_mr/1000/64          0.677 ms        0.677 ms          952
BM_RandomAllocations/pool_mr/1000/256         0.585 ms        0.585 ms         1094
BM_RandomAllocations/pool_mr/1000/1024        0.487 ms        0.487 ms         1284
BM_RandomAllocations/pool_mr/1000/4096        0.458 ms        0.458 ms         1229
BM_RandomAllocations/pool_mr/10000/1           46.2 ms         46.2 ms           15
BM_RandomAllocations/pool_mr/10000/4           49.8 ms         49.8 ms           14
BM_RandomAllocations/pool_mr/10000/64          11.2 ms         11.2 ms           61
BM_RandomAllocations/pool_mr/10000/256         6.29 ms         6.28 ms          102
BM_RandomAllocations/pool_mr/10000/1024        4.91 ms         4.91 ms          128
BM_RandomAllocations/pool_mr/10000/4096        4.61 ms         4.61 ms          119
BM_RandomAllocations/pool_mr/100000/1          7372 ms         7370 ms            1
BM_RandomAllocations/pool_mr/100000/4          2468 ms         2468 ms            1
BM_RandomAllocations/pool_mr/100000/64          115 ms          115 ms            6
BM_RandomAllocations/pool_mr/100000/256        63.8 ms         63.8 ms           10
BM_RandomAllocations/pool_mr/100000/1024       49.7 ms         49.7 ms           13
BM_RandomAllocations/pool_mr/100000/4096       47.2 ms         47.2 ms           12

harrism added the labels 3 - Ready for review (Ready for review by team), non-breaking (Non-breaking change), and improvement (Improvement / enhancement to an existing function) on Feb 15, 2021
harrism self-assigned this on Feb 15, 2021
harrism requested a review from a team as a code owner on February 15, 2021 22:52
github-actions bot added the label cpp (Pertains to C++ code) on Feb 15, 2021
harrism changed the title from "Tracking allocated blocks in pool disabled by default" to "pool_memory_resource optimization: disable tracking allocated blocks by default" on Feb 15, 2021
harrism changed the title from "pool_memory_resource optimization: disable tracking allocated blocks by default" to "Optimize pool_memory_resource to disable tracking allocated blocks by default" on Feb 15, 2021
harrism changed the title from "Optimize pool_memory_resource to disable tracking allocated blocks by default" to "pool_memory_resource optimization: disable tracking allocated blocks by default" on Feb 16, 2021

harrism commented Feb 16, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 0add908 into rapidsai:branch-0.19 Feb 16, 2021
rapids-bot bot pushed a commit that referenced this pull request Mar 17, 2021
… by default (#732)

This is done similarly to #702.

Previously `arena_memory_resource` maintained a set of allocated blocks, but this set was only used for reporting and debugging. Maintaining it requires a `set::find` at every deallocation, which gets expensive when there are many allocated blocks. This PR moves the tracking behind a preprocessor flag that is undefined by default. This cuts run time by up to about 30% in the random allocations benchmark for `arena_memory_resource` (65.4 ms to 45.7 ms in the 10000/4 case below). Tracking can be enabled by defining `RMM_POOL_TRACK_ALLOCATIONS`.

This should also fix the Spark small shuffle buffer issue: NVIDIA/spark-rapids#1711

Before:
```console
------------------------------------------------------------------------------------
Benchmark                                          Time             CPU   Iterations
------------------------------------------------------------------------------------
BM_RandomAllocations/arena_mr/1000/1            1.36 ms         1.36 ms          457
BM_RandomAllocations/arena_mr/1000/4            1.21 ms         1.21 ms          517
BM_RandomAllocations/arena_mr/1000/64           1.22 ms         1.22 ms          496
BM_RandomAllocations/arena_mr/1000/256          1.08 ms         1.07 ms          535
BM_RandomAllocations/arena_mr/1000/1024        0.949 ms        0.948 ms          583
BM_RandomAllocations/arena_mr/1000/4096        0.853 ms        0.848 ms          680
BM_RandomAllocations/arena_mr/10000/1           98.7 ms         98.3 ms            8
BM_RandomAllocations/arena_mr/10000/4           65.4 ms         65.4 ms            9
BM_RandomAllocations/arena_mr/10000/64          16.6 ms         16.5 ms           38
BM_RandomAllocations/arena_mr/10000/256         11.2 ms         11.2 ms           48
BM_RandomAllocations/arena_mr/10000/1024        9.45 ms         9.44 ms           62
BM_RandomAllocations/arena_mr/10000/4096        9.24 ms         9.20 ms           59
BM_RandomAllocations/arena_mr/100000/1          7536 ms         7536 ms            1
BM_RandomAllocations/arena_mr/100000/4          3002 ms         3002 ms            1
BM_RandomAllocations/arena_mr/100000/64          170 ms          170 ms            3
BM_RandomAllocations/arena_mr/100000/256         107 ms          107 ms            7
BM_RandomAllocations/arena_mr/100000/1024       96.0 ms         95.7 ms            6
BM_RandomAllocations/arena_mr/100000/4096       86.7 ms         86.7 ms            6
```

After:
```console
------------------------------------------------------------------------------------
Benchmark                                          Time             CPU   Iterations
------------------------------------------------------------------------------------
BM_RandomAllocations/arena_mr/1000/1            1.20 ms         1.20 ms          519
BM_RandomAllocations/arena_mr/1000/4            1.08 ms         1.08 ms          588
BM_RandomAllocations/arena_mr/1000/64           1.11 ms         1.11 ms          552
BM_RandomAllocations/arena_mr/1000/256         0.957 ms        0.957 ms          611
BM_RandomAllocations/arena_mr/1000/1024        0.857 ms        0.857 ms          687
BM_RandomAllocations/arena_mr/1000/4096        0.795 ms        0.793 ms          724
BM_RandomAllocations/arena_mr/10000/1           73.0 ms         73.0 ms           10
BM_RandomAllocations/arena_mr/10000/4           45.7 ms         45.7 ms           14
BM_RandomAllocations/arena_mr/10000/64          14.4 ms         14.4 ms           40
BM_RandomAllocations/arena_mr/10000/256         9.87 ms         9.82 ms           60
BM_RandomAllocations/arena_mr/10000/1024        8.72 ms         8.72 ms           69
BM_RandomAllocations/arena_mr/10000/4096        7.32 ms         7.30 ms           85
BM_RandomAllocations/arena_mr/100000/1          6384 ms         6384 ms            1
BM_RandomAllocations/arena_mr/100000/4          2480 ms         2480 ms            1
BM_RandomAllocations/arena_mr/100000/64          147 ms          147 ms            5
BM_RandomAllocations/arena_mr/100000/256         103 ms          103 ms            7
BM_RandomAllocations/arena_mr/100000/1024       78.1 ms         78.1 ms            9
BM_RandomAllocations/arena_mr/100000/4096       72.3 ms         72.3 ms            9
```

@abellina

Authors:
  - Rong Ou (@rongou)

Approvers:
  - Mark Harris (@harrism)
  - Conor Hoekstra (@codereport)

URL: #732