Add default pinned pool that falls back to new pinned allocations #15665
Conversation
```cpp
/**
 * @brief Get the global stream pool.
 */
cuda_stream_pool& global_cuda_stream_pool();
```
had to expose the pool to get a stream from it without forking
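A usage sketch of why exposing the pool helps; the header path and the `get_stream()` accessor are assumptions about cudf's stream-pool interface, not quoted from the PR:

```cpp
#include <cudf/detail/utilities/stream_pool.hpp>

#include <rmm/cuda_stream_view.hpp>

// Sketch: borrow a stream from the now-exposed global pool (assuming a
// get_stream() accessor) for the allocator's internal operations.
void internal_pinned_op()
{
  rmm::cuda_stream_view stream = cudf::detail::global_cuda_stream_pool().get_stream();
  // ... enqueue internal copies/initialization on `stream` ...
  stream.synchronize();  // complete before returning control to the caller
}
```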
```cpp
size_t free{}, total{};
cudaMemGetInfo(&free, &total);
// 0.5% of the total device memory, capped at 100MB
```
Allocating a 100MB pool takes 30ms on my system. This is smaller than CUDA runtime init (~60ms) and cuFile/kvikio init (180ms), so IMO this won't significantly impact user experience.
```cpp
/**
 * @brief Configure the size of the default host memory resource.
 *
 * Must be called before any other function in this header.
```
This is a dangerous requirement, and it may not be satisfied. How about making the static function re-configurable?
For doing so (see the sketch after this list):
- The static variable is declared outside of function scope (but in an anonymous namespace, so it is static inside just this TU). In addition, it can be a smart pointer. `host_mr` will initialize it with `std::nullopt` size if it is `nullptr`, otherwise it just derefs the current pointer and returns.
- The user can specify a size parameter to recompute and overwrite that static variable with a new mr.
- All these ops should be thread safe.
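A minimal sketch of that pattern; the `pinned_pool_mr` type and the `make_default_pinned_mr` factory are stand-ins, not the actual implementation:

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <optional>

struct pinned_pool_mr { /* stand-in for the real pooled pinned resource */ };

// Assumed factory: builds a pool of the given size (or a default size).
std::unique_ptr<pinned_pool_mr> make_default_pinned_mr(std::optional<std::size_t> size);

namespace {  // TU-local state, as suggested above
std::mutex mr_mutex;  // makes initialization and reconfiguration thread safe
std::unique_ptr<pinned_pool_mr> mr_ptr;
}  // namespace

// First call lazily builds the default pool; passing a size recomputes
// and overwrites the static variable with a new mr.
pinned_pool_mr& host_mr(std::optional<std::size_t> size = std::nullopt)
{
  std::lock_guard<std::mutex> lock{mr_mutex};
  if (mr_ptr == nullptr || size.has_value()) { mr_ptr = make_default_pinned_mr(size); }
  return *mr_ptr;
}
```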
To clarify, the issue with calling config after get/set is that it would have no effect.
Allowing this opens another can of worms, e.g. what is the intended effect of calling config after set?
If we don't allow this, let's add a validity check to prevent it from being accidentally misused. It sounds unsafe if we just rely on an assumption.
@abellina what behavior do you suggest when config is called after the first resource use? I'm not sure if we should throw or just warn.
I think we should throw, I agree with @ttnghia that we should do something in that case.
Added a mechanism to throw if config is called after the default resource has already been created.
@abellina might be good to test your branch with this change.
@vuule thank you for adding the configuration API. I have a local branch, which I'll PR once this goes in, that pipes that API through our JNI layer so we can set it from the plugin.
- I tested this as is and saw both pinned allocations (once for the default pinned pool, and once for ours).
- With the config set to 0 I only see 1 allocation (ours)
- I re-verified that the default resource works with a 0 size allocation: I see cudaHostAlloc.
- I also verified that we use our resource if we set it, as it used to work before.
LGTM
```diff
 {
-  static rmm::mr::pinned_host_memory_resource default_mr{};
-  return default_mr;
+  static rmm::host_async_resource_ref mr_ref = make_default_pinned_mr(size);
```
Perhaps this was asked before, but I'm curious when/how this object is destroyed?
Is it destroyed automatically when the process ends, i.e. after `main()` completes?
Are there any CUDA API calls in the destructor(s)?
Maybe this is ok for host memory resources.
Great question. Currently the pool itself is not destroyed, as destroying it caused a segfault at the end of some tests, presumably because of the call to `cudaFreeHost` after `main()`. But this is something I should revisit and verify what exactly the issue was.
Yeah, can't destroy a static pool resource object. Open to suggestions to avoid the pool leak.
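One common workaround (a sketch only, not what this PR does): heap-allocate the pool behind a function-local static pointer and deliberately never delete it, so no CUDA call such as `cudaFreeHost` runs during static destruction after `main()`. The `host_pooled_mr` alias is an assumption about the underlying rmm types:

```cpp
#include <rmm/mr/device/pool_memory_resource.hpp>
#include <rmm/mr/pinned_host_memory_resource.hpp>

#include <cstddef>

using host_pooled_mr = rmm::mr::pool_memory_resource<rmm::mr::pinned_host_memory_resource>;

// Intentionally leak the pool: its destructor (which would free pinned
// memory, possibly after CUDA has shut down) never runs at exit.
host_pooled_mr& default_pooled_mr(std::size_t pool_size)
{
  static auto* upstream = new rmm::mr::pinned_host_memory_resource{};
  static auto* pool     = new host_pooled_mr{upstream, pool_size};  // never deleted
  return *pool;
}
```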
```diff
-  static rmm::mr::pinned_host_memory_resource default_mr{};
-  return default_mr;
+  static rmm::host_async_resource_ref* mr_ref = nullptr;
+  CUDF_EXPECTS(mr_ref == nullptr, "The default host memory resource has already been created");
```
This does not work as intended.
- Call `config_default_host_memory_resource`.
- Call `set_host_memory_resource(cudf_jni_resource)` or skip this step; it fails either on the `set_host_memory_resource` call or when the first code tries to call `get_host_memory_resource`.
- We see parquet read blowing up with "The default host memory resource has already been created".

I think I know why:
- `config_default_host_memory_resource` calls `make_host_mr`, which sets `mr_ref` to not `nullptr`.
- Calling `set_host_memory_resource(cudf_jni_resource)` calls `host_mr`, whose `mr_ref` is NOT set yet, so it has to call `make_host_mr`, and this blows up. If you don't set a custom resource, `get_host_memory_resource` also calls `host_mr`, blowing up.

I think we need to go through `host_mr` in all code paths. That way `mr_ref` is set when `config_default_host_memory_resource` is called as well. I think this implies adding an optional size to `host_mr`.
I think we may need to use some sort of out param that gets passed to `host_mr` that says whether it was initialized or not. If we do end up creating a resource, we would indicate that by setting the out param to true (e.g. `did_initialize`). Then in `configure_default_host_memory_resource` we could detect `!did_initialize` and throw there? (See the sketch below.)
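A minimal sketch of the two suggestions combined: every path goes through `host_mr`, which reports whether this particular call created the resource. All names are illustrative, not the merged implementation:

```cpp
#include <cstddef>
#include <mutex>
#include <optional>
#include <stdexcept>

struct host_mr_t { /* stand-in for the real resource type */ };

// Single entry point for the config/get/set paths. `did_initialize`,
// when non-null, is set to true only if this call created the resource.
host_mr_t& host_mr(std::optional<std::size_t> size = std::nullopt,
                   bool* did_initialize            = nullptr)
{
  static std::mutex mtx;
  static host_mr_t* mr = nullptr;
  std::lock_guard<std::mutex> lock{mtx};
  if (mr == nullptr) {
    mr = new host_mr_t{/* honor `size` if provided */};
    if (did_initialize != nullptr) { *did_initialize = true; }
  }
  return *mr;
}

// Configuring is only valid if it is the call that creates the resource.
void configure_default_host_memory_resource(std::size_t size)
{
  bool did_initialize = false;
  host_mr(size, &did_initialize);
  if (!did_initialize) {
    throw std::logic_error("The default host memory resource has already been created");
  }
}
```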
updated the logic; seems to work in local tests (tested both cases this time 😅 )
@abellina Please rerun tests and let me know. I'll also try to come up with unit tests before merging.
The new throw logic isn't working as expected, please see my comment.
Approving, thanks for some offline discussions.
/merge
This PR depends on #15665 and so it won't build until that PR merges. Adds support for `cudf::io::config_host_memory_resource`, which is being worked on in #15665. In 24.06 we are going to disable the cuDF pinned pool and look into this more in 24.08. We currently have a pinned pooled resource that has been set up to share pinned memory with other APIs we use from Java, so we wanted to prevent extra pinned memory being created by default, and @vuule has added an API for us to call to accomplish this.

Authors:
- Alessandro Bellina (https://github.com/abellina)
- GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
- Nghia Truong (https://github.com/ttnghia)

URL: #15745
```cpp
  pool_->deallocate_async(pool_begin_, pool_size_, stream_);
}

void* do_allocate_async(std::size_t bytes, std::size_t alignment, cuda::stream_ref stream)
```
I'm late, but these `do_` versions should probably be protected/private?
```cpp
size_t free{}, total{};
CUDF_CUDA_TRY(cudaMemGetInfo(&free, &total));
// 0.5% of the total device memory, capped at 100MB
return std::min(total / 200, size_t{100} * 1024 * 1024);
```
I'm late, but this should use `rmm::percent_of_free_device_memory`. That function only takes an integer percent; if you need a decimal percent, please file an issue. Or you can just use 1% and then divide by 2.
Or at least use `rmm::available_device_memory()`.
Yes, we could be using `rmm::available_device_memory()` to get the memory capacity. I'll address this in 24.08.
If there's a plan to add `percent_of_total_device_memory`, that would be even better.
It already exists.
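For reference, a sketch of the suggested change, keeping the 0.5%-of-total-capped-at-100MB computation from the snippet above but sizing from `rmm::available_device_memory()` (the function name `default_pinned_pool_size` is illustrative):

```cpp
#include <rmm/cuda_device.hpp>

#include <algorithm>
#include <cstddef>

// 0.5% of total device memory, capped at 100MB, without a raw
// cudaMemGetInfo call.
std::size_t default_pinned_pool_size()
{
  auto const [free_mem, total_mem] = rmm::available_device_memory();
  (void)free_mem;  // only the total capacity is needed here
  return std::min(total_mem / 200, std::size_t{100} * 1024 * 1024);
}
```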
Description

Issue #15612

Adds a pooled pinned memory resource that is created on the first call to `get_host_memory_resource` or `set_host_memory_resource`.

The pool has a fixed size: 0.5% of the device memory capacity, limited to 100MB. At 100MB, the pool takes ~30ms to initialize. The size of the pool can be overridden with the environment variable `LIBCUDF_PINNED_POOL_SIZE`.

If an allocation cannot be done within the pool, a new pinned allocation is performed.
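A sketch of this fallback behavior; the class and member names are illustrative stand-ins, not the PR's actual implementation:

```cpp
#include <rmm/mr/pinned_host_memory_resource.hpp>

#include <cstddef>

// Illustrative fixed-size pinned resource that falls back to brand-new
// pinned allocations when the pool cannot serve a request.
class fixed_pinned_pool_sketch {
 public:
  void* allocate(std::size_t bytes, std::size_t alignment)
  {
    if (bytes <= pool_free_bytes_) {
      return allocate_from_pool(bytes, alignment);  // pooled path (elided)
    }
    // Fallback: fresh pinned allocation outside the pool. A real
    // implementation must remember the origin to deallocate correctly.
    return fallback_.allocate(bytes, alignment);
  }

 private:
  void* allocate_from_pool(std::size_t bytes, std::size_t alignment);
  rmm::mr::pinned_host_memory_resource fallback_{};
  std::size_t pool_free_bytes_{};
};
```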
The allocator uses a stream from the global stream pool to initialize the pool and perform synchronous operations (`allocate`/`deallocate`). Users of the resource don't need to be aware of this implementation detail, as these operations synchronize before they are completed.

Checklist