From dbf4bd02a8fdccd1891edbc2d049c3ddddb234b3 Mon Sep 17 00:00:00 2001
From: GALI PREM SAGAR
Date: Tue, 30 Jul 2024 12:14:14 -0500
Subject: [PATCH] Add about rmm modes in `cudf.pandas` docs (#16404)

This PR adds user-facing docs for RMM memory modes and prefetching.

---------

Co-authored-by: Mark Harris <783069+harrism@users.noreply.github.com>
Co-authored-by: Bradley Dice
---
 docs/cudf/source/cudf_pandas/how-it-works.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/docs/cudf/source/cudf_pandas/how-it-works.md b/docs/cudf/source/cudf_pandas/how-it-works.md
index 75f57742ac9..8efd9d7e063 100644
--- a/docs/cudf/source/cudf_pandas/how-it-works.md
+++ b/docs/cudf/source/cudf_pandas/how-it-works.md
@@ -36,3 +36,19 @@ transfers.
 When using `cudf.pandas`, cuDF's [pandas compatibility
 mode](api.options) is automatically enabled, ensuring consistency with
 pandas-specific semantics like default sort ordering.
+
+`cudf.pandas` uses a managed memory pool by default. This allows `cudf.pandas` to process datasets larger than the memory of the GPU it is running on. Managed memory prefetching is also enabled by default to improve memory access performance. For more information on CUDA Unified Memory (managed memory), performance, and prefetching, see [this NVIDIA Developer blog post](https://developer.nvidia.com/blog/improving-gpu-memory-oversubscription-performance/).
+
+Pool allocators improve allocation performance. Without one, memory
+allocation may be a bottleneck depending on the workload. Managed memory
+enables oversubscribing GPU memory. This allows `cudf.pandas` to process
+data larger than GPU memory in many cases, without CPU (pandas) fallback.
+
+Other memory allocators can be used by changing the environment
+variable `CUDF_PANDAS_RMM_MODE` to one of the following:
+
+1. "managed_pool" (default): CUDA Unified Memory (managed memory) with RMM's pool allocator.
+2. "managed": CUDA Unified Memory (managed memory) with no pool allocator.
+3. "async": CUDA's built-in asynchronous pool allocator with normal CUDA device memory.
+4. "pool": RMM's pool allocator with normal CUDA device memory.
+5. "cuda": normal CUDA device memory with no pool allocator.
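
For reference, here is a minimal sketch of selecting one of these modes from Python. It assumes the standard `cudf.pandas.install()` entry point and that `CUDF_PANDAS_RMM_MODE` is read when `cudf.pandas` configures RMM; exporting the variable in the shell before launching Python (or before `python -m cudf.pandas script.py`) is equivalent.

```python
import os

# Select the RMM mode before cudf.pandas sets up its memory resource.
# "managed_pool" is the documented default; "cuda" would use plain CUDA
# device memory with no pool allocator.
os.environ["CUDF_PANDAS_RMM_MODE"] = "managed_pool"

import cudf.pandas
cudf.pandas.install()  # patch pandas so supported operations run on the GPU

import pandas as pd  # accelerated by cuDF where possible, CPU fallback otherwise

df = pd.DataFrame({"a": [1, 1, 2], "b": [4.0, 5.0, 6.0]})
print(df.groupby("a").sum())
```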