diff --git a/docs/dask_cudf/source/best_practices.rst b/docs/dask_cudf/source/best_practices.rst
index 83039f86fed..41263ebf589 100644
--- a/docs/dask_cudf/source/best_practices.rst
+++ b/docs/dask_cudf/source/best_practices.rst
@@ -81,7 +81,7 @@ representations, native cuDF spilling may be insufficient. For these cases,
 `JIT-unspill `__ is likely to produce better protection from out-of-memory
 (OOM) errors. Please see `Dask-CUDA's spilling documentation
-`__ for further details
+`__ for further details and guidance.
 
 Use RMM
@@ -160,7 +160,7 @@ of the underlying task graph to materialize the collection.
 :func:`sort_values` / :func:`set_index` : These operations both require Dask
 to eagerly collect quantile information about the column(s) being targeted by the
-global sort operation. See `Avoid Sorting`__ for notes on sorting considerations.
+global sort operation. See the next section for notes on sorting considerations.
 
 .. note::
   When using :func:`set_index`, be sure to pass in ``sort=False`` whenever the
@@ -297,11 +297,14 @@ bottleneck is typically device-to-host memory spilling.
 Although every workflow is different, the following guidelines are often recommended:
 
-* `Use a distributed cluster with Dask-CUDA workers `_
-* `Use native cuDF spilling whenever possible `_
+* Use a distributed cluster with `Dask-CUDA `__ workers
+
+* Use native cuDF spilling whenever possible (`Dask-CUDA spilling documentation `__)
+
 * Avoid shuffling whenever possible
-  * Use ``split_out=1`` for low-cardinality groupby aggregations
-  * Use ``broadcast=True`` for joins when at least one collection comprises a small number of partitions (e.g. ``<=5``)
+  * Use ``split_out=1`` for low-cardinality groupby aggregations
+  * Use ``broadcast=True`` for joins when at least one collection comprises a small number of partitions (e.g. ``<=5``)
+  * `Use UCX `__ if communication is a bottleneck.
 
 .. note::
diff --git a/docs/dask_cudf/source/conf.py b/docs/dask_cudf/source/conf.py
index dc40254312e..5daa8245695 100644
--- a/docs/dask_cudf/source/conf.py
+++ b/docs/dask_cudf/source/conf.py
@@ -78,6 +78,7 @@
     "cudf": ("https://docs.rapids.ai/api/cudf/stable/", None),
     "dask": ("https://docs.dask.org/en/stable/", None),
     "pandas": ("https://pandas.pydata.org/docs/", None),
+    "dask-cuda": ("https://docs.rapids.ai/api/dask-cuda/stable/", None),
 }
 
 numpydoc_show_inherited_class_members = True
diff --git a/docs/dask_cudf/source/index.rst b/docs/dask_cudf/source/index.rst
index 6eb755d7854..c2891ebc15e 100644
--- a/docs/dask_cudf/source/index.rst
+++ b/docs/dask_cudf/source/index.rst
@@ -16,10 +16,9 @@ as the ``"cudf"`` dataframe backend for
 Neither Dask cuDF nor Dask DataFrame provide support for multi-GPU or
 multi-node execution on their own. You must also deploy a
 `dask.distributed `__ cluster
-to leverage multiple GPUs. We strongly recommend using `Dask-CUDA
-`__ to simplify the
-setup of the cluster, taking advantage of all features of the GPU
-and networking hardware.
+to leverage multiple GPUs. We strongly recommend using :doc:`dask-cuda:index`
+to simplify the setup of the cluster, taking advantage of all features
+of the GPU and networking hardware.
 
 If you are familiar with Dask and `pandas `__
 or `cuDF `__, then Dask cuDF
@@ -161,7 +160,7 @@ out-of-core computing. This also means that the compute tasks
 can be executed in parallel over a multi-GPU cluster.
 
 In order to execute your Dask workflow on multiple GPUs, you will
-typically need to use `Dask-CUDA `__
+typically need to use :doc:`dask-cuda:index`
 to deploy distributed Dask cluster, and
 `Distributed `__
 to define a client object. For example::
@@ -192,7 +191,7 @@ to define a client object. For example::
 `__ for more details.
 
-Please see the `Dask-CUDA `__
+Please see the :doc:`dask-cuda:index`
 documentation for more information about deploying GPU-aware clusters
 (including `best practices `__).
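The index.rst hunks above sit next to a ``For example::`` snippet that is not shown in the
diff. A minimal sketch of the deployment pattern being described, assuming a single node and
the standard ``dask_cuda``/``distributed`` entry points (the exact snippet in the file may
differ)::

    # Minimal single-node sketch: one Dask-CUDA worker per visible GPU,
    # plus a distributed client attached to the cluster.
    from dask_cuda import LocalCUDACluster
    from distributed import Client

    if __name__ == "__main__":
        # LocalCUDACluster also accepts options such as protocol="ucx" and
        # spilling-related settings; see the Dask-CUDA docs linked in the hunks.
        cluster = LocalCUDACluster()
        client = Client(cluster)
        print(client)

        client.close()
        cluster.close()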
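Likewise, the ``split_out=1`` and ``broadcast=True`` bullets touched in the best_practices.rst
hunk map onto ordinary Dask DataFrame keywords. A rough illustration with toy data (the frames,
column names, and values here are invented for the example)::

    import cudf
    import dask_cudf

    left = dask_cudf.from_cudf(
        cudf.DataFrame({"key": [0, 1, 0, 1], "x": [1.0, 2.0, 3.0, 4.0]}),
        npartitions=2,
    )
    right = dask_cudf.from_cudf(
        cudf.DataFrame({"key": [0, 1], "label": ["a", "b"]}),
        npartitions=1,
    )

    # Low-cardinality groupby: a single output partition avoids a shuffle.
    means = left.groupby("key").agg({"x": "mean"}, split_out=1)

    # The right side is tiny, so broadcast it instead of shuffling both collections.
    joined = left.merge(right, on="key", broadcast=True)

    print(means.compute())
    print(joined.compute())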