
Commit

[cherry-pick][releases/2.0.0][air/docs] Update Trainer documentation (r…
richardliaw authored Aug 8, 2022
1 parent e520e04 commit 39e63cf
Showing 31 changed files with 908 additions and 383 deletions.
13 changes: 9 additions & 4 deletions doc/source/_toc.yml
@@ -66,11 +66,16 @@ parts:
   - file: train/train
     title: Ray Train
     sections:
-      - file: train/user_guide
-      - file: train/gbdt
-      - file: train/examples
+      - file: train/getting-started
+      - file: train/key-concepts
+      - file: train/user-guides
+        sections:
+          - file: train/config_guide
+          - file: train/dl_guide
+          - file: train/gbdt
+          - file: train/architecture
       - file: train/faq
-      - file: train/architecture
+      - file: train/examples
       - file: train/api
 
   - file: tune/index
1 change: 1 addition & 0 deletions doc/source/ray-air/examples/xgboost_starter.py
@@ -39,6 +39,7 @@
     params={
         # XGBoost specific params
        "objective": "binary:logistic",
+        # "tree_method": "gpu_hist", # uncomment this to use GPUs.
        "eval_metric": ["logloss", "error"],
    },
    datasets={"train": train_dataset, "valid": valid_dataset},
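
For context, the added comment sits inside the ``params`` dict of the example's ``XGBoostTrainer``. A rough, self-contained sketch of the GPU variant (assuming Ray AIR 2.0 APIs and a toy in-memory dataset in place of the example's real data):

import ray
from ray.air.config import ScalingConfig
from ray.train.xgboost import XGBoostTrainer

# Toy dataset standing in for the example's real train/validation data.
train_dataset = ray.data.from_items([{"x": float(i), "y": i % 2} for i in range(100)])

trainer = XGBoostTrainer(
    # One GPU per worker; pair this with "tree_method": "gpu_hist" below.
    scaling_config=ScalingConfig(num_workers=2, use_gpu=True),
    label_column="y",
    params={
        "objective": "binary:logistic",
        "tree_method": "gpu_hist",  # GPU-accelerated tree construction
        "eval_metric": ["logloss", "error"],
    },
    datasets={"train": train_dataset},
)
result = trainer.fit()
print(result.metrics)
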
2 changes: 1 addition & 1 deletion doc/source/ray-air/getting-started.rst
@@ -22,6 +22,7 @@ Get started by installing Ray AIR:
.. code:: bash
pip install -U "ray[air]"
# The below Ray AIR tutorial was written with the following libraries.
# Consider running the following to ensure that the code below runs properly:
pip install -U pandas>=1.3.5
@@ -30,7 +31,6 @@ Get started by installing Ray AIR:
pip install -U tensorflow>=2.6.2
pip install -U pyarrow>=6.0.1
Quick Start
-----------

10 changes: 10 additions & 0 deletions doc/source/ray-air/package-ref.rst
@@ -146,6 +146,8 @@ TensorFlow
     :members:
     :show-inheritance:
 
+.. _air-pytorch-ref:
+
 PyTorch
 #######
 
@@ -174,6 +176,14 @@ Scikit-Learn
     :members:
     :show-inheritance:
 
+
+Reinforcement Learning (RLlib)
+##############################
+
+.. automodule:: ray.train.rl
+    :members:
+    :show-inheritance:
+
 .. _air-builtin-callbacks:
 
 Monitoring Integrations
10 changes: 5 additions & 5 deletions doc/source/ray-air/trainer.rst
@@ -1,7 +1,7 @@
 .. _air-trainers:
 
-Ray AIR Trainers
-================
+Using Trainers for Distributed Training
+=======================================
 
 .. https://docs.google.com/drawings/d/1anmT0JVFH9abR5wX5_WcxNHJh6jWeDL49zWxGpkfORA/edit
@@ -32,7 +32,7 @@ construct a Trainer, you can provide:
 * A collection of :ref:`datasets <air-ingest>` and a :ref:`preprocessor <air-preprocessors>` for the provided datasets, which configures preprocessing and the datasets to ingest from.
 * ``resume_from_checkpoint``, which is a checkpoint path to resume from, should your training run be interrupted.
 
-After instatiating a Trainer, you can invoke it by calling :meth:`Trainer.fit() <ray.air.Trainer.fit>`.
+After instantiating a Trainer, you can invoke it by calling :meth:`Trainer.fit() <ray.air.trainer.BaseTrainer.fit>`.
 
 .. literalinclude:: doc_code/xgboost_trainer.py
     :language: python
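
The ``literalinclude`` above pulls in the repo's ``doc_code/xgboost_trainer.py``, which this diff does not show. A minimal sketch of the same pattern (construct a Trainer with datasets and a preprocessor, then call ``fit()``), assuming Ray AIR 2.0 APIs and made-up column names:

import ray
from ray.air.config import ScalingConfig
from ray.data.preprocessors import StandardScaler
from ray.train.xgboost import XGBoostTrainer

# Made-up tabular data with two features and a binary label.
dataset = ray.data.from_items(
    [{"f1": float(i), "f2": i * 0.5, "y": i % 2} for i in range(100)]
)

trainer = XGBoostTrainer(
    scaling_config=ScalingConfig(num_workers=2),
    label_column="y",
    params={"objective": "binary:logistic"},
    datasets={"train": dataset},
    preprocessor=StandardScaler(columns=["f1", "f2"]),  # fitted on the "train" dataset
)
result = trainer.fit()    # runs distributed training and blocks until finished
print(result.metrics)     # last reported metrics
print(result.checkpoint)  # checkpoint holding the trained model and preprocessor
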
@@ -63,7 +63,7 @@ You can access the data shard within a worker via ``session.get_dataset_shard()``
 to generate batches of Tensorflow or Pytorch tensors.
 You can read more about :ref:`data ingest <air-ingest>` here.
 
-Read more about :ref:`Ray Train's Deep Learning Trainers <train-user-guide>`.
+Read more about :ref:`Ray Train's Deep Learning Trainers <train-dl-guide>`.
 
 .. dropdown:: Code examples
 
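
To make the ingest point concrete, here is a sketch of a ``train_loop_per_worker`` that consumes its shard, assuming Ray AIR 2.0's ``ray.air.session``; the column names are made up and the model code is a placeholder:

import torch
from ray.air import session

def train_loop_per_worker(config):
    dataset_shard = session.get_dataset_shard("train")
    for epoch in range(config["num_epochs"]):
        for batch in dataset_shard.iter_batches(batch_size=config["batch_size"]):
            # Batches arrive as pandas DataFrames by default; convert to tensors.
            features = torch.as_tensor(batch["x"].values, dtype=torch.float32)
            labels = torch.as_tensor(batch["y"].values, dtype=torch.float32)
            ...  # forward/backward pass over this worker's shard goes here
        session.report({"epoch": epoch})

A function like this would be passed to a ``TorchTrainer`` or ``TensorflowTrainer`` through ``train_loop_per_worker``, together with ``datasets={"train": ...}``.
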
@@ -110,7 +110,7 @@ Ray Train offers 2 main tree-based trainers:
 :class:`XGBoostTrainer <ray.train.xgboost.XGBoostTrainer>` and
 :class:`LightGBMTrainer <ray.train.lightgbm.LightGBMTrainer>`.
 
-See :ref:`here for a more detailed user-guide <air-trainers-gbdt-user-guide>`.
+See :ref:`here for a more detailed user-guide <train-gbdt-guide>`.
 
 
 XGBoost Trainer
4 changes: 3 additions & 1 deletion doc/source/ray-air/tuner.rst
@@ -45,6 +45,8 @@ Below, we demonstrate how you can use a Trainer object with a Tuner.
     :end-before: __basic_end__
 
 
+.. _air-tuner-search-space:
+
 How to configure a search space?
 --------------------------------
 
@@ -54,7 +56,7 @@ from which hyperparameter configurations will be sampled.
 Depending on the model and dataset, you may want to tune:
 
 - The training batch size
-- The learning rate for SGD-based training (e.g., image classification)
+- The learning rate for deep learning training (e.g., image classification)
 - The maximum depth for tree-based models (e.g., XGBoost)
 
 The following shows some example code on how to specify the ``param_space``.
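
The referenced example code is collapsed in this diff. Purely as an illustration, a ``param_space`` for an XGBoost-style trainer might look like the following sketch (assuming Ray AIR 2.0; ``trainer`` stands for any AIR Trainer instance constructed elsewhere):

from ray import tune
from ray.tune import Tuner
from ray.tune.tune_config import TuneConfig

param_space = {
    # These entries override the trainer's `params` argument per trial.
    "params": {
        "max_depth": tune.randint(2, 8),
        "eta": tune.loguniform(1e-4, 1e-1),
    },
}

tuner = Tuner(
    trainer,  # an AIR Trainer, e.g. an XGBoostTrainer
    param_space=param_space,
    tune_config=TuneConfig(num_samples=8, metric="train-logloss", mode="min"),
)
results = tuner.fit()
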
115 changes: 0 additions & 115 deletions doc/source/train/api.rst
@@ -52,121 +52,6 @@ BackendConfig
.. autoclass:: ray.train.backend.BackendConfig


.. _train-api-func-utils:

Training Function Utilities
---------------------------

train.report
~~~~~~~~~~~~

.. autofunction:: ray.train.report

train.load_checkpoint
~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.load_checkpoint

train.save_checkpoint
~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.save_checkpoint

train.get_dataset_shard
~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.get_dataset_shard

train.world_rank
~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.world_rank

train.local_rank
~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.local_rank

train.world_size
~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.world_size

.. _train-api-torch-utils:

PyTorch Training Function Utilities
-----------------------------------

.. _train-api-torch-prepare-model:

train.torch.prepare_model
~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.torch.prepare_model
:noindex:

.. _train-api-torch-prepare-data-loader:

train.torch.prepare_data_loader
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.torch.prepare_data_loader
:noindex:

train.torch.prepare_optimizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.torch.prepare_optimizer
:noindex:


train.torch.backward
~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.torch.backward
:noindex:

.. _train-api-torch-get-device:

train.torch.get_device
~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.torch.get_device
:noindex:

train.torch.enable_reproducibility
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.torch.enable_reproducibility
:noindex:

.. _train-api-torch-worker-profiler:

train.torch.accelerate
~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.torch.accelerate
:noindex:

train.torch.TorchWorkerProfiler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: ray.train.torch.TorchWorkerProfiler
:members:
:noindex:

.. _train-api-tensorflow-utils:

TensorFlow Training Function Utilities
--------------------------------------

train.tensorflow.prepare_dataset_shard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.tensorflow.prepare_dataset_shard
:noindex:


Deprecated APIs
---------------

87 changes: 87 additions & 0 deletions doc/source/train/config_guide.rst
@@ -0,0 +1,87 @@
.. _train-config:

Configurations User Guide
=========================

The following gives an overview of how to configure scale-out, run options, and fault tolerance for Train.
For more details on how to configure data ingest, also refer to :ref:`air-ingest`.

Scaling configuration (``ScalingConfig``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The scaling configuration specifies distributed training properties like the number of workers or the
resources per worker.

The properties of the scaling configuration are :ref:`tunable <air-tuner-search-space>`.

:class:`ScalingConfig API reference <ray.air.config.ScalingConfig>`

.. literalinclude:: doc_code/key_concepts.py
:language: python
:start-after: __scaling_config_start__
:end-before: __scaling_config_end__
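
For illustration (the actual snippet lives in ``doc_code/key_concepts.py``, which this diff does not show), a minimal scaling configuration might look like this sketch, assuming Ray AIR 2.0 APIs:

from ray.air.config import ScalingConfig

scaling_config = ScalingConfig(
    num_workers=4,  # number of distributed training workers
    use_gpu=True,   # reserve a GPU for each worker
)
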


Run configuration (``RunConfig``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The run configuration specifies runtime properties of the training run, such as the
experiment name, storage paths, verbosity, and callbacks.

The properties of the run configuration are :ref:`not tunable <air-tuner-search-space>`.

:class:`RunConfig API reference <ray.air.config.RunConfig>`

.. literalinclude:: doc_code/key_concepts.py
:language: python
:start-after: __run_config_start__
:end-before: __run_config_end__
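
Again as an illustrative sketch only (assuming Ray AIR 2.0 APIs; the name and directory are placeholders):

from ray.air.config import FailureConfig, RunConfig

run_config = RunConfig(
    name="my_training_run",     # experiment name used for the results directory
    local_dir="~/ray_results",  # where results and checkpoints are written
    failure_config=FailureConfig(max_failures=2),
)
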

Failure configuration (``FailureConfig``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The failure configuration specifies how training failures should be dealt with.

As part of the RunConfig, the properties of the failure configuration
are :ref:`not tunable <air-tuner-search-space>`.

:class:`FailureConfig API reference <ray.air.config.FailureConfig>`

.. literalinclude:: doc_code/key_concepts.py
:language: python
:start-after: __failure_config_start__
:end-before: __failure_config_end__
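
A sketch of how this might be wired into the run configuration (assuming Ray AIR 2.0 APIs):

from ray.air.config import FailureConfig, RunConfig

# Retry a failed training run up to three times before giving up.
run_config = RunConfig(failure_config=FailureConfig(max_failures=3))
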

Sync configuration (``SyncConfig``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The sync configuration specifies how to synchronize checkpoints between the
Ray cluster and remote storage.

As part of the RunConfig, the properties of the sync configuration
are :ref:`not tunable <air-tuner-search-space>`.

:class:`SyncConfig API reference <ray.tune.syncer.SyncConfig>`

.. literalinclude:: doc_code/key_concepts.py
:language: python
:start-after: __sync_config_start__
:end-before: __sync_config_end__
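
A sketch of a sync configuration that uploads checkpoints to cloud storage (assuming Ray AIR 2.0 APIs; the bucket URI is a placeholder):

from ray.air.config import RunConfig
from ray.tune.syncer import SyncConfig

run_config = RunConfig(
    sync_config=SyncConfig(upload_dir="s3://my-bucket/train-results"),
)
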


Checkpoint configuration (``CheckpointConfig``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The checkpoint configuration specifies how often to checkpoint training state
and how many checkpoints to keep.

As part of the RunConfig, the properties of the checkpoint configuration
are :ref:`not tunable <air-tuner-search-space>`.

:class:`CheckpointConfig API reference <ray.air.config.CheckpointConfig>`

.. literalinclude:: doc_code/key_concepts.py
:language: python
:start-after: __checkpoint_config_start__
:end-before: __checkpoint_config_end__
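
A sketch of a checkpoint configuration that keeps only the best checkpoints (assuming Ray AIR 2.0 APIs; the metric name is a placeholder):

from ray.air.config import CheckpointConfig, RunConfig

run_config = RunConfig(
    checkpoint_config=CheckpointConfig(
        num_to_keep=2,                               # retain at most two checkpoints
        checkpoint_score_attribute="train-logloss",  # rank checkpoints by this metric
        checkpoint_score_order="min",                # lower is better
    ),
)
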
