
Commit

[cherry-pick][releases/2.0.0][air/docs] Update Trainer documentation (r…
richardliaw authored Aug 8, 2022
1 parent e520e04 commit 39e63cf
Showing 31 changed files with 908 additions and 383 deletions.
13 changes: 9 additions & 4 deletions doc/source/_toc.yml
@@ -66,11 +66,16 @@ parts:
   - file: train/train
     title: Ray Train
     sections:
-      - file: train/user_guide
-      - file: train/gbdt
-      - file: train/examples
+      - file: train/getting-started
+      - file: train/key-concepts
+      - file: train/user-guides
+        sections:
+          - file: train/config_guide
+          - file: train/dl_guide
+          - file: train/gbdt
+          - file: train/architecture
       - file: train/faq
-      - file: train/architecture
+      - file: train/examples
       - file: train/api
 
   - file: tune/index
1 change: 1 addition & 0 deletions doc/source/ray-air/examples/xgboost_starter.py
@@ -39,6 +39,7 @@
     params={
         # XGBoost specific params
        "objective": "binary:logistic",
+        # "tree_method": "gpu_hist", # uncomment this to use GPUs.
        "eval_metric": ["logloss", "error"],
    },
    datasets={"train": train_dataset, "valid": valid_dataset},
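
For context, the added comment sits inside the ``params`` dict of the example's ``XGBoostTrainer``. A rough, self-contained sketch of the GPU variant (assuming Ray AIR 2.0 APIs and a toy in-memory dataset in place of the example's real data):

import ray
from ray.air.config import ScalingConfig
from ray.train.xgboost import XGBoostTrainer

# Toy dataset standing in for the example's real train/validation data.
train_dataset = ray.data.from_items([{"x": float(i), "y": i % 2} for i in range(100)])

trainer = XGBoostTrainer(
    # One GPU per worker; pair this with "tree_method": "gpu_hist" below.
    scaling_config=ScalingConfig(num_workers=2, use_gpu=True),
    label_column="y",
    params={
        "objective": "binary:logistic",
        "tree_method": "gpu_hist",  # GPU-accelerated tree construction
        "eval_metric": ["logloss", "error"],
    },
    datasets={"train": train_dataset},
)
result = trainer.fit()
print(result.metrics)
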
2 changes: 1 addition & 1 deletion doc/source/ray-air/getting-started.rst
@@ -22,6 +22,7 @@ Get started by installing Ray AIR:
.. code:: bash
pip install -U "ray[air]"
# The below Ray AIR tutorial was written with the following libraries.
# Consider running the following to ensure that the code below runs properly:
pip install -U pandas>=1.3.5
@@ -30,7 +31,6 @@ Get started by installing Ray AIR:
pip install -U tensorflow>=2.6.2
pip install -U pyarrow>=6.0.1
Quick Start
-----------

10 changes: 10 additions & 0 deletions doc/source/ray-air/package-ref.rst
@@ -146,6 +146,8 @@ TensorFlow
     :members:
     :show-inheritance:
 
+.. _air-pytorch-ref:
+
 PyTorch
 #######
 
@@ -174,6 +176,14 @@ Scikit-Learn
     :members:
     :show-inheritance:
 
+
+Reinforcement Learning (RLlib)
+##############################
+
+.. automodule:: ray.train.rl
+    :members:
+    :show-inheritance:
+
 .. _air-builtin-callbacks:
 
 Monitoring Integrations
10 changes: 5 additions & 5 deletions doc/source/ray-air/trainer.rst
@@ -1,7 +1,7 @@
 .. _air-trainers:
 
-Ray AIR Trainers
-================
+Using Trainers for Distributed Training
+=======================================
 
 .. https://docs.google.com/drawings/d/1anmT0JVFH9abR5wX5_WcxNHJh6jWeDL49zWxGpkfORA/edit
@@ -32,7 +32,7 @@ construct a Trainer, you can provide:
 * A collection of :ref:`datasets <air-ingest>` and a :ref:`preprocessor <air-preprocessors>` for the provided datasets, which configures preprocessing and the datasets to ingest from.
 * ``resume_from_checkpoint``, which is a checkpoint path to resume from, should your training run be interrupted.
 
-After instatiating a Trainer, you can invoke it by calling :meth:`Trainer.fit() <ray.air.Trainer.fit>`.
+After instantiating a Trainer, you can invoke it by calling :meth:`Trainer.fit() <ray.air.trainer.BaseTrainer.fit>`.
 
 .. literalinclude:: doc_code/xgboost_trainer.py
     :language: python
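
The ``literalinclude`` above pulls in the repo's ``doc_code/xgboost_trainer.py``, which this diff does not show. A minimal sketch of the same pattern (construct a Trainer with datasets and a preprocessor, then call ``fit()``), assuming Ray AIR 2.0 APIs and made-up column names:

import ray
from ray.air.config import ScalingConfig
from ray.data.preprocessors import StandardScaler
from ray.train.xgboost import XGBoostTrainer

# Made-up tabular data with two features and a binary label.
dataset = ray.data.from_items(
    [{"f1": float(i), "f2": i * 0.5, "y": i % 2} for i in range(100)]
)

trainer = XGBoostTrainer(
    scaling_config=ScalingConfig(num_workers=2),
    label_column="y",
    params={"objective": "binary:logistic"},
    datasets={"train": dataset},
    preprocessor=StandardScaler(columns=["f1", "f2"]),  # fitted on the "train" dataset
)
result = trainer.fit()    # runs distributed training and blocks until finished
print(result.metrics)     # last reported metrics
print(result.checkpoint)  # checkpoint holding the trained model and preprocessor
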
@@ -63,7 +63,7 @@ You can access the data shard within a worker via ``session.get_dataset_shard()``
 to generate batches of Tensorflow or Pytorch tensors.
 You can read more about :ref:`data ingest <air-ingest>` here.
 
-Read more about :ref:`Ray Train's Deep Learning Trainers <train-user-guide>`.
+Read more about :ref:`Ray Train's Deep Learning Trainers <train-dl-guide>`.
 
 .. dropdown:: Code examples
 
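
To make the ingest point concrete, here is a sketch of a ``train_loop_per_worker`` that consumes its shard, assuming Ray AIR 2.0's ``ray.air.session``; the column names are made up and the model code is a placeholder:

import torch
from ray.air import session

def train_loop_per_worker(config):
    dataset_shard = session.get_dataset_shard("train")
    for epoch in range(config["num_epochs"]):
        for batch in dataset_shard.iter_batches(batch_size=config["batch_size"]):
            # Batches arrive as pandas DataFrames by default; convert to tensors.
            features = torch.as_tensor(batch["x"].values, dtype=torch.float32)
            labels = torch.as_tensor(batch["y"].values, dtype=torch.float32)
            ...  # forward/backward pass over this worker's shard goes here
        session.report({"epoch": epoch})

A function like this would be passed to a ``TorchTrainer`` or ``TensorflowTrainer`` through ``train_loop_per_worker``, together with ``datasets={"train": ...}``.
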
@@ -110,7 +110,7 @@ Ray Train offers 2 main tree-based trainers:
 :class:`XGBoostTrainer <ray.train.xgboost.XGBoostTrainer>` and
 :class:`LightGBMTrainer <ray.train.lightgbm.LightGBMTrainer>`.
 
-See :ref:`here for a more detailed user-guide <air-trainers-gbdt-user-guide>`.
+See :ref:`here for a more detailed user-guide <train-gbdt-guide>`.
 
 
 XGBoost Trainer
4 changes: 3 additions & 1 deletion doc/source/ray-air/tuner.rst
@@ -45,6 +45,8 @@ Below, we demonstrate how you can use a Trainer object with a Tuner.
     :end-before: __basic_end__
 
 
+.. _air-tuner-search-space:
+
 How to configure a search space?
 --------------------------------
 
@@ -54,7 +56,7 @@ from which hyperparameter configurations will be sampled.
 Depending on the model and dataset, you may want to tune:
 
 - The training batch size
-- The learning rate for SGD-based training (e.g., image classification)
+- The learning rate for deep learning training (e.g., image classification)
 - The maximum depth for tree-based models (e.g., XGBoost)
 
 The following shows some example code on how to specify the ``param_space``.
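
The referenced example code is collapsed in this diff. Purely as an illustration, a ``param_space`` for an XGBoost-style trainer might look like the following sketch (assuming Ray AIR 2.0; ``trainer`` stands for any AIR Trainer instance constructed elsewhere):

from ray import tune
from ray.tune import Tuner
from ray.tune.tune_config import TuneConfig

param_space = {
    # These entries override the trainer's `params` argument per trial.
    "params": {
        "max_depth": tune.randint(2, 8),
        "eta": tune.loguniform(1e-4, 1e-1),
    },
}

tuner = Tuner(
    trainer,  # an AIR Trainer, e.g. an XGBoostTrainer
    param_space=param_space,
    tune_config=TuneConfig(num_samples=8, metric="train-logloss", mode="min"),
)
results = tuner.fit()
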
115 changes: 0 additions & 115 deletions doc/source/train/api.rst
@@ -52,121 +52,6 @@ BackendConfig
.. autoclass:: ray.train.backend.BackendConfig


.. _train-api-func-utils:

Training Function Utilities
---------------------------

train.report
~~~~~~~~~~~~

.. autofunction:: ray.train.report

train.load_checkpoint
~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.load_checkpoint

train.save_checkpoint
~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.save_checkpoint

train.get_dataset_shard
~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.get_dataset_shard

train.world_rank
~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.world_rank

train.local_rank
~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.local_rank

train.world_size
~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.world_size

.. _train-api-torch-utils:

PyTorch Training Function Utilities
-----------------------------------

.. _train-api-torch-prepare-model:

train.torch.prepare_model
~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.torch.prepare_model
:noindex:

.. _train-api-torch-prepare-data-loader:

train.torch.prepare_data_loader
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.torch.prepare_data_loader
:noindex:

train.torch.prepare_optimizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.torch.prepare_optimizer
:noindex:


train.torch.backward
~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.torch.backward
:noindex:

.. _train-api-torch-get-device:

train.torch.get_device
~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.torch.get_device
:noindex:

train.torch.enable_reproducibility
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.torch.enable_reproducibility
:noindex:

.. _train-api-torch-worker-profiler:

train.torch.accelerate
~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.torch.accelerate
:noindex:

train.torch.TorchWorkerProfiler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: ray.train.torch.TorchWorkerProfiler
:members:
:noindex:

.. _train-api-tensorflow-utils:

TensorFlow Training Function Utilities
--------------------------------------

train.tensorflow.prepare_dataset_shard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: ray.train.tensorflow.prepare_dataset_shard
:noindex:


Deprecated APIs
---------------

87 changes: 87 additions & 0 deletions doc/source/train/config_guide.rst
@@ -0,0 +1,87 @@
.. _train-config:

Configurations User Guide
=========================

The following gives an overview of how to configure scale-out, run options, and fault tolerance for Train.
For more details on how to configure data ingest, also refer to :ref:`air-ingest`.

Scaling configuration (``ScalingConfig``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The scaling configuration specifies distributed training properties like the number of workers or the
resources per worker.

The properties of the scaling configuration are :ref:`tunable <air-tuner-search-space>`.

:class:`ScalingConfig API reference <ray.air.config.ScalingConfig>`

.. literalinclude:: doc_code/key_concepts.py
:language: python
:start-after: __scaling_config_start__
:end-before: __scaling_config_end__
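
For illustration (the actual snippet lives in ``doc_code/key_concepts.py``, which this diff does not show), a minimal scaling configuration might look like this sketch, assuming Ray AIR 2.0 APIs:

from ray.air.config import ScalingConfig

scaling_config = ScalingConfig(
    num_workers=4,  # number of distributed training workers
    use_gpu=True,   # reserve a GPU for each worker
)
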


Run configuration (``RunConfig``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The run configuration specifies runtime properties of the training run, such as the
experiment name, storage paths, verbosity, and callbacks.

The properties of the run configuration are :ref:`not tunable <air-tuner-search-space>`.

:class:`RunConfig API reference <ray.air.config.RunConfig>`

.. literalinclude:: doc_code/key_concepts.py
:language: python
:start-after: __run_config_start__
:end-before: __run_config_end__
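
Again as an illustrative sketch only (assuming Ray AIR 2.0 APIs; the name and directory are placeholders):

from ray.air.config import FailureConfig, RunConfig

run_config = RunConfig(
    name="my_training_run",     # experiment name used for the results directory
    local_dir="~/ray_results",  # where results and checkpoints are written
    failure_config=FailureConfig(max_failures=2),
)
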

Failure configuration (``FailureConfig``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The failure configuration specifies how training failures should be dealt with.

As part of the RunConfig, the properties of the failure configuration
are :ref:`not tunable <air-tuner-search-space>`.

:class:`FailureConfig API reference <ray.air.config.FailureConfig>`

.. literalinclude:: doc_code/key_concepts.py
:language: python
:start-after: __failure_config_start__
:end-before: __failure_config_end__
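
A sketch of how this might be wired into the run configuration (assuming Ray AIR 2.0 APIs):

from ray.air.config import FailureConfig, RunConfig

# Retry a failed training run up to three times before giving up.
run_config = RunConfig(failure_config=FailureConfig(max_failures=3))
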

Sync configuration (``SyncConfig``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The sync configuration specifies how to synchronize checkpoints between the
Ray cluster and remote storage.

As part of the RunConfig, the properties of the sync configuration
are :ref:`not tunable <air-tuner-search-space>`.

:class:`SyncConfig API reference <ray.tune.syncer.SyncConfig>`

.. literalinclude:: doc_code/key_concepts.py
:language: python
:start-after: __sync_config_start__
:end-before: __sync_config_end__
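
A sketch of a sync configuration that uploads checkpoints to cloud storage (assuming Ray AIR 2.0 APIs; the bucket URI is a placeholder):

from ray.air.config import RunConfig
from ray.tune.syncer import SyncConfig

run_config = RunConfig(
    sync_config=SyncConfig(upload_dir="s3://my-bucket/train-results"),
)
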


Checkpoint configuration (``CheckpointConfig``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The checkpoint configuration specifies how often to checkpoint training state
and how many checkpoints to keep.

As part of the RunConfig, the properties of the checkpoint configuration
are :ref:`not tunable <air-tuner-search-space>`.

:class:`CheckpointConfig API reference <ray.air.config.CheckpointConfig>`

.. literalinclude:: doc_code/key_concepts.py
:language: python
:start-after: __checkpoint_config_start__
:end-before: __checkpoint_config_end__
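
A sketch of a checkpoint configuration that keeps only the best checkpoints (assuming Ray AIR 2.0 APIs; the metric name is a placeholder):

from ray.air.config import CheckpointConfig, RunConfig

run_config = RunConfig(
    checkpoint_config=CheckpointConfig(
        num_to_keep=2,                               # retain at most two checkpoints
        checkpoint_score_attribute="train-logloss",  # rank checkpoints by this metric
        checkpoint_score_order="min",                # lower is better
    ),
)
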
