[docs/air] Fix up some minor docstrings (ray-project#28361)
richardliaw authored and justinvyu committed Sep 14, 2022
1 parent 52036e7 commit d5db148
Showing 9 changed files with 68 additions and 67 deletions.
4 changes: 2 additions & 2 deletions doc/source/ray-overview/ray-libraries.rst
@@ -19,7 +19,7 @@ Dask |dask|

Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love. Dask uses existing Python APIs and data structures to make it easy to switch from Numpy, Pandas, and Scikit-learn to their Dask-powered equivalents.

[`Link to integration <../data/dask-on-ray.html>`__]
[:ref:`Link to integration <dask-on-ray>`]

Flambe |flambe|
---------------
@@ -74,7 +74,7 @@ MARS |mars|

Mars is a tensor-based unified framework for large-scale data computation which scales Numpy, Pandas and Scikit-learn. Mars can scale in to a single machine, and scale out to a cluster with thousands of machines.

[`Link to integration <../data/mars-on-ray.html>`__]
[:ref:`Link to integration <mars-on-ray>`]

Modin |modin|
-------------
4 changes: 2 additions & 2 deletions doc/source/rllib/rllib-training.rst
@@ -748,9 +748,9 @@ Here is an example of the basic usage (for a more complete example, see `custom_
.. note::

It's recommended that you run RLlib algorithms with :doc:`Tune <../tune/index>`, for easy experiment management and visualization of results. Just set ``"run": ALG_NAME, "env": ENV_NAME`` in the experiment config.
It's recommended that you run RLlib algorithms with :ref:`Ray Tune <tune-main>`, for easy experiment management and visualization of results. Just set ``"run": ALG_NAME, "env": ENV_NAME`` in the experiment config.

All RLlib algorithms are compatible with the :ref:`Tune API <tune-60-seconds>`. This enables them to be easily used in experiments with :doc:`Tune <../tune/index>`. For example, the following code performs a simple hyperparam sweep of PPO:
All RLlib algorithms are compatible with the :ref:`Tune API <tune-60-seconds>`. This enables them to be easily used in experiments with :ref:`Ray Tune <tune-main>`. For example, the following code performs a simple hyperparam sweep of PPO:

.. code-block:: python
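The sweep code above is collapsed in this view. A minimal sketch of such a PPO hyperparameter sweep, assuming the classic ``tune.run`` API (the environment, stopping criterion, and learning rates are illustrative):

.. code-block:: python

    from ray import tune

    # Grid-search the learning rate for PPO on CartPole and stop each trial
    # after a fixed number of training iterations (values are illustrative).
    tune.run(
        "PPO",
        stop={"training_iteration": 5},
        config={
            "env": "CartPole-v1",
            "num_workers": 1,
            "lr": tune.grid_search([0.01, 0.001, 0.0001]),
        },
    )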
2 changes: 1 addition & 1 deletion doc/source/tune/examples/tune-pytorch-lightning.ipynb
@@ -488,7 +488,7 @@
"id": "ca050dfa",
"metadata": {},
"source": [
"You can also specify {doc}`fractional GPUs for Tune <../../ray-core/tasks/using-ray-with-gpus>`,\n",
"You can also specify {ref}`fractional GPUs for Tune <tune-parallelism>`,\n",
"allowing multiple trials to share GPUs and thus increase concurrency under resource constraints.\n",
"While the `gpus_per_trial` passed into\n",
"Tune is a decimal value, the `gpus` passed into the `pl.Trainer` should still be an integer.\n",
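A minimal sketch of requesting fractional GPUs per trial, assuming the ``tune.run``/``resources_per_trial`` API (the training function and values are illustrative):

.. code-block:: python

    from ray import tune

    def train_fn(config):
        # Each trial is allotted half a GPU; the framework inside the trial
        # must still be configured to respect that memory budget.
        tune.report(loss=0.0)

    # Two trials can share one physical GPU concurrently.
    tune.run(train_fn, num_samples=2, resources_per_trial={"gpu": 0.5})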
5 changes: 4 additions & 1 deletion python/ray/air/checkpoint.py
@@ -91,7 +91,8 @@ class Checkpoint:
be used to create checkpoint objects
(e.g. ``Checkpoint.from_directory()``).
*Other implementation notes:*
**Other implementation notes:**
When converting between different checkpoint formats, it is guaranteed
that a full round trip of conversions (e.g. directory --> dict -->
obj ref --> directory) will recover the original checkpoint data.
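A minimal sketch of that round-trip guarantee (the directory path is illustrative):

.. code-block:: python

    from ray.air.checkpoint import Checkpoint

    original = Checkpoint.from_directory("/tmp/my_checkpoint")  # illustrative path
    as_dict = original.to_dict()              # directory -> dict
    restored = Checkpoint.from_dict(as_dict)  # dict -> checkpoint
    restored_dir = restored.to_directory()    # back to a directory with the same data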
@@ -488,6 +489,8 @@ def as_directory(self) -> Iterator[str]:
Example:
.. code-block:: python
with checkpoint.as_directory() as checkpoint_dir:
# Do some read-only processing of files within checkpoint_dir
pass
66 changes: 31 additions & 35 deletions python/ray/air/config.py
@@ -273,52 +273,48 @@ class DatasetConfig:
``datasets`` argument. Users have the opportunity to selectively override these
configs by passing the ``dataset_config`` argument. Trainers can also define user
customizable values (e.g., XGBoostTrainer doesn't support streaming ingest).
Args:
fit: Whether to fit preprocessors on this dataset. This can be set on at most
one dataset at a time. True by default for the "train" dataset only.
split: Whether the dataset should be split across multiple workers.
True by default for the "train" dataset only.
required: Whether to raise an error if the Dataset isn't provided by the user.
False by default.
transform: Whether to transform the dataset with the fitted preprocessor.
This must be enabled at least for the dataset that is fit.
True by default.
use_stream_api: Whether the dataset should be streamed into memory using
pipelined reads. When enabled, get_dataset_shard() returns DatasetPipeline
instead of Dataset. The amount of memory to use is controlled
by `stream_window_size`. False by default.
stream_window_size: Configure the streaming window size in bytes.
A good value is something like 20% of object store memory.
If set to -1, then an infinite window size will be used (similar to
bulk ingest). This only has an effect if use_stream_api is set.
Set to 1.0 GiB by default.
global_shuffle: Whether to enable global shuffle (per pipeline window
in streaming mode). Note that this is an expensive all-to-all operation,
and most likely you want to use local shuffle instead.
See https://docs.ray.io/en/master/data/faq.html and
https://docs.ray.io/en/master/ray-air/check-ingest.html.
False by default.
randomize_block_order: Whether to randomize the iteration order over blocks.
The main purpose of this is to prevent data fetching hotspots in the
cluster when running many parallel workers / trials on the same data.
We recommend enabling it always. True by default.
"""

# TODO(ekl) could we unify DataParallelTrainer and Trainer so the same data ingest
# strategy applies to all Trainers?

# Whether to fit preprocessors on this dataset. This can be set on at most one
# dataset at a time.
# True by default for the "train" dataset only.
fit: Optional[bool] = None

# Whether the dataset should be split across multiple workers.
# True by default for the "train" dataset only.
split: Optional[bool] = None

# Whether to raise an error if the Dataset isn't provided by the user.
# False by default.
required: Optional[bool] = None

# Whether to transform the dataset with the fitted preprocessor. This must be
# enabled at least for the dataset that is fit.
# True by default.
transform: Optional[bool] = None

# Whether the dataset should be streamed into memory using pipelined reads.
# When enabled, get_dataset_shard() returns DatasetPipeline instead of Dataset.
# The amount of memory to use is controlled by `stream_window_size`.
# False by default.
use_stream_api: Optional[bool] = None

# Configure the streaming window size in bytes. A good value is something like
# 20% of object store memory. If set to -1, then an infinite window size will be
# used (similar to bulk ingest). This only has an effect if use_stream_api is set.
# Set to 1.0 GiB by default.
stream_window_size: Optional[float] = None

# Whether to enable global shuffle (per pipeline window in streaming mode). Note
# that this is an expensive all-to-all operation, and most likely you want to use
# local shuffle instead. See https://docs.ray.io/en/master/data/faq.html and
# https://docs.ray.io/en/master/air/check-ingest.html.
# False by default.
global_shuffle: Optional[bool] = None

# Whether to randomize the iteration order over blocks. The main purpose of this
# is to prevent data fetching hotspots in the cluster when running many parallel
# workers / trials on the same data. We recommend enabling it always.
# True by default.
randomize_block_order: Optional[bool] = None

def __repr__(self):
@@ -353,7 +349,7 @@ def merge(
"""Merge two given DatasetConfigs, the second taking precedence.
Raises:
ValueError if validation fails on the merged configs.
ValueError: if validation fails on the merged configs.
"""
has_wildcard = WILDCARD_KEY in a
result = a.copy()
8 changes: 4 additions & 4 deletions python/ray/train/base_trainer.py
@@ -41,10 +41,10 @@ class BaseTrainer(abc.ABC):
Note: The base ``BaseTrainer`` class cannot be instantiated directly. Only
one of its subclasses can be used.
How does a trainer work?
**How does a trainer work?**
- First, initialize the Trainer. The initialization runs locally,
so heavyweight setup should not be done in __init__.
so heavyweight setup should not be done in ``__init__``.
- Then, when you call ``trainer.fit()``, the Trainer is serialized
and copied to a remote Ray actor. The following methods are then
called in sequence on the remote actor.
@@ -301,7 +301,7 @@ def preprocess_datasets(self) -> None:
def training_loop(self) -> None:
"""Loop called by fit() to run training and report results to Tune.
Note: this method runs on a remote process.
.. note:: This method runs on a remote process.
``self.datasets`` have already been preprocessed by ``self.preprocessor``.
@@ -311,7 +311,7 @@ def training_loop(self) -> None:
Example:
.. code-block: python
.. code-block:: python
from ray.train.trainer import BaseTrainer
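The docstring example above is collapsed in this view; a minimal sketch of a custom trainer, assuming the Ray 2.0 ``BaseTrainer`` and ``session.report`` APIs (the reported metric is illustrative):

.. code-block:: python

    from ray.air import session
    from ray.train.trainer import BaseTrainer

    class MyTrainer(BaseTrainer):
        def training_loop(self):
            # Runs on a remote actor; self.datasets have already been
            # preprocessed by self.preprocessor at this point.
            for i in range(3):
                session.report({"score": i})

    result = MyTrainer().fit()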
4 changes: 2 additions & 2 deletions python/ray/train/gbdt_trainer.py
@@ -68,9 +68,9 @@ def _convert_scaling_config_to_ray_params(

@DeveloperAPI
class GBDTTrainer(BaseTrainer):
"""Common logic for gradient-boosting decision tree (GBDT) frameworks
like XGBoost-Ray and LightGBM-Ray.
"""Abstract class for scaling gradient-boosting decision tree (GBDT) frameworks.
Inherited by XGBoostTrainer and LightGBMTrainer.
Args:
datasets: Ray Datasets to use for training and validation. Must include a
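A minimal sketch of one concrete subclass in use, assuming the Ray 2.0 ``XGBoostTrainer`` API (the toy dataset and parameters are illustrative):

.. code-block:: python

    import ray
    from ray.air.config import ScalingConfig
    from ray.train.xgboost import XGBoostTrainer

    train_ds = ray.data.from_items([{"x": float(i), "y": i % 2} for i in range(32)])
    trainer = XGBoostTrainer(
        scaling_config=ScalingConfig(num_workers=2),
        label_column="y",
        params={"objective": "binary:logistic"},
        datasets={"train": train_ds},
    )
    result = trainer.fit()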
39 changes: 20 additions & 19 deletions python/ray/train/predictor.py
@@ -40,8 +40,9 @@ class PredictorNotSerializableException(RuntimeError):
class Predictor(abc.ABC):
"""Predictors load models from checkpoints to perform inference.
Note: The base ``Predictor`` class cannot be instantiated directly. Only one of
its subclasses can be used.
.. note::
The base ``Predictor`` class cannot be instantiated directly. Only one of
its subclasses can be used.
**How does a Predictor work?**
@@ -50,27 +51,27 @@ class Predictor(abc.ABC):
When the ``predict`` method is called the following occurs:
- The input batch is converted into a pandas DataFrame. Tensor input (like a
``np.ndarray``) will be converted into a single column Pandas Dataframe.
- If there is a :ref:`Preprocessor <air-preprocessor-ref>` saved in the provided
:ref:`Checkpoint <air-checkpoint-ref>`, the preprocessor will be used to
transform the DataFrame.
- The transformed DataFrame will be passed to the model for inference (via the
``predictor._predict_pandas`` method).
- The predictions will be outputted by ``predict`` in the same type as the
original input.
- The input batch is converted into a pandas DataFrame. Tensor input (like a
``np.ndarray``) will be converted into a single column Pandas Dataframe.
- If there is a :ref:`Preprocessor <air-preprocessor-ref>` saved in the provided
:ref:`Checkpoint <air-checkpoint-ref>`, the preprocessor will be used to
transform the DataFrame.
- The transformed DataFrame will be passed to the model for inference (via the
``predictor._predict_pandas`` method).
- The predictions will be outputted by ``predict`` in the same type as the
original input.
**How do I create a new Predictor?**
To implement a new Predictor for your particular framework, you should subclass
the base ``Predictor`` and implement the following two methods:
1. ``_predict_pandas``: Given a pandas.DataFrame input, return a
pandas.DataFrame containing predictions.
2. ``from_checkpoint``: Logic for creating a Predictor from an
:ref:`AIR Checkpoint <air-checkpoint-ref>`.
3. Optionally ``_predict_arrow`` for better performance when working with
tensor data to avoid extra copies from Pandas conversions.
1. ``_predict_pandas``: Given a pandas.DataFrame input, return a
pandas.DataFrame containing predictions.
2. ``from_checkpoint``: Logic for creating a Predictor from an
:ref:`AIR Checkpoint <air-checkpoint-ref>`.
3. Optionally ``_predict_arrow`` for better performance when working with
tensor data to avoid extra copies from Pandas conversions.
"""

def __init__(self, preprocessor: Optional[Preprocessor] = None):
Expand Down Expand Up @@ -141,8 +142,8 @@ def predict(self, data: DataBatchType, **kwargs) -> DataBatchType:
directly to ``_predict_pandas``.
Returns:
DataBatchType: Prediction result. The return type will be the same as the
input type.
DataBatchType:
Prediction result. The return type will be the same as the input type.
"""
data_df = convert_batch_type_to_pandas(data, self._cast_tensor_columns)

3 changes: 2 additions & 1 deletion python/ray/tune/tuner.py
@@ -226,7 +226,8 @@ def fit(self) -> ResultGrid:
to resume.
Raises:
RayTaskError when the exception happens in trainable else TuneError.
RayTaskError: If the user-provided trainable raises an exception.
TuneError: General Ray Tune error.
"""

if not self._is_ray_client:
