[docs] Improve the "Why Ray" and "Why AIR" sections of the docs (#27480)
ericl authored and richardliaw committed Aug 8, 2022
1 parent 025c927 commit 1750760
Showing 12 changed files with 95 additions and 75 deletions.
12 changes: 11 additions & 1 deletion README.rst
@@ -35,7 +35,7 @@ Or more about `Ray Core`_ and its key abstractions:
- `Actors`_: Stateful worker processes created in the cluster.
- `Objects`_: Immutable values accessible across the cluster.

Ray runs on any machine, cluster, cloud provider, and Kubernetes, and also features a growing
Ray runs on any machine, cluster, cloud provider, and Kubernetes, and features a growing
`ecosystem of community integrations`_.

Install Ray with: ``pip install ray``. For nightly wheels, see the
@@ -49,6 +49,16 @@ Install Ray with: ``pip install ray``. For nightly wheels, see the
.. _`RLlib`: https://docs.ray.io/en/latest/rllib/index.html
.. _`ecosystem of community integrations`: https://docs.ray.io/en/latest/ray-overview/ray-libraries.html


Why Ray?
--------

Today's ML workloads are increasingly compute-intensive. As convenient as they are, single-node development environments such as your laptop cannot scale to meet these demands.

Ray is a unified way to scale Python and AI applications from a laptop to a cluster.

With Ray, you can seamlessly scale the same code from a laptop to a cluster. Ray is designed to be general-purpose, meaning that it can performantly run any kind of workload. If your application is written in Python, you can scale it with Ray, no other infrastructure required.

More Information
----------------

2 changes: 1 addition & 1 deletion doc/source/data/dataset-tensor-support.rst
@@ -194,7 +194,7 @@ Because Tensor datasets rely on Datasets-specific extension types, they can only
.. _disable_tensor_extension_casting:

Disabling Tensor Extension Casting
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
----------------------------------

To disable automatic casting of Pandas and Arrow arrays to
:class:`TensorArray <ray.data.extensions.tensor_extension.TensorArray>`, run the code
10 changes: 9 additions & 1 deletion doc/source/index.md
@@ -103,9 +103,17 @@ Or more about [Ray Core](ray-core/walkthrough) and its key abstractions:
- [Actors](ray-core/actors): Stateful worker processes created in the cluster.
- [Objects](ray-core/objects): Immutable values accessible across the cluster.

Ray runs on any machine, cluster, cloud provider, and Kubernetes, and also features a growing
Ray runs on any machine, cluster, cloud provider, and Kubernetes, and features a growing
[ecosystem of community integrations](ray-overview/ray-libraries).

## Why Ray?

Today's ML workloads are increasingly compute-intensive. As convenient as they are, single-node development environments such as your laptop cannot scale to meet these demands.

Ray is a unified way to scale Python and AI applications from a laptop to a cluster.

With Ray, you can seamlessly scale the same code from a laptop to a cluster. Ray is designed to be general-purpose, meaning that it can performantly run any kind of workload. If your application is written in Python, you can scale it with Ray, no other infrastructure required.

## How to get involved?

Ray is more than a framework for distributed applications but also an active community of developers, researchers, and folks that love machine learning.
13 changes: 4 additions & 9 deletions doc/source/ray-air/examples/pytorch_tabular_starter.py
@@ -3,7 +3,6 @@

# __air_generic_preprocess_start__
import ray
from ray.data.preprocessors import StandardScaler

# Load data.
dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")
@@ -15,21 +14,16 @@
test_dataset = valid_dataset.map_batches(
    lambda df: df.drop("target", axis=1), batch_format="pandas"
)

# Create a preprocessor to scale some columns
columns_to_scale = ["mean radius", "mean texture"]
preprocessor = StandardScaler(columns=columns_to_scale)
# __air_generic_preprocess_end__

# __air_pytorch_preprocess_start__
import numpy as np
import pandas as pd

from ray.data.preprocessors import Concatenator, Chain
from ray.data.preprocessors import Concatenator, Chain, StandardScaler

# Chain the preprocessors together.
# Create a preprocessor to scale some columns and concatenate the result.
preprocessor = Chain(
    preprocessor,
    StandardScaler(columns=["mean radius", "mean texture"]),
    Concatenator(exclude=["target"], dtype=np.float32),
)
# __air_pytorch_preprocess_end__
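
The preprocessing change above chains a scaling step with a concatenation step. As a rough, framework-free sketch of what such a chain does to the data, the snippet below uses plain Python. The helpers `standard_scale`, `concatenate`, and `chain`, the `"features"` output column, and the use of the population standard deviation are all assumptions made for illustration. They are not the `ray.data.preprocessors` API.

```python
# Illustrative only: plain-Python stand-ins, not the ray.data.preprocessors API.
from statistics import mean, pstdev


def standard_scale(rows, columns):
    """Return new rows with each named column shifted to mean 0 and scaled to std 1."""
    stats = {}
    for col in columns:
        values = [row[col] for row in rows]
        # Assumption: population standard deviation (ddof=0).
        stats[col] = (mean(values), pstdev(values))
    scaled = []
    for row in rows:
        new_row = dict(row)
        for col, (mu, sigma) in stats.items():
            # Guard against constant columns, where the standard deviation is 0.
            new_row[col] = (row[col] - mu) / sigma if sigma else 0.0
        scaled.append(new_row)
    return scaled


def concatenate(rows, exclude):
    """Pack every non-excluded column into one float feature vector per row."""
    packed = []
    for row in rows:
        features = [float(v) for k, v in row.items() if k not in exclude]
        kept = {k: v for k, v in row.items() if k in exclude}
        kept["features"] = features  # hypothetical output column name
        packed.append(kept)
    return packed


def chain(*steps):
    """Compose steps left to right, so each step sees the previous step's output."""
    def run(rows):
        for step in steps:
            rows = step(rows)
        return rows
    return run


rows = [
    {"mean radius": 10.0, "mean texture": 20.0, "target": 0},
    {"mean radius": 14.0, "mean texture": 24.0, "target": 1},
]
pipeline = chain(
    lambda r: standard_scale(r, ["mean radius", "mean texture"]),
    lambda r: concatenate(r, exclude=["target"]),
)
result = pipeline(rows)
```

Order matters in the chain: scaling runs first on the named columns, and concatenation then packs the scaled values into a single vector while leaving the excluded label column untouched.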
@@ -161,4 +155,5 @@ def to_tensor_iterator(dataset, batch_size):
predicted_probabilities.show()
# {'predictions': array([1.], dtype=float32)}
# {'predictions': array([0.], dtype=float32)}
# ...
# __air_pytorch_batchpred_end__
15 changes: 4 additions & 11 deletions doc/source/ray-air/examples/tf_tabular_starter.py
@@ -3,10 +3,8 @@

# __air_generic_preprocess_start__
import ray
from ray.data.preprocessors import StandardScaler
from ray.air.config import ScalingConfig


# Load data.
dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")

# Split data into train and validation.
@@ -16,21 +14,16 @@
test_dataset = valid_dataset.map_batches(
    lambda df: df.drop("target", axis=1), batch_format="pandas"
)

# Create a preprocessor to scale some columns
columns_to_scale = ["mean radius", "mean texture"]
preprocessor = StandardScaler(columns=columns_to_scale)
# __air_generic_preprocess_end__

# __air_tf_preprocess_start__
import numpy as np
import pandas as pd

from ray.data.preprocessors import Concatenator, Chain
from ray.data.preprocessors import Concatenator, Chain, StandardScaler

# Chain the preprocessors together.
# Create a preprocessor to scale some columns and concatenate the result.
preprocessor = Chain(
    preprocessor,
    StandardScaler(columns=["mean radius", "mean texture"]),
    Concatenator(exclude=["target"], dtype=np.float32),
)
# __air_tf_preprocess_end__
15 changes: 8 additions & 7 deletions doc/source/ray-air/examples/xgboost_starter.py
@@ -3,8 +3,6 @@

# __air_generic_preprocess_start__
import ray
from ray.data.preprocessors import StandardScaler
from ray.air.config import ScalingConfig

# Load data.
dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")
@@ -16,16 +14,18 @@
test_dataset = valid_dataset.map_batches(
    lambda df: df.drop("target", axis=1), batch_format="pandas"
)

# Create a preprocessor to scale some columns
columns_to_scale = ["mean radius", "mean texture"]
preprocessor = StandardScaler(columns=columns_to_scale)
# __air_generic_preprocess_end__

# __air_xgb_preprocess_start__
# Create a preprocessor to scale some columns.
from ray.data.preprocessors import StandardScaler

preprocessor = StandardScaler(columns=["mean radius", "mean texture"])
# __air_xgb_preprocess_end__

# __air_xgb_train_start__
from ray.train.xgboost import XGBoostTrainer
from ray.air.config import ScalingConfig
from ray.train.xgboost import XGBoostTrainer

trainer = XGBoostTrainer(
    scaling_config=ScalingConfig(
@@ -84,4 +84,5 @@
# {'predictions': 0.9970690608024597}
# {'predictions': 0.9943051934242249}
# {'predictions': 0.00334902573376894}
# ...
# __air_xgb_batchpred_end__
98 changes: 54 additions & 44 deletions doc/source/ray-air/getting-started.rst
@@ -9,13 +9,54 @@ Ray AI Runtime (AIR)

Ray AI Runtime (AIR) is a scalable and unified toolkit for ML applications. AIR enables easy scaling of individual workloads, end-to-end workflows, and popular ecosystem frameworks, all in just Python.

..
  https://docs.google.com/drawings/d/1atB1dLjZIi8ibJ2-CoHdd3Zzyl_hDRWyK2CJAVBBLdU/edit
.. image:: images/ray-air.svg

AIR comes with ready-to-use libraries for :ref:`Preprocessing <datasets>`, :ref:`Training <train-docs>`, :ref:`Tuning <tune-main>`, :ref:`Scoring <air-predictors>`, :ref:`Serving <rayserve>`, and :ref:`Reinforcement Learning <rllib-index>`, as well as an ecosystem of integrations.

Ray AIR focuses on the compute aspects of ML:
* It provides scalability by leveraging Ray’s distributed compute layer for ML workloads.
* It is designed to interoperate with other systems for storage and metadata needs.
Why AIR?
--------

Ray AIR aims to simplify the ecosystem of machine learning frameworks, platforms, and tools. It does this by leveraging Ray to provide a seamless, unified, and open experience for scalable ML:

.. image:: images/why-air-2.svg

..
  https://docs.google.com/drawings/d/1oi_JwNHXVgtR_9iTdbecquesUd4hOk0dWgHaTaFj6gk/edit

**1. Seamless Dev to Prod**: AIR reduces friction going from development to production. With Ray and AIR, the same Python code scales seamlessly from a laptop to a large cluster.

**2. Unified ML API**: AIR's unified ML API enables swapping between popular frameworks, such as XGBoost, PyTorch, and HuggingFace, with just a single class change in your code.

**3. Open and Extensible**: AIR and Ray are fully open-source and can run on any cluster, cloud, or Kubernetes. Build custom components and integrations on top of scalable developer APIs.
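
The "single class change" idea in point 2 can be pictured with a framework-agnostic sketch. The classes below (`BaseTrainer`, `TreeTrainer`, `NeuralNetTrainer`) and the `run_workflow` helper are hypothetical stand-ins written for illustration. They are not AIR trainer classes, and a real trainer would accept more configuration.

```python
# Hypothetical stand-ins for illustration; these are not AIR trainer classes.
class BaseTrainer:
    """Shared interface: every trainer takes the same arguments and exposes fit()."""

    framework = "base"

    def __init__(self, datasets, label_column):
        self.datasets = datasets
        self.label_column = label_column

    def fit(self):
        # A real trainer would launch distributed training here.
        num_rows = len(self.datasets["train"])
        return {"framework": self.framework, "rows_seen": num_rows}


class TreeTrainer(BaseTrainer):
    framework = "gradient-boosted trees"


class NeuralNetTrainer(BaseTrainer):
    framework = "neural network"


def run_workflow(trainer_cls):
    """The workflow is identical; only the trainer class passed in differs."""
    train_rows = [{"x": 1.0, "target": 0}, {"x": 2.0, "target": 1}]
    trainer = trainer_cls(datasets={"train": train_rows}, label_column="target")
    return trainer.fit()


tree_result = run_workflow(TreeTrainer)
nn_result = run_workflow(NeuralNetTrainer)
```

Because both classes share one constructor signature and one `fit()` method, data loading and the rest of the workflow stay unchanged when the class is swapped.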

When to use AIR?
----------------

AIR is for both data scientists and ML engineers alike.

.. image:: images/when-air.svg

..
  https://docs.google.com/drawings/d/1Qw_h457v921jWQkx63tmKAsOsJ-qemhwhCZvhkxWrWo/edit

For data scientists, AIR can be used to scale individual workloads, and also end-to-end ML applications. For ML Engineers, AIR provides scalable platform abstractions that can be used to easily onboard and integrate tooling from the broader ML ecosystem.

Quick Start
-----------

Below, we walk through how AIR's unified ML API enables scaling of end-to-end ML workflows, focusing on
a few of the popular frameworks AIR integrates with (XGBoost, Pytorch, and Tensorflow). The ML workflow we're going to build is summarized by the following diagram:

..
  https://docs.google.com/drawings/d/1z0r_Yc7-0NAPVsP2jWUkLV2jHVHdcJHdt9uN1GDANSY/edit
.. figure:: images/why-air.svg

  AIR provides a unified API for the ML ecosystem.
  This diagram shows how AIR enables an ecosystem of libraries to be run at scale in just a few lines of code.

Get started by installing Ray AIR:

@@ -31,30 +72,24 @@ Get started by installing Ray AIR:
    pip install -U "tensorflow>=2.6.2"
    pip install -U "pyarrow>=6.0.1"
Quick Start
-----------

Below, we demonstrate how AIR enables simple scaling of end-to-end ML workflows, focusing on
a few of the popular frameworks AIR integrates with (XGBoost, Pytorch, and Tensorflow):

Preprocessing
~~~~~~~~~~~~~

Below, let's start by preprocessing your data with Ray AIR's ``Preprocessors``:
First, let's start by loading a dataset from storage:

.. literalinclude:: examples/xgboost_starter.py
    :language: python
    :start-after: __air_generic_preprocess_start__
    :end-before: __air_generic_preprocess_end__

If using Tensorflow or Pytorch, format your data for use with your training framework:
Then, we define a ``Preprocessor`` pipeline for our task:

.. tabbed:: XGBoost

    .. code-block:: python

        # No extra preprocessing is required for XGBoost.
        # The data is already in the correct format.

    .. literalinclude:: examples/xgboost_starter.py
        :language: python
        :start-after: __air_xgb_preprocess_start__
        :end-before: __air_xgb_preprocess_end__

.. tabbed:: Pytorch

@@ -155,38 +190,13 @@ Use the trained model for scalable batch prediction with a ``BatchPredictor``.
        :start-after: __air_tf_batchpred_start__
        :end-before: __air_tf_batchpred_end__

Why Ray AIR?
------------

Ray AIR aims to simplify the ecosystem of machine learning frameworks, platforms, and tools. It does this by taking a scalable, single-system approach to ML infrastructure (i.e., leveraging Ray as a unified compute framework):
Project Status
--------------

**1. Seamless Dev to Prod**: AIR reduces friction going from development to production. Traditional orchestration approaches introduce separate systems and operational overheads. With Ray and AIR, the same Python code scales seamlessly from a laptop to a large cluster.

**2. Unified API**: Want to switch between frameworks like XGBoost and PyTorch, or try out a new library like HuggingFace? Thanks to the flexibility of AIR, you can do this by just swapping out a single class, without needing to set up new systems or change other aspects of your workflow.

**3. Open and Evolvable**: Ray core and libraries are fully open-source and can run on any cluster, cloud, or Kubernetes, reducing the costs of platform lock-in. Want to go out of the box? Run any framework you want using AIR's integration APIs, or build advanced use cases directly on Ray core.

.. figure:: images/why-air.png

AIR enables a single-system / single-script approach to scaling ML. Ray's
distributed Python APIs enable scaling of ML workloads without the burden of
setting up or orchestrating separate distributed systems.

AIR is for both data scientists and ML engineers. Consider using AIR when you want to:
* Scale a single workload.
* Scale end-to-end ML applications.
* Build a custom ML platform for your organization.

AIR Ecosystem
-------------

AIR comes with built-in integrations with the most popular ecosystem libraries. The following diagram provides an overview of the AIR libraries, ecosystem integrations, and their readiness.
AIR's developer APIs also enable *custom integrations* to be easily created.

..
  https://docs.google.com/drawings/d/1pZkRrkAbRD8jM-xlGlAaVo3T66oBQ_HpsCzomMT7OIc/edit

AIR is currently in **beta**. If you have questions for the team or are interested in getting involved in the development process, fill out `this short form <https://forms.gle/wCCdbaQDtgErYycT6>`__.

.. image:: images/air-ecosystem.svg
For an overview of the AIR libraries, ecosystem integrations, and their readiness, check out the latest `AIR ecosystem map <https://docs.ray.io/en/master/_images/air-ecosystem.svg>`_.

Next Steps
----------
2 changes: 1 addition & 1 deletion doc/source/ray-air/images/ray-air.svg
1 change: 1 addition & 0 deletions doc/source/ray-air/images/when-air.svg
1 change: 1 addition & 0 deletions doc/source/ray-air/images/why-air-2.svg
Binary file removed doc/source/ray-air/images/why-air.png
Binary file not shown.
1 change: 1 addition & 0 deletions doc/source/ray-air/images/why-air.svg