[docs] Improve the "Why Ray" and "Why AIR" sections of the docs (#27480)

ray-project · Aug 8, 2022 · 1750760
1 parent 025c927
commit 1750760

Showing 12 changed files with 95 additions and 75 deletions.
diff --git a/README.rst b/README.rst
@@ -35,7 +35,7 @@ Or more about `Ray Core`_ and its key abstractions:
 - `Actors`_: Stateful worker processes created in the cluster.
 - `Objects`_: Immutable values accessible across the cluster.
 
-Ray runs on any machine, cluster, cloud provider, and Kubernetes, and also features a growing
+Ray runs on any machine, cluster, cloud provider, and Kubernetes, and features a growing
 `ecosystem of community integrations`_.
 
 Install Ray with: ``pip install ray``. For nightly wheels, see the
@@ -49,6 +49,16 @@ Install Ray with: ``pip install ray``. For nightly wheels, see the
 .. _`RLlib`: https://docs.ray.io/en/latest/rllib/index.html
 .. _`ecosystem of community integrations`: https://docs.ray.io/en/latest/ray-overview/ray-libraries.html
 
+
+Why Ray?
+--------
+
+Today's ML workloads are increasingly compute-intensive. As convenient as they are, single-node development environments such as your laptop cannot scale to meet these demands.
+
+Ray is a unified way to scale Python and AI applications from a laptop to a cluster.
+
+With Ray, you can seamlessly scale the same code from a laptop to a cluster. Ray is designed to be general-purpose, meaning that it can performantly run any kind of workload. If your application is written in Python, you can scale it with Ray, no other infrastructure required.
+
 More Information
 ----------------
 

diff --git a/doc/source/data/dataset-tensor-support.rst b/doc/source/data/dataset-tensor-support.rst
@@ -194,7 +194,7 @@ Because Tensor datasets rely on Datasets-specific extension types, they can only
 .. _disable_tensor_extension_casting:
 
 Disabling Tensor Extension Casting
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+----------------------------------
 
 To disable automatic casting of Pandas and Arrow arrays to
 :class:`TensorArray <ray.data.extensions.tensor_extension.TensorArray>`, run the code

diff --git a/doc/source/index.md b/doc/source/index.md
@@ -103,9 +103,17 @@ Or more about [Ray Core](ray-core/walkthrough) and its key abstractions:
 - [Actors](ray-core/actors): Stateful worker processes created in the cluster.
 - [Objects](ray-core/objects): Immutable values accessible across the cluster.
 
-Ray runs on any machine, cluster, cloud provider, and Kubernetes, and also features a growing
+Ray runs on any machine, cluster, cloud provider, and Kubernetes, and features a growing
 [ecosystem of community integrations](ray-overview/ray-libraries).
 
+## Why Ray?
+
+Today's ML workloads are increasingly compute-intensive. As convenient as they are, single-node development environments such as your laptop cannot scale to meet these demands.
+
+Ray is a unified way to scale Python and AI applications from a laptop to a cluster.
+
+With Ray, you can seamlessly scale the same code from a laptop to a cluster. Ray is designed to be general-purpose, meaning that it can performantly run any kind of workload. If your application is written in Python, you can scale it with Ray, no other infrastructure required.
+
 ## How to get involved?
 
 Ray is more than a framework for distributed applications but also an active community of developers, researchers, and folks that love machine learning.

diff --git a/doc/source/ray-air/examples/pytorch_tabular_starter.py b/doc/source/ray-air/examples/pytorch_tabular_starter.py
@@ -3,7 +3,6 @@
 
 # __air_generic_preprocess_start__
 import ray
-from ray.data.preprocessors import StandardScaler
 
 # Load data.
 dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")
@@ -15,21 +14,16 @@
 test_dataset = valid_dataset.map_batches(
     lambda df: df.drop("target", axis=1), batch_format="pandas"
 )
-
-# Create a preprocessor to scale some columns
-columns_to_scale = ["mean radius", "mean texture"]
-preprocessor = StandardScaler(columns=columns_to_scale)
 # __air_generic_preprocess_end__
 
 # __air_pytorch_preprocess_start__
 import numpy as np
-import pandas as pd
 
-from ray.data.preprocessors import Concatenator, Chain
+from ray.data.preprocessors import Concatenator, Chain, StandardScaler
 
-# Chain the preprocessors together.
+# Create a preprocessor to scale some columns and concatenate the result.
 preprocessor = Chain(
-    preprocessor,
+    StandardScaler(columns=["mean radius", "mean texture"]),
     Concatenator(exclude=["target"], dtype=np.float32),
 )
 # __air_pytorch_preprocess_end__
@@ -161,4 +155,5 @@ def to_tensor_iterator(dataset, batch_size):
 predicted_probabilities.show()
 # {'predictions': array([1.], dtype=float32)}
 # {'predictions': array([0.], dtype=float32)}
+# ...
 # __air_pytorch_batchpred_end__
diff --git a/doc/source/ray-air/examples/tf_tabular_starter.py b/doc/source/ray-air/examples/tf_tabular_starter.py
@@ -3,10 +3,8 @@
 
 # __air_generic_preprocess_start__
 import ray
-from ray.data.preprocessors import StandardScaler
-from ray.air.config import ScalingConfig
-
 
+# Load data.
 dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")
 
 # Split data into train and validation.
@@ -16,21 +14,16 @@
 test_dataset = valid_dataset.map_batches(
     lambda df: df.drop("target", axis=1), batch_format="pandas"
 )
-
-# Create a preprocessor to scale some columns
-columns_to_scale = ["mean radius", "mean texture"]
-preprocessor = StandardScaler(columns=columns_to_scale)
 # __air_generic_preprocess_end__
 
 # __air_tf_preprocess_start__
 import numpy as np
-import pandas as pd
 
-from ray.data.preprocessors import Concatenator, Chain
+from ray.data.preprocessors import Concatenator, Chain, StandardScaler
 
-# Chain the preprocessors together.
+# Create a preprocessor to scale some columns and concatenate the result.
 preprocessor = Chain(
-    preprocessor,
+    StandardScaler(columns=["mean radius", "mean texture"]),
     Concatenator(exclude=["target"], dtype=np.float32),
 )
 # __air_tf_preprocess_end__

diff --git a/doc/source/ray-air/examples/xgboost_starter.py b/doc/source/ray-air/examples/xgboost_starter.py
@@ -3,8 +3,6 @@
 
 # __air_generic_preprocess_start__
 import ray
-from ray.data.preprocessors import StandardScaler
-from ray.air.config import ScalingConfig
 
 # Load data.
 dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")
@@ -16,16 +14,18 @@
 test_dataset = valid_dataset.map_batches(
     lambda df: df.drop("target", axis=1), batch_format="pandas"
 )
-
-# Create a preprocessor to scale some columns
-columns_to_scale = ["mean radius", "mean texture"]
-preprocessor = StandardScaler(columns=columns_to_scale)
 # __air_generic_preprocess_end__
 
+# __air_xgb_preprocess_start__
+# Create a preprocessor to scale some columns.
+from ray.data.preprocessors import StandardScaler
+
+preprocessor = StandardScaler(columns=["mean radius", "mean texture"])
+# __air_xgb_preprocess_end__
 
 # __air_xgb_train_start__
-from ray.train.xgboost import XGBoostTrainer
 from ray.air.config import ScalingConfig
+from ray.train.xgboost import XGBoostTrainer
 
 trainer = XGBoostTrainer(
     scaling_config=ScalingConfig(
@@ -84,4 +84,5 @@
 # {'predictions': 0.9970690608024597}
 # {'predictions': 0.9943051934242249}
 # {'predictions': 0.00334902573376894}
+# ...
 # __air_xgb_batchpred_end__
diff --git a/doc/source/ray-air/getting-started.rst b/doc/source/ray-air/getting-started.rst
@@ -9,13 +9,54 @@ Ray AI Runtime (AIR)
 
 Ray AI Runtime (AIR) is a scalable and unified toolkit for ML applications. AIR enables easy scaling of individual workloads, end-to-end workflows, and popular ecosystem frameworks, all in just Python.
 
+..
+  https://docs.google.com/drawings/d/1atB1dLjZIi8ibJ2-CoHdd3Zzyl_hDRWyK2CJAVBBLdU/edit
+
 .. image:: images/ray-air.svg
 
 AIR comes with ready-to-use libraries for :ref:`Preprocessing <datasets>`, :ref:`Training <train-docs>`, :ref:`Tuning <tune-main>`, :ref:`Scoring <air-predictors>`, :ref:`Serving <rayserve>`, and :ref:`Reinforcement Learning <rllib-index>`, as well as an ecosystem of integrations.
 
-Ray AIR focuses on the compute aspects of ML:
- * It provides scalability by leveraging Ray’s distributed compute layer for ML workloads.
- * It is designed to interoperate with other systems for storage and metadata needs.
+Why AIR?
+--------
+
+Ray AIR aims to simplify the ecosystem of machine learning frameworks, platforms, and tools. It does this by leveraging Ray to provide a seamless, unified, and open experience for scalable ML:
+
+.. image:: images/why-air-2.svg
+
+..
+  https://docs.google.com/drawings/d/1oi_JwNHXVgtR_9iTdbecquesUd4hOk0dWgHaTaFj6gk/edit
+
+**1. Seamless Dev to Prod**: AIR reduces friction going from development to production. With Ray and AIR, the same Python code scales seamlessly from a laptop to a large cluster.
+
+**2. Unified ML API**: AIR's unified ML API enables swapping between popular frameworks, such as XGBoost, PyTorch, and HuggingFace, with just a single class change in your code.
+
+**3. Open and Extensible**: AIR and Ray are fully open-source and can run on any cluster, cloud, or Kubernetes. Build custom components and integrations on top of scalable developer APIs.
+
+When to use AIR?
+----------------
+
+AIR is for both data scientists and ML engineers.
+
+.. image:: images/when-air.svg
+
+..
+  https://docs.google.com/drawings/d/1Qw_h457v921jWQkx63tmKAsOsJ-qemhwhCZvhkxWrWo/edit
+
+For data scientists, AIR can be used to scale individual workloads, and also end-to-end ML applications. For ML Engineers, AIR provides scalable platform abstractions that can be used to easily onboard and integrate tooling from the broader ML ecosystem.
+
+Quick Start
+-----------
+
+Below, we walk through how AIR's unified ML API enables scaling of end-to-end ML workflows, focusing on
+a few of the popular frameworks AIR integrates with (XGBoost, PyTorch, and TensorFlow). The ML workflow we're going to build is summarized by the following diagram:
+
+..
+  https://docs.google.com/drawings/d/1z0r_Yc7-0NAPVsP2jWUkLV2jHVHdcJHdt9uN1GDANSY/edit
+
+.. figure:: images/why-air.svg
+
+  AIR provides a unified API for the ML ecosystem.
+  This diagram shows how AIR enables an ecosystem of libraries to be run at scale in just a few lines of code.
 
 Get started by installing Ray AIR:
 
@@ -31,30 +72,24 @@ Get started by installing Ray AIR:
     pip install -U tensorflow>=2.6.2
     pip install -U pyarrow>=6.0.1
 
-Quick Start
------------
-
-Below, we demonstrate how AIR enables simple scaling of end-to-end ML workflows, focusing on
-a few of the popular frameworks AIR integrates with (XGBoost, Pytorch, and Tensorflow):
-
 Preprocessing
 ~~~~~~~~~~~~~
 
-Below, let's start by preprocessing your data with Ray AIR's ``Preprocessors``:
+First, let's start by loading a dataset from storage:
 
 .. literalinclude:: examples/xgboost_starter.py
     :language: python
     :start-after: __air_generic_preprocess_start__
     :end-before: __air_generic_preprocess_end__
 
-If using Tensorflow or Pytorch, format your data for use with your training framework:
+Then, we define a ``Preprocessor`` pipeline for our task:
 
 .. tabbed:: XGBoost
 
-    .. code-block:: python
-        
-        # No extra preprocessing is required for XGBoost.
-        # The data is already in the correct format.
+    .. literalinclude:: examples/xgboost_starter.py
+        :language: python
+        :start-after: __air_xgb_preprocess_start__
+        :end-before: __air_xgb_preprocess_end__
 
 .. tabbed:: Pytorch
 
@@ -155,38 +190,13 @@ Use the trained model for scalable batch prediction with a ``BatchPredictor``.
         :start-after: __air_tf_batchpred_start__
         :end-before: __air_tf_batchpred_end__
 
-Why Ray AIR?
-------------
 
-Ray AIR aims to simplify the ecosystem of machine learning frameworks, platforms, and tools. It does this by taking a scalable, single-system approach to ML infrastructure (i.e., leveraging Ray as a unified compute framework):
+Project Status
+--------------
 
-**1. Seamless Dev to Prod**: AIR reduces friction going from development to production. Traditional orchestration approaches introduce separate systems and operational overheads. With Ray and AIR, the same Python code scales seamlessly from a laptop to a large cluster.
-
-**2. Unified API**: Want to switch between frameworks like XGBoost and PyTorch, or try out a new library like HuggingFace? Thanks to the flexibility of AIR, you can do this by just swapping out a single class, without needing to set up new systems or change other aspects of your workflow.
-
-**3. Open and Evolvable**: Ray core and libraries are fully open-source and can run on any cluster, cloud, or Kubernetes, reducing the costs of platform lock-in. Want to go out of the box? Run any framework you want using AIR's integration APIs, or build advanced use cases directly on Ray core.
-
-.. figure:: images/why-air.png
-
-  AIR enables a single-system / single-script approach to scaling ML. Ray's
-  distributed Python APIs enable scaling of ML workloads without the burden of
-  setting up or orchestrating separate distributed systems.
-
-AIR is for both data scientists and ML engineers. Consider using AIR when you want to:
- * Scale a single workload.
- * Scale end-to-end ML applications.
- * Build a custom ML platform for your organization.
-
-AIR Ecosystem
--------------
-
-AIR comes with built-in integrations with the most popular ecosystem libraries. The following diagram provides an overview of the AIR libraries, ecosystem integrations, and their readiness.
-AIR's developer APIs also enable *custom integrations* to be easily created.
-
-..
-  https://docs.google.com/drawings/d/1pZkRrkAbRD8jM-xlGlAaVo3T66oBQ_HpsCzomMT7OIc/edit
+AIR is currently in **beta**. If you have questions for the team or are interested in getting involved in the development process, fill out `this short form <https://forms.gle/wCCdbaQDtgErYycT6>`__.
 
-.. image:: images/air-ecosystem.svg
+For an overview of the AIR libraries, ecosystem integrations, and their readiness, check out the latest `AIR ecosystem map <https://docs.ray.io/en/master/_images/air-ecosystem.svg>`_.
 
 Next Steps
 ----------

diff --git a/doc/source/ray-air/images/ray-air.svg b/doc/source/ray-air/images/ray-air.svg
diff --git a/doc/source/ray-air/images/when-air.svg b/doc/source/ray-air/images/when-air.svg
diff --git a/doc/source/ray-air/images/why-air-2.svg b/doc/source/ray-air/images/why-air-2.svg
diff --git a/doc/source/ray-air/images/why-air.png b/doc/source/ray-air/images/why-air.png
diff --git a/doc/source/ray-air/images/why-air.svg b/doc/source/ray-air/images/why-air.svg
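
Reviewer note: the starter-example hunks in this commit stop building a standalone `StandardScaler` and instead pass it straight into `Chain(...)` next to `Concatenator`. The following is a minimal, Ray-free sketch of what such a chain does conceptually. `ToyStandardScaler`, `ToyConcatenator`, `ToyChain`, the `features` output key, and the two-row dataset are all hypothetical stand-ins written for illustration; they are not Ray's implementation and may differ from it in detail.

```python
import math
from typing import Dict, List

Row = Dict[str, float]


class ToyStandardScaler:
    """Hypothetical stand-in: scales selected columns to zero mean, unit variance."""

    def __init__(self, columns: List[str]):
        self.columns = columns
        self.stats: Dict[str, tuple] = {}

    def fit(self, rows: List[Row]) -> "ToyStandardScaler":
        for col in self.columns:
            values = [row[col] for row in rows]
            mean = sum(values) / len(values)
            # Population standard deviation; fall back to 1.0 for constant columns.
            std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
            self.stats[col] = (mean, std or 1.0)
        return self

    def transform(self, rows: List[Row]) -> List[Row]:
        scaled = []
        for row in rows:
            new_row = dict(row)
            for col, (mean, std) in self.stats.items():
                new_row[col] = (row[col] - mean) / std
            scaled.append(new_row)
        return scaled


class ToyConcatenator:
    """Hypothetical stand-in: packs all non-excluded columns into one list."""

    def __init__(self, exclude: List[str]):
        self.exclude = set(exclude)

    def fit(self, rows: List[Row]) -> "ToyConcatenator":
        return self  # Stateless: nothing to learn from the data.

    def transform(self, rows: List[Row]) -> List[dict]:
        packed = []
        for row in rows:
            new_row = {k: v for k, v in row.items() if k in self.exclude}
            new_row["features"] = [
                float(v) for k, v in row.items() if k not in self.exclude
            ]
            packed.append(new_row)
        return packed


class ToyChain:
    """Hypothetical stand-in: fits and applies each preprocessor in order."""

    def __init__(self, *preprocessors):
        self.preprocessors = preprocessors

    def fit_transform(self, rows: List[Row]) -> List[dict]:
        for preprocessor in self.preprocessors:
            rows = preprocessor.fit(rows).transform(rows)
        return rows


# Made-up rows that reuse column names from the diff above.
rows = [
    {"mean radius": 10.0, "mean texture": 20.0, "target": 0},
    {"mean radius": 14.0, "mean texture": 24.0, "target": 1},
]

chain = ToyChain(
    ToyStandardScaler(columns=["mean radius", "mean texture"]),
    ToyConcatenator(exclude=["target"]),
)
scaled = chain.fit_transform(rows)
print(scaled)
```

Building the scaler inline keeps the whole preprocessing pipeline in one expression, which appears to be the intent of the refactor in the PyTorch and TensorFlow starter examples.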