[docs] batch inference pass (#35041)
We're focusing more on explaining how batch inference works without Ray first, what the differences are, and what to know about batches and their formats to scale out your workloads.
maxpumperla authored May 10, 2023
1 parent a514ade commit 959328b
Showing 6 changed files with 306 additions and 51 deletions.
192 changes: 155 additions & 37 deletions doc/source/data/batch_inference.rst
@@ -8,8 +8,9 @@ Running Batch Inference
In this tutorial you'll learn what batch inference is, why you might want to use
Ray for it, and how to use Ray Data effectively for this task.
If you are familiar with the basics of inference tasks, jump straight to
code in the :ref:`quickstart section <batch_inference_quickstart>`, our detailed
:ref:`walk-through<batch_inference_walk_through>`,
or our :ref:`in-depth guide for PyTorch models<batch_inference_advanced_pytorch_example>`.

Batch inference refers to generating model predictions on a set of input data.
The model can range from a simple Python function to a complex neural network.
@@ -18,8 +19,9 @@ batch of data on demand.
This is in contrast to online inference, where the model is run immediately on a
data point when it becomes available.

Here's a simple schematic of batch inference for the computer vision task of classifying
images as cats or dogs, by "mapping" batches of input data to predictions
via ML model inference:

.. figure:: images/batch_inference.png

@@ -66,21 +68,96 @@ use case does not require scaling yet:
Quick Start
-----------

If you want to see a copy-paste example right away, here are a few simple ones.
Just pick the framework you like and run the code in your terminal.
If you want a more detailed rundown of the same examples, skip ahead to the
:ref:`batch inference walk-through with Ray <batch_inference_walk_through>`.

.. tabs::

.. group-tab:: HuggingFace

.. literalinclude:: ./doc_code/hf_quick_start.py
:language: python
:start-after: __hf_super_quick_start__
:end-before: __hf_super_quick_end__

.. group-tab:: PyTorch

.. literalinclude:: ./doc_code/pytorch_quick_start.py
:language: python
:start-after: __pt_super_quick_start__
:end-before: __pt_super_quick_end__

.. group-tab:: TensorFlow

.. literalinclude:: ./doc_code/tf_quick_start.py
:language: python
:start-after: __tf_super_quick_start__
:end-before: __tf_super_quick_end__


.. _batch_inference_walk_through:

Walk-through: Batch Inference with Ray
--------------------------------------

Running batch inference is conceptually easy and requires three steps:

1. Load your data and apply any preprocessing you need.
2. Define your model and a transformation that applies it to your data.
3. Run the transformation on your data.


Let's take a look at simple examples of this process without using Ray.
In each example we load ``batches`` of data, load a ``model``, define a ``transform``
function, and apply the model to the data to get ``results``.

.. tabs::

.. group-tab:: HuggingFace

.. literalinclude:: ./doc_code/hf_quick_start.py
:language: python
:start-after: __hf_no_ray_start__
:end-before: __hf_no_ray_end__

.. group-tab:: PyTorch

.. literalinclude:: ./doc_code/pytorch_quick_start.py
:language: python
:start-after: __pt_no_ray_start__
:end-before: __pt_no_ray_end__

.. group-tab:: TensorFlow

.. literalinclude:: ./doc_code/tf_quick_start.py
:language: python
:start-after: __tf_no_ray_start__
:end-before: __tf_no_ray_end__

.. note::

    As a Python user, this should all look familiar to you.
    The only part that you might be wondering about is that we're using
    ``Dict[str, np.ndarray]`` as the input type to our ``transform`` functions.
    We do this to ease the transition to Ray, given that Ray Data uses
    ``Dict[str, np.ndarray]`` as the default format for its batches.
    A minimal example of this batch format follows.
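
Concretely, such a batch and a ``transform`` on it look like this (a minimal
sketch; the ``"data"`` column name and the uppercasing transformation are just
for illustration):

.. code-block:: python

    import numpy as np
    from typing import Dict

    # A batch maps column names to NumPy arrays.
    batch: Dict[str, np.ndarray] = {"data": np.asarray(["Complete this", "for me"])}

    def transform(batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
        # Placeholder transformation: uppercase each string in the "data" column.
        return {"data": np.char.upper(batch["data"])}

    print(transform(batch))  # {'data': array(['COMPLETE THIS', 'FOR ME'], dtype='<U13')}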


If you can follow the above examples conceptually, you should have no trouble scaling your batch
inference workload to a compute cluster with Ray Data.
If you're using Ray, the three steps for running batch inference read as follows
(a minimal sketch follows the list):

1. Load a Ray Data dataset and apply any preprocessing you need. This will distribute your data
across the cluster.
2. Define your model in a class and define a transformation that applies your model to
your data batches (of format ``Dict[str, np.ndarray]`` by default).
3. Run inference on your data by using the :meth:`ds.map_batches() <ray.data.Datastream.map_batches>`
method from Ray Data. In this step you also define how your batch processing job
gets distributed across your cluster.
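
Here's a minimal sketch of these three steps, using a trivial stand-in model
(a function that doubles its input) instead of a real ML model:

.. code-block:: python

    import numpy as np
    from typing import Dict
    import ray

    # 1. Load the data into a Ray Datastream, distributing it across the cluster.
    ds = ray.data.from_numpy(np.arange(100))

    # 2. Define the "model" in a class; __call__ maps a batch to predictions.
    class DoublingModel:
        def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
            return {"output": batch["data"] * 2}

    # 3. Run the transformation, here with a pool of two actors.
    predictions = ds.map_batches(
        DoublingModel, compute=ray.data.ActorPoolStrategy(size=2)
    )
    predictions.show(limit=3)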

.. note::

@@ -89,26 +166,31 @@ We start with very simple use cases here and build up to more complex ones in other guides and tutorials.
demanding model setups, additional postprocessing, or other customizations.
We'll cover these advanced use cases in the next sections.

Let's scale out the above examples to a Ray cluster.
To start, install Ray with the data processing library, Ray Data:

.. code-block:: bash

    pip install ray[data]

1. Loading and preprocessing data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For this quick start guide we use very small, in-memory datasets by
leveraging common Python libraries like NumPy and Pandas.
In fact, we're using the exact same datasets as in the previous section, but now
load them into Ray Data.
In general, once you load your data using Ray Data, you also want to apply some
preprocessing steps. We skip this step here for simplicity.
The result of this step is a Ray Datastream ``ds`` that we can use to run inference on.

For larger datasets, you can use Ray Data to load data from cloud storage like S3 or GCS,
as sketched below. We cover this in more detail later on.
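
For instance, reading Parquet files from S3 might look like this (the bucket
and path are hypothetical):

.. code-block:: python

    import ray

    # Other formats have analogous ray.data.read_* functions.
    ds = ray.data.read_parquet("s3://my-bucket/my-input-data/")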

.. tabs::

.. group-tab:: HuggingFace

Create a NumPy array with text data and convert it to a Ray Datastream
with the :func:`ray.data.from_numpy() <ray.data.from_numpy>` function.

.. literalinclude:: ./doc_code/hf_quick_start.py
:language: python
@@ -118,7 +200,8 @@ In any case, the result of this step is a Ray Dataset ``ds`` that we can use to run inference on.
.. group-tab:: PyTorch

Create a NumPy array with 100
entries and convert it to a Ray Datastream with the
:func:`ray.data.from_numpy() <ray.data.from_numpy>` function.

.. literalinclude:: ./doc_code/pytorch_quick_start.py
:language: python
Expand All @@ -127,10 +210,11 @@ In any case, the result of this step is a Ray Dataset ``ds`` that we can use to

.. group-tab:: TensorFlow

Create a NumPy array with 100
entries and convert it to a Ray Datastream with the
:func:`ray.data.from_numpy() <ray.data.from_numpy>` function.

.. literalinclude:: ./doc_code/tf_quick_start.py
:language: python
:start-after: __tf_quickstart_load_start__
:end-before: __tf_quickstart_load_end__
@@ -141,6 +225,8 @@ In any case, the result of this step is a Ray Dataset ``ds`` that we can use to run inference on.
Next, you want to set up your model for inference by defining a predictor.
The core idea is to define a class that loads your model in its ``__init__`` method
and implements a ``__call__`` method that takes a batch of data and returns a batch of predictions.
The ``__call__`` method plays the same role as the ``transform`` function from the previous section.
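
Schematically, a predictor class follows this pattern (a framework-agnostic
sketch; the placeholder model and the ``"data"`` column name are assumptions
for illustration):

.. code-block:: python

    import numpy as np
    from typing import Dict

    class Predictor:
        def __init__(self):
            # Expensive setup, such as loading model weights, runs once per worker.
            self.model = lambda arr: arr * 2  # placeholder for a real model

        def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
            # Map one batch of inputs to one batch of predictions.
            return {"output": self.model(batch["data"])}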

Below you find examples for PyTorch, TensorFlow, and HuggingFace.

.. tabs::
@@ -199,15 +285,6 @@ Once you have your Ray Dataset ``ds`` and your predictor class, you can use
In the example below, we use two CPUs to run inference in parallel and then print the results.
We cover resource allocation in more detail in :ref:`the configuration section of this guide <batch_inference_config>`.
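
Concretely, the call looks like this (a sketch reusing the placeholder
``Predictor`` class from above; the framework tabs below show the real versions):

.. code-block:: python

    # A pool of two actors means two copies of the model run in parallel.
    scale = ray.data.ActorPoolStrategy(size=2)
    predictions = ds.map_batches(Predictor, compute=scale)
    predictions.show(limit=1)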


.. tabs::

.. group-tab:: HuggingFace
@@ -231,10 +308,42 @@ We cover resource allocation in more detail in :ref:`the configuration section of this guide <batch_inference_config>`.
:start-after: __tf_quickstart_prediction_start__
:end-before: __tf_quickstart_prediction_end__


Note that using :meth:`ds.map_batches() <ray.data.Dataset.map_batches>` requires
you to write a callable that takes a batch of data and returns a batch of predictions.
An easy way to write and validate it is to use :meth:`ds.take_batch(N) <ray.data.Dataset.take_batch>`
to get a batch of data first, and then locally test your predictor on that batch, without using Ray.
Once you are happy with the results, you can use the same class in ``map_batches``
on the full dataset. Below you see how to do that in our running examples.
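
Schematically, with the placeholder ``Predictor`` class from above:

.. code-block:: python

    # Fetch a small batch and test the predictor locally, without Ray.
    batch = ds.take_batch(10)
    predictor = Predictor()
    print(predictor(batch))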

.. tabs::

.. group-tab:: HuggingFace

.. literalinclude:: ./doc_code/hf_quick_start.py
:language: python
:start-after: __hf_quickstart_prediction_test_start__
:end-before: __hf_quickstart_prediction_test_end__

.. group-tab:: PyTorch

.. literalinclude:: ./doc_code/pytorch_quick_start.py
:language: python
:start-after: __pt_quickstart_prediction_test_start__
:end-before: __pt_quickstart_prediction_test_end__

.. group-tab:: TensorFlow

.. literalinclude:: ./doc_code/tf_quick_start.py
:language: python
:start-after: __tf_quickstart_prediction_test_start__
:end-before: __tf_quickstart_prediction_test_end__


.. _batch_inference_advanced_pytorch_example:

Advanced Guide to Batch Inference with PyTorch
----------------------------------------------

Let's use batch inference on a pre-trained PyTorch model for image classification
to illustrate advanced concepts of batch processing with Ray.
@@ -387,10 +496,16 @@ stateful class with Ray for our pretrained ResNet model:

<2> The ``__call__`` method is used to apply the model to a batch of data.

<3> We're free to use any custom code in a stateful class.

<4> Finally, we return the ``"class"`` key of the model predictions as a NumPy array (a sketch of such a class follows below).
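
Put together, such a stateful class might look like the following sketch
(the ResNet variant, the ``"image"`` column name, and the preprocessing
details are assumptions for illustration):

.. code-block:: python

    import numpy as np
    import torch
    from torchvision import models
    from typing import Dict

    class TorchPredictor:
        def __init__(self):
            # Load the pretrained model once, when the worker starts.
            self.model = models.resnet18(pretrained=True)
            self.model.eval()

        def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
            # Any custom code can run here; we convert the batch to a tensor.
            inputs = torch.from_numpy(batch["image"])
            with torch.inference_mode():
                predictions = self.model(inputs)
            # Return the predicted class IDs as a NumPy array.
            return {"class": predictions.argmax(dim=1).numpy()}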

.. note::

    Of course, you can also use GPUs for inference with Ray.
    Jump ahead to the :ref:`GPU usage section <batch_inference_gpu>` to see how
    to modify the current example to use GPUs.
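
As a preview, requesting GPUs boils down to a sketch like this (assuming the
``TorchPredictor`` sketch above, additionally moving model and data to the GPU
with ``.to("cuda")``):

.. code-block:: python

    predictions = ds.map_batches(
        TorchPredictor,
        compute=ray.data.ActorPoolStrategy(size=2),
        num_gpus=1,  # reserve one GPU per actor in the pool
    )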


Scalable inference with Ray Data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -559,6 +674,9 @@ If encountering OOMs, decreasing your ``batch_size`` may help.
The default ``batch_size`` of ``4096`` may be too large for datasets with large rows
(e.g. tables with many columns or a collection of large images).
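
For example, a sketch of passing an explicit, smaller batch size (reusing the
``TorchPredictor`` sketch from above):

.. code-block:: python

    # Smaller batches lower the peak memory usage per worker.
    predictions = ds.map_batches(
        TorchPredictor,
        compute=ray.data.ActorPoolStrategy(size=2),
        batch_size=256,
    )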


.. _batch_inference_gpu:

Using GPUs in batch inference
-----------------------------

50 changes: 43 additions & 7 deletions doc/source/data/doc_code/hf_quick_start.py
@@ -2,15 +2,48 @@
# isort: skip_file
# fmt: off

# __hf_super_quick_start__
import ray
import numpy as np
from typing import Dict

ds = ray.data.from_numpy(np.asarray(["Complete this", "for me"]))

class HuggingFacePredictor:
    def __init__(self):
        from transformers import pipeline
        self.model = pipeline("text-generation", model="gpt2")

    def __call__(self, batch: Dict[str, np.ndarray]):
        model_out = self.model(list(batch["data"]), max_length=20)
        return {"output": model_out}

scale = ray.data.ActorPoolStrategy(size=2)
predictions = ds.map_batches(HuggingFacePredictor, compute=scale)
predictions.show(limit=1)
# __hf_super_quick_end__

# __hf_no_ray_start__
import numpy as np
from typing import Dict
from transformers import pipeline

batches = {"data": np.asarray(["Complete this", "for me"])}

model = pipeline("text-generation", model="gpt2")

def transform(batch: Dict[str, np.ndarray]):
    return model(list(batch["data"]), max_length=20)

results = transform(batches)
# __hf_no_ray_end__

# __hf_quickstart_load_start__
import ray
import numpy as np
from typing import Dict

ds = ray.data.from_numpy(np.asarray(["Complete this", "for me"]))
# __hf_quickstart_load_end__


@@ -21,16 +54,19 @@ def __init__(self):  # <1>
        self.model = pipeline("text-generation", model="gpt2")

    def __call__(self, batch: Dict[str, np.ndarray]):  # <2>
        model_out = self.model(list(batch["data"]), max_length=20)
        return {"output": np.asarray(model_out)}
# __hf_quickstart_model_end__


# __hf_quickstart_prediction_test_start__
hfp = HuggingFacePredictor()
batch = ds.take_batch(10)
test = hfp(batch)
# __hf_quickstart_prediction_test_end__


# __hf_quickstart_prediction_start__
scale = ray.data.ActorPoolStrategy(size=2)
predictions = ds.map_batches(HuggingFacePredictor, compute=scale)
