[docs] batch inference pass (#35041)
We're focusing more on explaining how batch inference works without Ray first, what the differences are, and what to know about batches and their formats to scale out your workloads.
maxpumperla authored May 10, 2023
1 parent a514ade commit 959328b
Showing 6 changed files with 306 additions and 51 deletions.
192 changes: 155 additions & 37 deletions doc/source/data/batch_inference.rst
@@ -8,8 +8,9 @@ Running Batch Inference
In this tutorial you'll learn what batch inference is, why you might want to use
Ray for it, and how to use Ray Data effectively for this task.
If you are familiar with the basics of inference tasks, jump straight to
code in the :ref:`quickstart section <batch_inference_quickstart>`, our detailed
:ref:`walk-through<batch_inference_walk_through>`,
or our :ref:`in-depth guide for PyTorch models<batch_inference_advanced_pytorch_example>`.

Batch inference refers to generating model predictions on a set of input data.
The model can range from a simple Python function to a complex neural network.
@@ -18,8 +19,9 @@ batch of data on demand.
This is in contrast to online inference, where the model is run immediately on a
data point when it becomes available.

Here's a simple schematic of batch inference for the computer vision task of classifying
images as cats or dogs, by "mapping" batches of input data to predictions
via ML model inference:

.. figure:: images/batch_inference.png

@@ -66,21 +68,96 @@ use case does not require scaling yet:
Quick Start
-----------

If you want to see a copy-paste example right away, here are a few simple ones.
Just pick the framework you like and run the code in your terminal.
If you want a more detailed rundown of the same examples, skip ahead to the
:ref:`batch inference walk-through with Ray <batch_inference_walk_through>`.

.. tabs::

.. group-tab:: HuggingFace

.. literalinclude:: ./doc_code/hf_quick_start.py
:language: python
:start-after: __hf_super_quick_start__
:end-before: __hf_super_quick_end__

.. group-tab:: PyTorch

.. literalinclude:: ./doc_code/pytorch_quick_start.py
:language: python
:start-after: __pt_super_quick_start__
:end-before: __pt_super_quick_end__

.. group-tab:: TensorFlow

.. literalinclude:: ./doc_code/tf_quick_start.py
:language: python
:start-after: __tf_super_quick_start__
:end-before: __tf_super_quick_end__


.. _batch_inference_walk_through:

Walk-through: Batch Inference with Ray
--------------------------------------

Running batch inference is conceptually easy and requires three steps:

1. Load your data and apply any preprocessing you need.
2. Define your model and a transformation that applies it to your data.
3. Run the transformation on your data.


Let's take a look at simple examples of this process without using Ray.
In each example we load ``batches`` of data, load a ``model``, define a ``transform``
function, and apply the model to the data to get ``results``.

.. tabs::

.. group-tab:: HuggingFace

.. literalinclude:: ./doc_code/hf_quick_start.py
:language: python
:start-after: __hf_no_ray_start__
:end-before: __hf_no_ray_end__

.. group-tab:: PyTorch

.. literalinclude:: ./doc_code/pytorch_quick_start.py
:language: python
:start-after: __pt_no_ray_start__
:end-before: __pt_no_ray_end__

.. group-tab:: TensorFlow

.. literalinclude:: ./doc_code/tf_quick_start.py
:language: python
:start-after: __tf_no_ray_start__
:end-before: __tf_no_ray_end__

.. note::

    As a Python user, this should all look familiar to you.
    The only part that you might be wondering about is that we're using
    ``Dict[str, np.ndarray]`` as the input type to our ``transform`` functions.
    We do this to ease the transition to Ray, given that Ray Data uses
    ``Dict[str, np.ndarray]`` as the default format for its batches.
    A minimal example of this batch format follows.
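
Concretely, such a batch and a ``transform`` on it look like this (a minimal
sketch; the ``"data"`` column name and the uppercasing transformation are just
for illustration):

.. code-block:: python

    import numpy as np
    from typing import Dict

    # A batch maps column names to NumPy arrays.
    batch: Dict[str, np.ndarray] = {"data": np.asarray(["Complete this", "for me"])}

    def transform(batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
        # Placeholder transformation: uppercase each string in the "data" column.
        return {"data": np.char.upper(batch["data"])}

    print(transform(batch))  # {'data': array(['COMPLETE THIS', 'FOR ME'], dtype='<U13')}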


If you can follow the above examples conceptually, you should have no trouble scaling your batch
inference workload to a compute cluster with Ray Data.
If you're using Ray, the three steps for running batch inference read as follows
(a minimal sketch follows the list):

1. Load a Ray Data dataset and apply any preprocessing you need. This will distribute your data
across the cluster.
2. Define your model in a class and define a transformation that applies your model to
your data batches (of format ``Dict[str, np.ndarray]`` by default).
3. Run inference on your data by using the :meth:`ds.map_batches() <ray.data.Datastream.map_batches>`
method from Ray Data. In this step you also define how your batch processing job
gets distributed across your cluster.
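
Here's a minimal sketch of these three steps, using a trivial stand-in model
(a function that doubles its input) instead of a real ML model:

.. code-block:: python

    import numpy as np
    from typing import Dict
    import ray

    # 1. Load the data into a Ray Datastream, distributing it across the cluster.
    ds = ray.data.from_numpy(np.arange(100))

    # 2. Define the "model" in a class; __call__ maps a batch to predictions.
    class DoublingModel:
        def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
            return {"output": batch["data"] * 2}

    # 3. Run the transformation, here with a pool of two actors.
    predictions = ds.map_batches(
        DoublingModel, compute=ray.data.ActorPoolStrategy(size=2)
    )
    predictions.show(limit=3)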

.. note::

@@ -89,26 +166,31 @@ We start with very simple use cases here and build up to more complex ones in other guides and tutorials.
demanding model setups, additional postprocessing, or other customizations.
We'll cover these advanced use cases in the next sections.

Let's scale out the above examples to a Ray cluster.
To start, install Ray with the data processing library, Ray Data:

.. code-block:: bash

    pip install ray[data]

1. Loading and preprocessing data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For this quick start guide we use very small, in-memory datasets by
leveraging common Python libraries like NumPy and Pandas.
In fact, we're using the exact same datasets as in the previous section, but now
load them into Ray Data.
In general, once you load your data using Ray Data, you also want to apply some
preprocessing steps. We skip this step here for simplicity.
The result of this step is a Ray Datastream ``ds`` that we can use to run inference on.

For larger datasets, you can use Ray Data to load data from cloud storage like S3 or GCS,
as sketched below. We cover this in more detail later on.
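
For instance, reading Parquet files from S3 might look like this (the bucket
and path are hypothetical):

.. code-block:: python

    import ray

    # Other formats have analogous ray.data.read_* functions.
    ds = ray.data.read_parquet("s3://my-bucket/my-input-data/")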

.. tabs::

.. group-tab:: HuggingFace

Create a NumPy array with text data and convert it to a Ray Datastream
with the :func:`ray.data.from_numpy() <ray.data.from_numpy>` function.

.. literalinclude:: ./doc_code/hf_quick_start.py
:language: python
@@ -118,7 +200,8 @@ In any case, the result of this step is a Ray Dataset ``ds`` that we can use to run inference on.
.. group-tab:: PyTorch

Create a NumPy array with 100
entries and convert it to a Ray Datastream with the
:func:`ray.data.from_numpy() <ray.data.from_numpy>` function.

.. literalinclude:: ./doc_code/pytorch_quick_start.py
:language: python
Expand All @@ -127,10 +210,11 @@ In any case, the result of this step is a Ray Dataset ``ds`` that we can use to

.. group-tab:: TensorFlow

Create a NumPy array with 100
entries and convert it to a Ray Datastream with the
:func:`ray.data.from_numpy() <ray.data.from_numpy>` function.

.. literalinclude:: ./doc_code/tf_quick_start.py
:language: python
:start-after: __tf_quickstart_load_start__
:end-before: __tf_quickstart_load_end__
@@ -141,6 +225,8 @@ In any case, the result of this step is a Ray Dataset ``ds`` that we can use to run inference on.
Next, you want to set up your model for inference by defining a predictor.
The core idea is to define a class that loads your model in its ``__init__`` method
and implements a ``__call__`` method that takes a batch of data and returns a batch of predictions.
The ``__call__`` method plays the same role as the ``transform`` function from the previous section.
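
Schematically, a predictor class follows this pattern (a framework-agnostic
sketch; the placeholder model and the ``"data"`` column name are assumptions
for illustration):

.. code-block:: python

    import numpy as np
    from typing import Dict

    class Predictor:
        def __init__(self):
            # Expensive setup, such as loading model weights, runs once per worker.
            self.model = lambda arr: arr * 2  # placeholder for a real model

        def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
            # Map one batch of inputs to one batch of predictions.
            return {"output": self.model(batch["data"])}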

Below you find examples for PyTorch, TensorFlow, and HuggingFace.

.. tabs::
@@ -199,15 +285,6 @@ Once you have your Ray Dataset ``ds`` and your predictor class, you can use
In the example below, we use two CPUs to run inference in parallel and then print the results.
We cover resource allocation in more detail in :ref:`the configuration section of this guide <batch_inference_config>`.
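
Concretely, the call looks like this (a sketch reusing the placeholder
``Predictor`` class from above; the framework tabs below show the real versions):

.. code-block:: python

    # A pool of two actors means two copies of the model run in parallel.
    scale = ray.data.ActorPoolStrategy(size=2)
    predictions = ds.map_batches(Predictor, compute=scale)
    predictions.show(limit=1)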


.. tabs::

.. group-tab:: HuggingFace
@@ -231,10 +308,42 @@ We cover resource allocation in more detail in :ref:`the configuration section of this guide <batch_inference_config>`.
:start-after: __tf_quickstart_prediction_start__
:end-before: __tf_quickstart_prediction_end__


Note that using :meth:`ds.map_batches() <ray.data.Dataset.map_batches>` requires
you to write a callable that takes a batch of data and returns a batch of predictions.
An easy way to write and validate it is to use :meth:`ds.take_batch(N) <ray.data.Dataset.take_batch>`
to get a batch of data first, and then locally test your predictor on that batch, without using Ray.
Once you are happy with the results, you can use the same class in ``map_batches``
on the full dataset. Below you see how to do that in our running examples.
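
Schematically, with the placeholder ``Predictor`` class from above:

.. code-block:: python

    # Fetch a small batch and test the predictor locally, without Ray.
    batch = ds.take_batch(10)
    predictor = Predictor()
    print(predictor(batch))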

.. tabs::

.. group-tab:: HuggingFace

.. literalinclude:: ./doc_code/hf_quick_start.py
:language: python
:start-after: __hf_quickstart_prediction_test_start__
:end-before: __hf_quickstart_prediction_test_end__

.. group-tab:: PyTorch

.. literalinclude:: ./doc_code/pytorch_quick_start.py
:language: python
:start-after: __pt_quickstart_prediction_test_start__
:end-before: __pt_quickstart_prediction_test_end__

.. group-tab:: TensorFlow

.. literalinclude:: ./doc_code/tf_quick_start.py
:language: python
:start-after: __tf_quickstart_prediction_test_start__
:end-before: __tf_quickstart_prediction_test_end__


.. _batch_inference_advanced_pytorch_example:

Advanced Guide to Batch Inference with PyTorch
----------------------------------------------

Let's use batch inference on a pre-trained PyTorch model for image classification
to illustrate advanced concepts of batch processing with Ray.
@@ -387,10 +496,16 @@ stateful class with Ray for our pretrained ResNet model:

<2> The ``__call__`` method is used to apply the model to a batch of data.

<3> We're free to use any custom code in a stateful class.

<4> Finally, we return the ``"class"`` key of the model predictions as a NumPy array (a sketch of such a class follows below).
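
Put together, such a stateful class might look like the following sketch
(the ResNet variant, the ``"image"`` column name, and the preprocessing
details are assumptions for illustration):

.. code-block:: python

    import numpy as np
    import torch
    from torchvision import models
    from typing import Dict

    class TorchPredictor:
        def __init__(self):
            # Load the pretrained model once, when the worker starts.
            self.model = models.resnet18(pretrained=True)
            self.model.eval()

        def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
            # Any custom code can run here; we convert the batch to a tensor.
            inputs = torch.from_numpy(batch["image"])
            with torch.inference_mode():
                predictions = self.model(inputs)
            # Return the predicted class IDs as a NumPy array.
            return {"class": predictions.argmax(dim=1).numpy()}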

.. note::

    Of course, you can also use GPUs for inference with Ray.
    Jump ahead to the :ref:`GPU usage section <batch_inference_gpu>` to see how
    to modify the current example to use GPUs.
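
As a preview, requesting GPUs boils down to a sketch like this (assuming the
``TorchPredictor`` sketch above, additionally moving model and data to the GPU
with ``.to("cuda")``):

.. code-block:: python

    predictions = ds.map_batches(
        TorchPredictor,
        compute=ray.data.ActorPoolStrategy(size=2),
        num_gpus=1,  # reserve one GPU per actor in the pool
    )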


Scalable inference with Ray Data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -559,6 +674,9 @@ If encountering OOMs, decreasing your ``batch_size`` may help.
The default ``batch_size`` of ``4096`` may be too large for datasets with large rows
(e.g. tables with many columns or a collection of large images).
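
For example, a sketch of passing an explicit, smaller batch size (reusing the
``TorchPredictor`` sketch from above):

.. code-block:: python

    # Smaller batches lower the peak memory usage per worker.
    predictions = ds.map_batches(
        TorchPredictor,
        compute=ray.data.ActorPoolStrategy(size=2),
        batch_size=256,
    )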


.. _batch_inference_gpu:

Using GPUs in batch inference
-----------------------------

50 changes: 43 additions & 7 deletions doc/source/data/doc_code/hf_quick_start.py
@@ -2,15 +2,48 @@
# isort: skip_file
# fmt: off

# __hf_super_quick_start__
import ray
import numpy as np
from typing import Dict

ds = ray.data.from_numpy(np.asarray(["Complete this", "for me"]))

class HuggingFacePredictor:
    def __init__(self):
        from transformers import pipeline
        self.model = pipeline("text-generation", model="gpt2")

    def __call__(self, batch: Dict[str, np.ndarray]):
        model_out = self.model(list(batch["data"]), max_length=20)
        return {"output": model_out}

scale = ray.data.ActorPoolStrategy(size=2)
predictions = ds.map_batches(HuggingFacePredictor, compute=scale)
predictions.show(limit=1)
# __hf_super_quick_end__

# __hf_no_ray_start__
import numpy as np
from typing import Dict
from transformers import pipeline

batches = {"data": np.asarray(["Complete this", "for me"])}

model = pipeline("text-generation", model="gpt2")

def transform(batch: Dict[str, np.ndarray]):
    return model(list(batch["data"]), max_length=20)

results = transform(batches)
# __hf_no_ray_end__

# __hf_quickstart_load_start__
import ray
import numpy as np
from typing import Dict

ds = ray.data.from_numpy(np.asarray(["Complete this", "for me"]))
# __hf_quickstart_load_end__


@@ -21,16 +54,19 @@ def __init__(self):  # <1>
        self.model = pipeline("text-generation", model="gpt2")

    def __call__(self, batch: Dict[str, np.ndarray]):  # <2>
        model_out = self.model(list(batch["data"]), max_length=20)
        return {"output": np.asarray(model_out)}
# __hf_quickstart_model_end__


# __hf_quickstart_prediction_test_start__
hfp = HuggingFacePredictor()
batch = ds.take_batch(10)
test = hfp(batch)
# __hf_quickstart_prediction_test_end__


# __hf_quickstart_prediction_start__
scale = ray.data.ActorPoolStrategy(size=2)
predictions = ds.map_batches(HuggingFacePredictor, compute=scale)
