
Add a new End-to-End tutorial in Serve that walks users through deploying a model #20765

Merged: 58 commits into ray-project:master on Jan 24, 2022

Conversation

shrekris-anyscale (Contributor)

Why are these changes needed?

Currently, the docs have an end-to-end tutorial that walks users through deploying a Counter function on Serve. This PR adds an end-to-end tutorial that walks users through deploying an entire Hugging Face model, giving a better understanding of how to deploy a real model with Serve.

Related issue number

Closes #19250.

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • This is a docs PR, so no testing necessary.

simon-mo (Contributor) left a comment:

This is great! It's the right level of detail and explanation. Content-wise:

  • trim down the model-specific code using Hugging Face pipelines (see the sketch after this list)
  • add a FastAPI example for typed HTTP request handling
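
For the pipelines point, roughly something like this (a sketch only; the model name comes from the tutorial below, but the pipeline call is illustrative, not the PR's code):

.. code-block:: python

    from transformers import pipeline

    # The pipeline bundles the tokenizer and model loading that the
    # hand-written version below does manually.
    summarizer = pipeline(
        "summarization", model="mrm8488/t5-base-finetuned-summarize-news"
    )

    def summarize(text, max_length=150):
        # The pipeline handles tokenization, generation, and decoding.
        return summarizer(text, max_length=max_length)[0]["summary_text"]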

Comment on lines 16 to 28
.. code-block:: python

    from transformers import AutoTokenizer, AutoModelWithLMHead

    def summarize(text, max_length=150):
        # Load the pretrained summarization tokenizer and model.
        tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-summarize-news")
        model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-summarize-news")

        # Tokenize the input and generate a summary with beam search.
        input_ids = tokenizer.encode(text, return_tensors="pt", add_special_tokens=True)
        generated_ids = model.generate(
            input_ids=input_ids,
            num_beams=2,
            max_length=max_length,
            repetition_penalty=2.5,
            length_penalty=1.0,
            early_stopping=True,
        )

        # Decode the generated token IDs back into text.
        preds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True)
                 for g in generated_ids]

        return preds[0]

Contributor:

The ``ray`` and ``ray serve`` libraries give us access to Ray Serve's deployments,
so we can serve our model over HTTP. The ``requests`` library handles sending HTTP requests to it:

.. code-block:: python
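
    # Illustrative sketch only: the tutorial's exact snippet isn't
    # reproduced here, but the imports described above look like this.
    import ray
    import requests
    from ray import serve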
Contributor:


``ray.init()`` will start a single-node Ray cluster on your local machine,
which will allow you to use all your CPU cores to serve requests in parallel.
To start a multi-node cluster, see :doc:`../cluster/index`.
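
A rough sketch of the pattern this passage describes (standard Ray calls; not the tutorial's exact snippet):

.. code-block:: python

    import ray
    from ray import serve

    # Start a single-node Ray cluster on this machine and start Serve
    # on top of it; replicas can then use all local CPU cores.
    ray.init()
    serve.start()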
Contributor:

Use :ref:`serve-deploy-tutorial` instead; this will link to https://github.com/ray-project/ray/blob/master/doc/source/serve/deployment.rst and provide a stable link.

doc/source/serve/deploy_model_tutorial.rst (outdated, resolved)
simon-mo (Contributor) commented on Dec 6, 2021:

@triciasfu @edoakes ping for review!


When the Python script exits, Ray Serve will shut down.
If you would rather keep Ray Serve running in the background, you can use
``serve.start(detached=True)`` (see :doc:`deployment` for details).
Contributor:

We should probably use ``serve.start(detached=True)`` as the default in the code, and then add a snippet about how users can also use ``serve.start()`` (without ``detached``), since that's the primary use case.
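
A minimal sketch of that default (hypothetical standalone snippet, assuming a long-lived cluster):

.. code-block:: python

    import ray
    from ray import serve

    # Assumes a cluster was already started with `ray start --head`;
    # a detached Serve instance then outlives this driver script.
    ray.init(address="auto")
    serve.start(detached=True)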

Comment on lines 187 to 188
When the Python script exits, Ray Serve will shut down.
If you would rather keep Ray Serve running in the background, you can use ``serve.start(detached=True)`` (see :doc:`deployment` for details).
Contributor:

Same comment as above: default to ``detached=True``.


We can achieve this by converting our ``summarize`` function into a class:

.. code-block:: python
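
    # Illustrative sketch of the conversion (not the PR's exact code).
    from ray import serve
    from transformers import AutoTokenizer, AutoModelWithLMHead

    @serve.deployment
    class Summarizer:
        def __init__(self):
            # Loaded once per replica and kept in memory across requests.
            self.tokenizer = AutoTokenizer.from_pretrained(
                "mrm8488/t5-base-finetuned-summarize-news"
            )
            self.model = AutoModelWithLMHead.from_pretrained(
                "mrm8488/t5-base-finetuned-summarize-news"
            )

        async def __call__(self, request):
            # Summarize the raw text sent in the HTTP request body.
            text = (await request.body()).decode("utf-8")
            input_ids = self.tokenizer.encode(
                text, return_tensors="pt", add_special_tokens=True
            )
            generated_ids = self.model.generate(
                input_ids=input_ids, num_beams=2, max_length=150
            )
            return self.tokenizer.decode(generated_ids[0], skip_special_tokens=True)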
Contributor:

We should also include FastAPI in this tutorial!
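
Something like Serve's FastAPI integration would cover the typed-request case (sketch only; the route and class name are made up):

.. code-block:: python

    from fastapi import FastAPI
    from ray import serve

    app = FastAPI()

    @serve.deployment(route_prefix="/summarize")
    @serve.ingress(app)
    class TypedSummarizer:
        @app.get("/")
        def summarize(self, txt: str) -> str:
            # FastAPI validates the typed query parameter before we see it;
            # the real model call would replace this placeholder.
            return f"(summary of {len(txt)} characters)"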

``tokenizer`` and ``model`` only once and keep their values in memory instead of
reloading them upon each HTTP query.

We can achieve this by converting our ``summarize`` function into a class:
Contributor:

Same comment as above here:
We probably want to add 1-2 sentences on how to run this (e.g., to deploy, first start Ray by running ``ray start --head``, then run your Python script: ``python example.py``).

@@ -0,0 +1,387 @@
====================================
End-to-End Model Deployment Tutorial
shrekris-anyscale (Contributor, Author) commented:

Should this be called "End-to-End Tutorial" instead of "End-to-End Model Deployment Tutorial" to match the current E2E tutorial title?

simon-mo (Contributor) left a comment:

LGTM. Awesome work

@@ -44,23 +44,133 @@ Ray Serve can be used in two primary ways to deploy your models at scale:
Ray Serve Quickstart
====================

- Ray Serve supports Python versions 3.6 through 3.8. To install Ray Serve, run the following command:
+ Ray Serve supports Python versions 3.6 through 3.9.
Contributor:

maybe just link to Ray's support matrix: https://docs.ray.io/en/master/installation.html

edoakes merged commit 03d93ba into ray-project:master on Jan 24, 2022
Labels: author-action-required (the PR author is responsible for the next step; remove the tag to send back to the reviewer)
Successfully merging this pull request may close the issue: Add more in-depth tutorial.