Add a new End-to-End tutorial in Serve that walks users through deploying a model #20765
Conversation
This is great! It's the right level of detail and explanation. Content-wise:
- trim down the modeling-specific code using Hugging Face pipelines
- add a FastAPI example for typed HTTP request handling
.. code-block:: python

    from transformers import AutoTokenizer, AutoModelWithLMHead

    def summarize(text, max_length=150):
        tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-summarize-news")
        model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-summarize-news")

        input_ids = tokenizer.encode(text, return_tensors="pt", add_special_tokens=True)
        generated_ids = model.generate(input_ids=input_ids, num_beams=2, max_length=max_length,
                                       repetition_penalty=2.5, length_penalty=1.0, early_stopping=True)
        preds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True)
                 for g in generated_ids]

        return preds[0]
can we just use https://huggingface.co/transformers/main_classes/pipelines.html#transformers.SummarizationPipeline to make this one line?
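For illustration, a rough sketch of that one-liner suggestion, assuming the `transformers` summarization pipeline and the same model checkpoint as the snippet above (not the PR's final code):

```python
# Sketch: the summarization pipeline wraps the tokenizer and model
# behind a single callable, collapsing the modeling code above.
from transformers import pipeline

summarizer = pipeline("summarization", model="mrm8488/t5-base-finetuned-summarize-news")

def summarize(text, max_length=150):
    # The pipeline returns a list of dicts like [{"summary_text": "..."}].
    return summarizer(text, max_length=max_length)[0]["summary_text"]
```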
The ``ray`` and ``ray serve`` libraries give us access to Ray Serve's deployments, so we can serve our model over HTTP. The ``requests`` library lets us send HTTP requests to the deployed model:

.. code-block:: python
use literalinclude to make the script testable https://github.com/ray-project/ray/blob/master/doc/source/serve/pipeline.rst#basic-api
``ray.init()`` will start a single-node Ray cluster on your local machine, which will allow you to use all your CPU cores to serve requests in parallel. To start a multi-node cluster, see :doc:`../cluster/index`.
Use :ref:`serve-deploy-tutorial` instead; this will link to https://github.com/ray-project/ray/blob/master/doc/source/serve/deployment.rst and provide a stable link.
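For reference, a minimal sketch of the step quoted above (starting a local cluster with `ray.init()`, starting Serve, and querying over HTTP); route and parameter names are illustrative, not the tutorial's exact script:

```python
# Sketch: start a local single-node Ray cluster, start Serve on top of it,
# and query a deployment over HTTP with requests.
import ray
import requests
from ray import serve

ray.init()      # single-node Ray cluster on this machine
serve.start()   # start the Serve instance

# ... define and deploy a summarization deployment here ...

# Once deployed, query it over HTTP (route name is illustrative):
# response = requests.get("http://127.0.0.1:8000/summarize", params={"txt": "some long article"})
# print(response.text)
```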
@triciasfu @edoakes ping for review!
When the Python script exits, Ray Serve will shut down. If you would rather keep Ray Serve running in the background you can use ``serve.start(detached=True)`` (see :doc:`deployment` for details).
We should probably use `serve.start(detached=True)` as the default in the code, and then add a snippet about how users can also use `serve.start()` [without detached], since that's the primary use case.
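Roughly, as a sketch of that default (assuming the `serve.start` API of this Ray version):

```python
# Sketch: start Serve detached so it outlives the deployment script.
import ray
from ray import serve

ray.init(address="auto")    # connect to a cluster started with `ray start --head`
serve.start(detached=True)  # Serve keeps running after this script exits

# Non-detached alternative, which shuts down when the script exits:
# serve.start()
```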
When the Python script exits, Ray Serve will shut down. If you would rather keep Ray Serve running in the background you can use ``serve.start(detached=True)`` (see :doc:`deployment` for details).
Same comment as above - default to detached=True
We can achieve this by converting our ``summarize`` function into a class:

.. code-block:: python
We should also include FastAPI in this tutorial!
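A hedged sketch of what that could look like with Serve's FastAPI integration; the route, parameter name, and model checkpoint are carried over from the snippets above rather than taken from the PR:

```python
# Sketch: FastAPI ingress for typed HTTP request handling with Ray Serve.
import ray
from fastapi import FastAPI
from ray import serve
from transformers import pipeline

app = FastAPI()

ray.init(address="auto")
serve.start(detached=True)


@serve.deployment(route_prefix="/")
@serve.ingress(app)
class Summarizer:
    def __init__(self):
        # Load the model once per replica and keep it in memory.
        self.summarizer = pipeline("summarization", model="mrm8488/t5-base-finetuned-summarize-news")

    @app.get("/summarize")
    def summarize(self, txt: str) -> str:
        # FastAPI parses and validates the `txt` query parameter.
        return self.summarizer(txt, max_length=150)[0]["summary_text"]


Summarizer.deploy()
```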
``tokenizer`` and ``model`` only once and keep their values in memory instead of reloading them upon each HTTP query.

We can achieve this by converting our ``summarize`` function into a class:
Same comment as above here:
We probably want to add 1-2 sentences on how to run this (e.g., to deploy, first start Ray by running `ray start --head`, then run your Python script: `python example.py`).
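For reference, a rough illustration of the class conversion described in the quoted lines above, including a note on how to run it (assumes the deployment API of this Ray version; not necessarily the PR's exact code):

```python
# Sketch: the summarize logic as a class-based deployment, so the tokenizer
# and model are loaded once per replica instead of on every request.
# To run (illustrative): `ray start --head`, then `python example.py`.
from ray import serve
from transformers import AutoModelWithLMHead, AutoTokenizer


@serve.deployment
class Summarizer:
    def __init__(self):
        self.tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-summarize-news")
        self.model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-summarize-news")

    def summarize(self, text, max_length=150):
        input_ids = self.tokenizer.encode(text, return_tensors="pt", add_special_tokens=True)
        generated_ids = self.model.generate(
            input_ids=input_ids, num_beams=2, max_length=max_length,
            repetition_penalty=2.5, length_penalty=1.0, early_stopping=True,
        )
        return self.tokenizer.decode(
            generated_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True
        )

    async def __call__(self, request):
        # Serve passes a Starlette request; read the text from a query parameter.
        return self.summarize(request.query_params["txt"])


# Deploy (requires ray.init() and serve.start() to have been called first).
Summarizer.deploy()
```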
Co-authored-by: Simon Mo <[email protected]>
====================================
End-to-End Model Deployment Tutorial
Should this be called "End-to-End Tutorial" instead of "End-to-End Model Deployment Tutorial" to match the current E2E tutorial title?
LGTM. Awesome work
doc/source/serve/index.rst (outdated)

Ray Serve Quickstart
====================

- Ray Serve supports Python versions 3.6 through 3.8. To install Ray Serve, run the following command:
+ Ray Serve supports Python versions 3.6 through 3.9.
maybe just link to Ray's support matrix: https://docs.ray.io/en/master/installation.html
Why are these changes needed?

Currently, the docs have an end-to-end tutorial walking users through deploying a `Counter` function on Serve. This PR adds an end-to-end tutorial walking users through deploying an entire Hugging Face model using Serve, providing a better understanding of how to deploy an actual model via Serve.

Related issue number

Closes #19250.
Checks

- I've run `scripts/format.sh` to lint the changes in this PR.