[serve] Add experimental support for StreamingResponse using RayObjectRefGenerator (#35720)
**New file: `doc/source/serve/doc_code/streaming_example.py`**
```python
# flake8: noqa

# __begin_example__
import time
from typing import Generator

import requests
from starlette.responses import StreamingResponse
from starlette.requests import Request

from ray import serve


@serve.deployment
class StreamingResponder:
    def generate_numbers(self, max: int) -> Generator[str, None, None]:
        for i in range(max):
            yield str(i)
            time.sleep(0.1)

    def __call__(self, request: Request) -> StreamingResponse:
        max = request.query_params.get("max", "25")
        gen = self.generate_numbers(int(max))
        return StreamingResponse(gen, status_code=200, media_type="text/plain")


serve.run(StreamingResponder.bind())

r = requests.get("http://localhost:8000?max=10", stream=True)
start = time.time()
r.raise_for_status()
for chunk in r.iter_content(chunk_size=None, decode_unicode=True):
    print(f"Got result {round(time.time()-start, 1)}s after start: '{chunk}'")
# __end_example__


r = requests.get("http://localhost:8000?max=10", stream=True)
r.raise_for_status()
for i, chunk in enumerate(r.iter_content(chunk_size=None, decode_unicode=True)):
    assert chunk == str(i)
```
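The `__begin_example__` / `__end_example__` comments delimit the snippet that the docs pull in via `literalinclude`; the trailing request loop after `__end_example__` sits outside the rendered example and asserts that the streamed chunks arrive in order, presumably as a docs CI smoke test.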
**Changes to the Serve HTTP guide:**
```diff
@@ -3,7 +3,7 @@
 This section helps you understand how to:
 - send HTTP requests to Serve deployments
 - use Ray Serve to integrate with FastAPI
-- use customized HTTP Adapters
+- use customized HTTP adapters
 - choose which feature to use for your use case

 ## Choosing the right HTTP feature
```
The following section is added to the guide (hunk `@@ -74,6 +74,51 @@`, after the FastAPI section's note that Serve currently does not support WebSockets):
(serve-http-streaming-response)= | ||
> **Review thread:**
> — Should we pull this in a separate page for discoverability (or rename this to something better than HTTP Guide)?
> — We already have a lot of user guides and this feels like the natural place for it (where we explain starlette & fastapi behaviors).
> — No preference on the naming; if you have a better one, suggest it.
> — Once we add batching + a real-world example we can have a separate page for that. Not enough time before branch cut for those, though.
> — That sounds good. We have more time for docs changes, btw, so we can do it in a follow-up PR.
> — Ah, ok. I'll work on it first thing next week then.
## Streaming Responses

```{warning}
Support for HTTP streaming responses is experimental. To enable this feature, set `RAY_SERVE_ENABLE_EXPERIMENTAL_STREAMING=1` on the cluster before starting Ray. If you encounter any issues, [file an issue on GitHub](https://github.com/ray-project/ray/issues/new/choose).
```
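For instance, a minimal sketch (not from the PR; the exact command depends on how you launch your cluster) of enabling the feature on a single-node cluster:

```bash
# Hypothetical single-node example: export the variable before starting Ray
# so all Serve processes inherit it.
RAY_SERVE_ENABLE_EXPERIMENTAL_STREAMING=1 ray start --head
```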
Some applications must stream incremental results back to the caller.
This is common for text generation using large language models (LLMs) or video processing applications.
The full forward pass may take multiple seconds, so returning incremental results as they become available greatly improves the user experience.

To use HTTP response streaming, return a [StreamingResponse](https://www.starlette.io/responses/#streamingresponse) that wraps a generator from your HTTP handler.
> **Review thread:**
> — Also include setting the env variable and how you would do it in YAML.
> — This is just above. It must be set on the cluster; it can't be set in the YAML.
> — How would you set it up on Anyscale Services?
> — Set the env variable in the cluster env.
> — Got it, definitely need to improve in 2.6. Let's document in Anyscale docs too once you're done with the Ray changes.
This is supported for basic HTTP ingress deployments using a `__call__` method and when using the [FastAPI integration](serve-fastapi-http).
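As an illustration, here's a minimal sketch of the same pattern with the FastAPI integration (not part of this PR's diff; `FastAPIStreamer` and its route are hypothetical names):

```python
import time
from typing import Generator

from fastapi import FastAPI
from starlette.responses import StreamingResponse

from ray import serve

app = FastAPI()


@serve.deployment
@serve.ingress(app)
class FastAPIStreamer:
    def _numbers(self, n: int) -> Generator[str, None, None]:
        for i in range(n):
            yield str(i)
            time.sleep(0.1)

    @app.get("/")
    def stream(self, max: int = 25) -> StreamingResponse:
        # As with the __call__ handler, wrap a generator in a StreamingResponse.
        return StreamingResponse(self._numbers(max), media_type="text/plain")


serve.run(FastAPIStreamer.bind())
```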
||
The code below defines a Serve application that incrementally streams numbers up to a provided `max`. | ||
The client-side code is also updated to handle the streaming outputs. | ||
This code uses the `stream=True` option to the [requests](https://requests.readthedocs.io/en/latest/user/advanced/#streaming-requests) library. | ||
|
||
```{literalinclude} ../serve/doc_code/streaming_example.py | ||
:start-after: __begin_example__ | ||
:end-before: __end_example__ | ||
:language: python | ||
``` | ||
Save this code in `stream.py` and run it:

```bash
$ RAY_SERVE_ENABLE_EXPERIMENTAL_STREAMING=1 python stream.py
[2023-05-25 10:44:23] INFO ray._private.worker::Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
(ServeController pid=40401) INFO 2023-05-25 10:44:25,296 controller 40401 deployment_state.py:1259 - Deploying new version of deployment default_StreamingResponder.
(HTTPProxyActor pid=40403) INFO: Started server process [40403]
(ServeController pid=40401) INFO 2023-05-25 10:44:25,333 controller 40401 deployment_state.py:1498 - Adding 1 replica to deployment default_StreamingResponder.
Got result 0.0s after start: '0'
Got result 0.1s after start: '1'
Got result 0.2s after start: '2'
Got result 0.3s after start: '3'
Got result 0.4s after start: '4'
Got result 0.5s after start: '5'
Got result 0.6s after start: '6'
Got result 0.7s after start: '7'
Got result 0.8s after start: '8'
Got result 0.9s after start: '9'
(ServeReplica:default_StreamingResponder pid=41052) INFO 2023-05-25 10:49:52,230 default_StreamingResponder default_StreamingResponder#qlZFCa yomKnJifNJ / default replica.py:634 - __CALL__ OK 1017.6ms
```
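As an aside (not part of the PR diff), the stream can also be checked from the command line; `curl -N` disables output buffering so each chunk prints as it arrives:

```bash
# Hypothetical quick check; assumes the app above is running locally.
curl -N "http://localhost:8000/?max=5"
```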
(serve-http-adapters)=

## HTTP Adapters
````diff
@@ -190,7 +235,7 @@ PredictorDeployment.deploy(..., http_adapter=User)
 DAGDriver.bind(other_node, http_adapter=User)

 ```
-### List of Built-in Adapters
+### List of built-in adapters

 Here is a list of adapters; please feel free to [contribute more](https://github.com/ray-project/ray/issues/new/choose)!
````
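For context, a custom adapter is a callable that takes the incoming request and returns the object handed to the deployment. The `User` model named in the hunk above isn't shown in this diff, so the definition below is an illustrative assumption:

```python
from pydantic import BaseModel
from starlette.requests import Request


class User(BaseModel):
    # Hypothetical fields; the real model isn't shown in this diff.
    user_id: int
    name: str


# A function-style adapter: parse the request body into the object the
# downstream deployment receives as its input.
async def user_from_request(request: Request) -> User:
    return User(**(await request.json()))
```

Either the Pydantic model itself or a function like this can then be passed as `http_adapter=...`, as in the `DAGDriver.bind(other_node, http_adapter=User)` line above.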
> **Review thread:**
> — How about using a real model as the example here?
> — Rather not do that here; it complicates communicating the feature. We can add a full example later.