[Serve] Migrate from Flask.Request to Starlette Request (#12852)
architkulkarni authored Dec 21, 2020
1 parent 5b48480 commit 8b4b4bf
Showing 26 changed files with 140 additions and 174 deletions.
37 changes: 15 additions & 22 deletions doc/source/serve/faq.rst
@@ -117,33 +117,33 @@ policies <serve-split-traffic>`, finding the next available replica, and
batching requests together.

When the request arrives in the model, you can access the data similarly to how
-you would with HTTP request. Here are some examples how ServeRequest mirrors Flask.Request:
+you would with an HTTP request. Here are some examples of how ServeRequest mirrors Starlette.Request:

.. list-table::
:header-rows: 1

* - HTTP
- ServeHandle
- | Request
-| (Flask.Request and ServeRequest)
+| (Starlette.Request and ServeRequest)
* - ``requests.get(..., headers={...})``
- ``handle.options(http_headers={...})``
- ``request.headers``
* - ``requests.post(...)``
- ``handle.options(http_method="POST")``
-- ``requests.method``
-* - ``request.get(..., json={...})``
+- ``request.method``
+* - ``requests.get(..., json={...})``
- ``handle.remote({...})``
-- ``request.json``
-* - ``request.get(..., form={...})``
+- ``await request.json()``
+* - ``requests.get(..., form={...})``
- ``handle.remote({...})``
-- ``request.form``
-* - ``request.get(..., params={"a":"b"})``
+- ``await request.form()``
+* - ``requests.get(..., params={"a":"b"})``
- ``handle.remote(a="b")``
-- ``request.args``
-* - ``request.get(..., data="long string")``
+- ``request.query_params``
+* - ``requests.get(..., data="long string")``
- ``handle.remote("long string")``
-- ``request.data``
+- ``await request.body()``
* - ``N/A``
- ``handle.remote(python_object)``
- ``request.data``
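The rows above can be exercised without a running Serve cluster. The sketch below is illustrative only: ``FakeStarletteRequest`` is a hypothetical stand-in for ``starlette.requests.Request`` (not a real Serve or Starlette class). It mimics the key difference the table shows: ``method``, ``headers``, and ``query_params`` are plain attributes, while the body accessors are coroutines that must be awaited.

```python
import asyncio
import json

class FakeStarletteRequest:
    # Hypothetical stand-in for starlette.requests.Request (illustrative
    # only). Request metadata is available as plain attributes; the body
    # accessors are coroutines and must be awaited.
    def __init__(self, method, headers, query_params, raw_body):
        self.method = method                # mirrors request.method
        self.headers = headers              # mirrors request.headers
        self.query_params = query_params    # mirrors request.query_params
        self._raw_body = raw_body

    async def body(self):                   # mirrors await request.body()
        return self._raw_body

    async def json(self):                   # mirrors await request.json()
        return json.loads(self._raw_body)

async def handler(request):
    # A backend-style handler reading each field named in the table.
    return {
        "method": request.method,
        "name": request.query_params.get("name", "serve!"),
        "payload": await request.json(),
    }

request = FakeStarletteRequest(
    method="POST",
    headers={"content-type": "application/json"},
    query_params={"name": "ray"},
    raw_body=b'{"number": 42}',
)
result = asyncio.run(handler(request))
print(result)  # {'method': 'POST', 'name': 'ray', 'payload': {'number': 42}}
```

The same handler works for ServeRequest-style objects as long as they expose the awaitable accessors, which is what lets one backend serve both web and Python callers.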
@@ -157,9 +157,9 @@ you would with HTTP request. Here are some examples how ServeRequest mirrors Fla

.. code-block:: python
-import flask
+import starlette.requests
-if isinstance(request, flask.Request):
+if isinstance(request, starlette.requests.Request):
print("Request coming from web!")
elif isinstance(request, ServeRequest):
print("Request coming from Python!")
@@ -170,10 +170,10 @@ you would with HTTP request. Here are some examples how ServeRequest mirrors Fla

.. code-block:: python
-handle.remote(flask_request)
+handle.remote(starlette_request)
In this case, Serve will `not` wrap it in ServeRequest. You can directly
-process the request as a ``flask.Request``.
+process the request as a ``starlette.requests.Request``.

How fast is Ray Serve?
----------------------
@@ -187,13 +187,6 @@ You can checkout our `microbenchmark instruction <https://github.com/ray-project
to benchmark on your hardware.


-Does Ray Serve use Flask?
--------------------------
-Flask is only used as a web request object for servable to consume the data.
-We actually use the fastest Python web server: `Uvicorn <https://www.uvicorn.org/>`_ as our web server,
-alongside with the power of Python asyncio.
-**Flask is ONLY the request object that we are using, Uvicorn (not flask) provides the webserver.**

Can I use asyncio along with Ray Serve?
---------------------------------------
Yes! You can make your servable methods ``async def`` and Serve will run them
3 changes: 3 additions & 0 deletions doc/source/serve/index.rst
@@ -33,6 +33,9 @@ Since Serve is built on Ray, it also allows you to scale to many machines, in yo
If you want to try out Serve, join our `community slack <https://forms.gle/9TSdDYUgxYs8SA9e8>`_
and discuss in the #serve channel.

+.. note::
+  Starting with Ray version 1.3.0, Ray Serve backends must take in a Starlette Request object instead of a Flask Request object.
+  See the `migration guide <https://docs.google.com/document/d/1CG4y5WTTc4G_MRQGyjnb_eZ7GK3G9dUX6TNLKLnKRAc/edit?usp=sharing>`_ for details.
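In practice the migration is mostly a change of handler signature and body access. A minimal before/after sketch; the ``StubRequest`` class below is a made-up placeholder (not a Serve or Starlette API) so the snippet runs without Starlette installed:

```python
import asyncio
import json

# Pre-1.3.0 style (Flask): synchronous handler, attribute-style body access.
#     def __call__(self, flask_request):
#         payload = flask_request.json
#
# 1.3.0+ style (Starlette): the handler may be async, and the body is read
# by awaiting a coroutine.
class Backend:
    async def __call__(self, starlette_request):
        payload = await starlette_request.json()
        return {"echo": payload}

class StubRequest:
    # Made-up placeholder mimicking Starlette's awaitable .json() accessor.
    def __init__(self, raw_body):
        self._raw_body = raw_body

    async def json(self):
        return json.loads(self._raw_body)

result = asyncio.run(Backend()(StubRequest(b'{"a": 1}')))
print(result)  # {'echo': {'a': 1}}
```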

Installation
============
8 changes: 3 additions & 5 deletions doc/source/serve/key-concepts.rst
@@ -19,10 +19,8 @@ Backends
Backends define the implementation of your business logic or models that will handle requests when queries come in to :ref:`serve-endpoint`.
In order to support seamless scalability backends can have many replicas, which are individual processes running in the Ray cluster to handle requests.
To define a backend, first you must define the "handler" or the business logic you'd like to respond with.
-The handler should take as input a `Flask Request object <https://flask.palletsprojects.com/en/1.1.x/api/?highlight=request#flask.Request>`_.
-The handler should return any JSON-serializable object as output. For a more customizable response type, the handler may return a
+The handler should take as input a `Starlette Request object <https://www.starlette.io/requests/>`_ and return any JSON-serializable object as output. For a more customizable response type, the handler may return a
`Starlette Response object <https://www.starlette.io/responses/>`_.
-In the future, Ray Serve will support `Starlette Request objects <https://www.starlette.io/requests/>`_ as input as well.

A backend is defined using :mod:`client.create_backend <ray.serve.api.Client.create_backend>`, and the implementation can be defined as either a function or a class.
Use a function when your response is stateless and a class when you might need to maintain some state (like a model).
@@ -32,15 +30,15 @@ A backend consists of a number of *replicas*, which are individual copies of the

.. code-block:: python
-def handle_request(flask_request):
+def handle_request(starlette_request):
return "hello world"
class RequestHandler:
# Take the message to return as an argument to the constructor.
def __init__(self, msg):
self.msg = msg
-def __call__(self, flask_request):
+def __call__(self, starlette_request):
return self.msg
client.create_backend("simple_backend", handle_request)
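Called directly with a placeholder request (outside of Serve), the two handler shapes above behave as the paragraph describes: the function is stateless, while each class instance carries the state given to its constructor. A small sketch:

```python
def handle_request(starlette_request):
    return "hello world"

class RequestHandler:
    # Take the message to return as an argument to the constructor.
    def __init__(self, msg):
        self.msg = msg

    def __call__(self, starlette_request):
        return self.msg

# Direct invocation, standing in for what a backend replica does per request.
print(handle_request(None))           # hello world
handler = RequestHandler("hello from a replica")
print(handler(None))                  # hello from a replica
```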
2 changes: 1 addition & 1 deletion doc/source/serve/package-ref.rst
@@ -23,7 +23,7 @@ Handle API
:members: remote, options

When calling from Python, the backend implementation will receive ``ServeRequest``
-objects instead of Flask requests.
+objects instead of Starlette requests.

.. autoclass:: ray.serve.utils.ServeRequest
:members:
10 changes: 5 additions & 5 deletions doc/source/serve/tutorials/batch.rst
@@ -30,13 +30,13 @@ You can use the ``@serve.accept_batch`` decorator to annotate a function or a cl
This annotation is needed because batched backends have different APIs compared
to single request backends. In a batched backend, the inputs are a list of values.

-For single query backend, the input type is a single Flask request or
+For a single query backend, the input type is a single Starlette request or
:mod:`ServeRequest <ray.serve.utils.ServeRequest>`:

.. code-block:: python
def single_request(
-request: Union[Flask.Request, ServeRequest],
+request: Union[starlette.requests.Request, ServeRequest],
):
pass
@@ -47,7 +47,7 @@ types:
@serve.accept_batch
def batched_request(
-request: List[Union[Flask.Request, ServeRequest]],
+request: List[Union[starlette.requests.Request, ServeRequest]],
):
pass
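The difference between the two signatures can be sketched with a hypothetical stub request (illustrative only; ``@serve.accept_batch`` itself is omitted, since it only marks the backend as batch-capable). A batched backend receives a list of requests and should return one result per request, in order:

```python
class StubRequest:
    # Hypothetical stand-in for a Starlette request / ServeRequest.
    def __init__(self, number):
        self.query_params = {"number": str(number)}

# Single-query backend: one request in, one result out.
def single_request(request):
    return int(request.query_params["number"]) + 1

# Batched backend: a list of requests in, a list of results out,
# one per request and in the same order.
def batched_request(requests):
    return [int(r.query_params["number"]) + 1 for r in requests]

print(single_request(StubRequest(1)))                       # 2
print(batched_request([StubRequest(n) for n in range(3)]))  # [1, 2, 3]
```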
@@ -84,8 +84,8 @@ Ray Serve was able to evaluate them in batches.

What if you want to evaluate a whole batch in Python? Ray Serve allows you to send
queries via the Python API. A batch of queries can either come from the web server
-or the Python API. Requests coming from the Python API will have the similar API
-as Flask.Request. See more on the API :ref:`here<serve-handle-explainer>`.
+or the Python API. Requests coming from the Python API will have a similar API
+to Starlette Request. See more on the API :ref:`here<serve-handle-explainer>`.

.. literalinclude:: ../../../../python/ray/serve/examples/doc/tutorial_batch.py
:start-after: __doc_define_servable_v1_begin__
2 changes: 1 addition & 1 deletion python/ray/serve/examples/doc/quickstart_class.py
@@ -10,7 +10,7 @@ class Counter:
def __init__(self):
self.count = 0

-def __call__(self, flask_request):
+def __call__(self, starlette_request):
self.count += 1
return {"current_counter": self.count}

4 changes: 2 additions & 2 deletions python/ray/serve/examples/doc/quickstart_function.py
@@ -6,8 +6,8 @@
client = serve.start()


-def echo(flask_request):
-return "hello " + flask_request.args.get("name", "serve!")
+def echo(starlette_request):
+return "hello " + starlette_request.query_params.get("name", "serve!")


client.create_backend("hello", echo)
10 changes: 5 additions & 5 deletions python/ray/serve/examples/doc/snippet_model_composition.py
@@ -16,13 +16,13 @@


def model_one(request):
-print("Model 1 called with data ", request.args.get("data"))
+print("Model 1 called with data ", request.query_params.get("data"))
return random()


def model_two(request):
-print("Model 2 called with data ", request.args.get("data"))
-return request.args.get("data")
+print("Model 2 called with data ", request.query_params.get("data"))
+return request.query_params.get("data")


class ComposedModel:
@@ -32,8 +32,8 @@ def __init__(self):
self.model_two = client.get_handle("model_two")

# This method can be called concurrently!
-async def __call__(self, flask_request):
-data = flask_request.data
+async def __call__(self, starlette_request):
+data = await starlette_request.body()

score = await self.model_one.remote(data=data)
if score > 0.5:
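The composition pattern in the snippet above — await a score from one model, then conditionally call a second — can be sketched without Ray using plain coroutines. The model functions and the score below are made up for illustration; real Serve handles are invoked via ``.remote()``:

```python
import asyncio

# Made-up async stand-ins for awaiting Serve handle calls.
async def model_one(data):
    return 0.7  # pretend confidence score

async def model_two(data):
    return data

async def composed(data):
    # Await the first model, then route to the second based on its score.
    score = await model_one(data)
    if score > 0.5:
        result = await model_two(data)
        return {"model_used": 2, "result": result}
    return {"model_used": 1, "score": score}

print(asyncio.run(composed("payload")))  # {'model_used': 2, 'result': 'payload'}
```

Because each await yields control, a single replica can run many such compositions concurrently, which is why the example marks ``__call__`` as async.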
8 changes: 5 additions & 3 deletions python/ray/serve/examples/doc/tutorial_batch.py
@@ -14,8 +14,10 @@

# __doc_define_servable_v0_begin__
@serve.accept_batch
-def batch_adder_v0(flask_requests: List):
-numbers = [int(request.args["number"]) for request in flask_requests]
+def batch_adder_v0(starlette_requests: List):
+numbers = [
+int(request.query_params["number"]) for request in starlette_requests
+]

input_array = np.array(numbers)
print("Our input array has shape:", input_array.shape)
@@ -58,7 +60,7 @@ def send_query(number):
# __doc_define_servable_v1_begin__
@serve.accept_batch
def batch_adder_v1(requests: List):
-numbers = [int(request.args["number"]) for request in requests]
+numbers = [int(request.query_params["number"]) for request in requests]
input_array = np.array(numbers)
print("Our input array has shape:", input_array.shape)
# Sleep for 200ms, this could be performing CPU intensive computation
12 changes: 6 additions & 6 deletions python/ray/serve/examples/doc/tutorial_deploy.py
@@ -48,9 +48,9 @@ def __init__(self):
with open("/tmp/iris_labels.json") as f:
self.label_list = json.load(f)

-def __call__(self, flask_request):
-payload = flask_request.json
-print("Worker: received flask request with data", payload)
+async def __call__(self, starlette_request):
+payload = await starlette_request.json()
+print("Worker: received starlette request with data", payload)

input_vector = [
payload["sepal length"],
@@ -143,9 +143,9 @@ def __init__(self):
with open("/tmp/iris_labels_2.json") as f:
self.label_list = json.load(f)

-def __call__(self, flask_request):
-payload = flask_request.json
-print("Worker: received flask request with data", payload)
+async def __call__(self, starlette_request):
+payload = await starlette_request.json()
+print("Worker: received starlette request with data", payload)

input_vector = [
payload["sepal length"],
4 changes: 2 additions & 2 deletions python/ray/serve/examples/doc/tutorial_pytorch.py
@@ -27,8 +27,8 @@ def __init__(self):
mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

-def __call__(self, flask_request):
-image_payload_bytes = flask_request.data
+async def __call__(self, starlette_request):
+image_payload_bytes = await starlette_request.body()
pil_image = Image.open(BytesIO(image_payload_bytes))
print("[1/3] Parsed image data: {}".format(pil_image))

6 changes: 3 additions & 3 deletions python/ray/serve/examples/doc/tutorial_sklearn.py
@@ -54,9 +54,9 @@ def __init__(self):
with open(LABEL_PATH) as f:
self.label_list = json.load(f)

-def __call__(self, flask_request):
-payload = flask_request.json
-print("Worker: received flask request with data", payload)
+async def __call__(self, starlette_request):
+payload = await starlette_request.json()
+print("Worker: received starlette request with data", payload)

input_vector = [
payload["sepal length"],
4 changes: 2 additions & 2 deletions python/ray/serve/examples/doc/tutorial_tensorflow.py
@@ -51,10 +51,10 @@ def __init__(self, model_path):
self.model_path = model_path
self.model = tf.keras.models.load_model(model_path)

-def __call__(self, flask_request):
+async def __call__(self, starlette_request):
# Step 1: transform HTTP request -> tensorflow input
# Here we define the request schema to be a json array.
-input_array = np.array(flask_request.json["array"])
+input_array = np.array((await starlette_request.json())["array"])
reshaped_array = input_array.reshape((1, 28, 28))

# Step 2: tensorflow input -> tensorflow output
4 changes: 2 additions & 2 deletions python/ray/serve/examples/echo.py
@@ -9,8 +9,8 @@
from ray import serve


-def echo(flask_request):
-return ["hello " + flask_request.args.get("name", "serve!")]
+def echo(starlette_request):
+return ["hello " + starlette_request.query_params.get("name", "serve!")]


client = serve.start()
7 changes: 4 additions & 3 deletions python/ray/serve/examples/echo_actor.py
@@ -1,6 +1,6 @@
"""
Example actor that adds an increment to a number. This number can
-come from either web (parsing Flask request) or python call.
+come from either web (parsing Starlette request) or python call.
This actor can be called from HTTP as well as from Python.
"""
@@ -30,9 +30,10 @@ class MagicCounter:
def __init__(self, increment):
self.increment = increment

-def __call__(self, flask_request, base_number=None):
+def __call__(self, starlette_request, base_number=None):
if serve.context.web:
-base_number = int(flask_request.args.get("base_number", "0"))
+base_number = int(
+starlette_request.query_params.get("base_number", "0"))
return base_number + self.increment


9 changes: 5 additions & 4 deletions python/ray/serve/examples/echo_actor_batch.py
@@ -1,6 +1,6 @@
"""
Example actor that adds an increment to a number. This number can
-come from either web (parsing Flask request) or python call.
+come from either web (parsing Starlette request) or python call.
The queries incoming to this actor are batched.
This actor can be called from HTTP as well as from Python.
"""
@@ -31,12 +31,13 @@ def __init__(self, increment):
self.increment = increment

@serve.accept_batch
-def __call__(self, flask_request_list, base_number=None):
+def __call__(self, starlette_request_list, base_number=None):
# batch_size = serve.context.batch_size
if serve.context.web:
result = []
-for flask_request in flask_request_list:
-base_number = int(flask_request.args.get("base_number", "0"))
+for starlette_request in starlette_request_list:
+base_number = int(
+starlette_request.query_params.get("base_number", "0"))
result.append(base_number)
return list(map(lambda x: x + self.increment, result))
else:
2 changes: 1 addition & 1 deletion python/ray/serve/examples/echo_batching.py
@@ -11,7 +11,7 @@ def __init__(self, increment):
self.increment = increment

@serve.accept_batch
-def __call__(self, flask_request, base_number=None):
+def __call__(self, starlette_request, base_number=None):
# __call__ fn should preserve the batch size
# base_number is a python list

6 changes: 3 additions & 3 deletions python/ray/serve/examples/echo_full.py
@@ -12,8 +12,8 @@

# a backend can be a function or class.
# it can be made to be invoked from web as well as python.
-def echo_v1(flask_request):
-response = flask_request.args.get("response", "web")
+def echo_v1(starlette_request):
+response = starlette_request.query_params.get("response", "web")
return response


@@ -32,7 +32,7 @@ def echo_v1(flask_request):


# We can also add a new backend and split the traffic.
-def echo_v2(flask_request):
+def echo_v2(starlette_request):
# magic, only from web.
return "something new"
