[Bug] Failed Serve deployments are still registered and not cleaned up, yet deployments cannot be listed. #23581

ZhijieWang · 2022-03-30T04:18:05Z

Search before asking

I searched the issues and found no similar issues.

Ray Component

Ray Serve

Issue Severity

Low: It annoys or frustrates me for a moment.

What happened + What you expected to happen

While going through the XGBoost Ray Serve demo, the deployment failed because the script references a local file.

Starting the interactive session
Connected to serve-demo.
[Warning] Unable to print information for interactive session with job name serve.py_03-30-2022_03:55:03. Please view cluster at https://console.anyscale.com/projects/prj_VmhLQbLgr2dnNEvzpA4sCU8Z/clusters/ses_v4tA3sLicjU7fHT2GQFSyA1E.
2022-03-29 20:55:18,476	INFO api.py:426 -- Connecting to existing Serve instance in namespace 'serve'.
2022-03-29 20:55:18,647	INFO api.py:249 -- Updating deployment 'XGB'. component=serve deployment=XGB
Traceback (most recent call last):
  File "serve.py", line 33, in <module>
    XGB.deploy()
  File "/opt/homebrew/anaconda3/lib/python3.8/site-packages/ray/serve/api.py", line 806, in deploy
    return _get_global_client().deploy(
  File "/opt/homebrew/anaconda3/lib/python3.8/site-packages/ray/serve/api.py", line 93, in check
    return f(self, *args, **kwargs)
  File "/opt/homebrew/anaconda3/lib/python3.8/site-packages/ray/serve/api.py", line 255, in deploy
    self._wait_for_goal(goal_id)
  File "/opt/homebrew/anaconda3/lib/python3.8/site-packages/ray/serve/api.py", line 182, in _wait_for_goal
    async_goal_exception = ray.get(ready)[0]
  File "/opt/homebrew/anaconda3/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return getattr(ray, func.__name__)(*args, **kwargs)
  File "/opt/homebrew/anaconda3/lib/python3.8/site-packages/ray/util/client/api.py", line 42, in get
    return self.worker.get(vals, timeout=timeout)
  File "/opt/homebrew/anaconda3/lib/python3.8/site-packages/ray/util/client/worker.py", line 359, in get
    res = self._get(to_get, op_timeout)
  File "/opt/homebrew/anaconda3/lib/python3.8/site-packages/ray/util/client/worker.py", line 386, in _get
    raise err
types.RayTaskError(RuntimeError): ray::ServeController.wait_for_goal() (pid=537, ip=10.0.221.129)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/serve/utils.py", line 241, in wrap_to_ray_error
    raise exception
RuntimeError: Failed to reach deployment goal. Check the serve logs for details.
(base) ➜  serve-demo vim serve.py
(base) ➜  serve-demo cat serve.py

This is related to #15632.

However, the log above indicates that a deployment already exists, and the Ray dashboard shows one as well. The problem is that I was not able to list the Serve deployments.

Versions / Dependencies

Ray 1.11.0
XGBoost
Python 3.8

Reproduction script

import pickle
import json
import ray
from ray import serve

@serve.deployment(num_replicas=2, route_prefix="/regressor")
class XGB:
    def __init__(self):
        with open("model.pkl", "rb") as f:
            self.model = pickle.load(f)

    async def __call__(self, starlette_request):
        payload = await starlette_request.json()
        print("Worker: received starlette request with data", payload)

        input_vector = [
            payload["Pregnancies"],
            payload["Glucose"],
            payload["Blood Pressure"],
            payload["Skin Thickness"],
            payload["Insulin"],
            payload["BMI"],
            payload["DiabetesPedigree"],
            payload["Age"],
        ]
        prediction = self.model.predict([input_vector])[0]
        return {"result": prediction}

ray.init("anyscale://serve-demo")
# Start Serve in detached mode, or connect to the existing Serve instance.
serve.start(detached=True)
# Deploy the model.
XGB.deploy()

Anything else

Also, I searched around and found no way to kill Serve once it has been started in detached mode. At least, there is nothing in the docs.

Are you willing to submit a PR?

Yes I am willing to submit a PR!


simon-mo · 2022-03-30T18:06:06Z

"no way to kill serve if it is started with detach mode"

This is fixed in #23476.

ZhijieWang · 2022-03-30T18:26:18Z

thanks

ZhijieWang · 2022-04-11T19:19:16Z

thanks

ZhijieWang added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Mar 30, 2022

simon-mo added serve Ray Serve Related Issue platform and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Mar 30, 2022

simon-mo added the P2 Important issue, but not time-critical label Mar 30, 2022

simon-mo closed this as completed Apr 11, 2022
