[Bug] Failed Serve deployments are still registered and not cleaned up, yet cannot be listed #23581

Closed
1 of 2 tasks
ZhijieWang opened this issue Mar 30, 2022 · 3 comments
Labels
bug Something that is supposed to be working; but isn't P2 Important issue, but not time-critical serve Ray Serve Related Issue

ZhijieWang commented Mar 30, 2022

Search before asking

  • I searched the issues and found no similar issues.

Ray Component

Ray Serve

Issue Severity

Low: It annoys or frustrates me for a moment.

What happened + What you expected to happen

While going through the XGBoost Ray Serve demo, the deployment failed because it references a local file.

Starting the interactive session
Connected to serve-demo.
[Warning] Unable to print information for interactive session with job name serve.py_03-30-2022_03:55:03. Please view cluster at https://console.anyscale.com/projects/prj_VmhLQbLgr2dnNEvzpA4sCU8Z/clusters/ses_v4tA3sLicjU7fHT2GQFSyA1E.
2022-03-29 20:55:18,476	INFO api.py:426 -- Connecting to existing Serve instance in namespace 'serve'.
2022-03-29 20:55:18,647	INFO api.py:249 -- Updating deployment 'XGB'. component=serve deployment=XGB
Traceback (most recent call last):
  File "serve.py", line 33, in <module>
    XGB.deploy()
  File "/opt/homebrew/anaconda3/lib/python3.8/site-packages/ray/serve/api.py", line 806, in deploy
    return _get_global_client().deploy(
  File "/opt/homebrew/anaconda3/lib/python3.8/site-packages/ray/serve/api.py", line 93, in check
    return f(self, *args, **kwargs)
  File "/opt/homebrew/anaconda3/lib/python3.8/site-packages/ray/serve/api.py", line 255, in deploy
    self._wait_for_goal(goal_id)
  File "/opt/homebrew/anaconda3/lib/python3.8/site-packages/ray/serve/api.py", line 182, in _wait_for_goal
    async_goal_exception = ray.get(ready)[0]
  File "/opt/homebrew/anaconda3/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return getattr(ray, func.__name__)(*args, **kwargs)
  File "/opt/homebrew/anaconda3/lib/python3.8/site-packages/ray/util/client/api.py", line 42, in get
    return self.worker.get(vals, timeout=timeout)
  File "/opt/homebrew/anaconda3/lib/python3.8/site-packages/ray/util/client/worker.py", line 359, in get
    res = self._get(to_get, op_timeout)
  File "/opt/homebrew/anaconda3/lib/python3.8/site-packages/ray/util/client/worker.py", line 386, in _get
    raise err
types.RayTaskError(RuntimeError): ray::ServeController.wait_for_goal() (pid=537, ip=10.0.221.129)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/serve/utils.py", line 241, in wrap_to_ray_error
    raise exception
RuntimeError: Failed to reach deployment goal. Check the serve logs for details.
(base) ➜  serve-demo vim serve.py
(base) ➜  serve-demo cat serve.py

This is related to #15632.

However, both the log above and the Ray dashboard indicate that a deployment already exists. The problem is that I was not able to list the Serve deployment.
Screen Shot 2022-03-29 at 9 15 05 PM
Screen Shot 2022-03-29 at 9 14 58 PM
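
For anyone hitting the same state, here is a minimal sketch of how a leftover deployment could be looked up and removed by name. It assumes the Ray Serve 1.x Python API, where `serve.list_deployments()` returns a dict keyed by deployment name and each `Deployment` has a `delete()` method. The helper name `delete_deployment_if_registered` is hypothetical, and the Serve module is passed in as an argument so the logic can be exercised without a cluster. Whether listing actually works in the failed state described here is exactly what this issue is about, so treat it as something to try rather than a confirmed fix.

```python
def delete_deployment_if_registered(serve_module, name):
    """Delete the deployment called `name` if Serve still has it registered.

    `serve_module` is expected to behave like `ray.serve` in Ray 1.x:
    `list_deployments()` returns {name: Deployment} and each Deployment
    exposes `delete()`. Returns True if a deployment was deleted.
    """
    deployments = serve_module.list_deployments()
    deployment = deployments.get(name)
    if deployment is None:
        return False
    deployment.delete()
    return True


# Intended usage against a real cluster (assumption, not verified here):
#   import ray
#   from ray import serve
#   ray.init("anyscale://serve-demo")
#   serve.start(detached=True)  # connects to the existing Serve instance
#   delete_deployment_if_registered(serve, "XGB")
```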

Versions / Dependencies

Ray 1.11.0
XGBoost
Python 3.8

Reproduction script

import pickle
import json
import ray
from ray import serve

@serve.deployment(num_replicas=2, route_prefix="/regressor")
class XGB:
    def __init__(self):
        with open("model.pkl", "rb") as f:
            self.model = pickle.load(f)

    async def __call__(self, starlette_request):
        payload = await starlette_request.json()
        print("Worker: received starlette request with data", payload)

        input_vector = [
            payload["Pregnancies"],
            payload["Glucose"],
            payload["Blood Pressure"],
            payload["Skin Thickness"],
            payload["Insulin"],
            payload["BMI"],
            payload["DiabetesPedigree"],
            payload["Age"],
        ]
        prediction = self.model.predict([input_vector])[0]
        return {"result": prediction}

# Connect to the Ray cluster (an Anyscale cluster named "serve-demo").
ray.init("anyscale://serve-demo")
# Start a detached Serve instance, or connect to the one already running.
serve.start(detached=True)
# Deploy the model.
XGB.deploy()
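
A possible workaround for the local-file reference, offered as a sketch only: `open("model.pkl")` runs inside the replica on the cluster, where the file may not exist, so one option is to read the pickled bytes on the machine running the driver script and pass them to the constructor. The names `read_model_bytes` and `XGBFromBytes` are hypothetical. The wiring in the trailing comment assumes Ray Serve 1.x, where `Deployment.deploy()` forwards its arguments to `__init__`.

```python
import pickle
from pathlib import Path


def read_model_bytes(path: str) -> bytes:
    """Read the pickled model on the machine that runs the driver script."""
    return Path(path).read_bytes()


class XGBFromBytes:
    """Variant of the XGB class that receives the pickled model as bytes
    instead of opening a path on the replica's filesystem."""

    def __init__(self, model_bytes: bytes):
        # Unpickling happens on the replica, so the model's library
        # (XGBoost in the original script) must be installed there too.
        self.model = pickle.loads(model_bytes)


# Hypothetical wiring, assuming the Ray Serve 1.x API:
#   from ray import serve
#   deployment = serve.deployment(num_replicas=2, route_prefix="/regressor")(XGBFromBytes)
#   deployment.deploy(read_model_bytes("model.pkl"))
```

The `__call__` method from the script above would carry over unchanged.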

Anything else

Also, I searched around and found no way to kill Serve once it has been started in detached mode. At least, nothing in the docs covers it.

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@ZhijieWang ZhijieWang added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Mar 30, 2022
@simon-mo simon-mo added serve Ray Serve Related Issue platform and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Mar 30, 2022
@simon-mo
Contributor

"no way to kill serve if it is started with detach mode"

This is fixed in #23476.
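
For reference, a rough sketch of shutting down a detached instance from Python. It assumes the Ray Serve 1.x API, where `serve.start(detached=True)` connects to an already running instance and `serve.shutdown()` tears down the connected instance. The helper name `shutdown_detached_serve` is hypothetical and takes the Serve module as a parameter so it can be tried with a stand-in. This is not necessarily what #23476 changed.

```python
def shutdown_detached_serve(serve_module):
    """Connect to the running detached Serve instance, then shut it down.

    `serve_module` is expected to behave like `ray.serve` in Ray 1.x.
    """
    # With detached=True this attaches to the existing instance if one is running.
    serve_module.start(detached=True)
    # Stops the Serve processes and deletes the state tied to the instance.
    serve_module.shutdown()


# Intended usage (assumption, requires an initialized Ray connection):
#   import ray
#   from ray import serve
#   ray.init("anyscale://serve-demo")
#   shutdown_detached_serve(serve)
```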

@simon-mo simon-mo added the P2 Important issue, but not time-critical label Mar 30, 2022
@ZhijieWang
Author

thanks

@ZhijieWang
Author

thanks
