I searched the issues and found no similar issues.
Ray Component
Ray Serve
Issue Severity
High: It blocks me to complete my task.
What happened + What you expected to happen
Error seems legit to me, filing P0 and self-assigned for now.
(ServeController pid=769) 2022-04-05 15:16:52,987 INFO http_state.py:113 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:sGYxOM:SERVE_PROXY_ACTOR-node:172.31.92.105-0' on node 'node:172.31.92.105-0' listening on '127.0.0.1:8000'
2022-04-05 15:16:54,419 INFO api.py:827 -- Started Serve instance in namespace 'e170177d-9c49-449f-a67a-44edd4b0ef22'.
2022-04-05 15:16:54,420 INFO single_deployment_1k_noop_replica.py:116 -- Ray serve http_host: 127.0.0.1, http_port: 8000
2022-04-05 15:16:54,420 INFO single_deployment_1k_noop_replica.py:118 -- Deploying with 1000 target replicas ....
2022-04-05 15:16:54,425 INFO api.py:647 -- Updating deployment 'echo'. component=serve deployment=echo
(HTTPProxyActor pid=816) INFO: Started server process [816]
(ServeController pid=769) 2022-04-05 15:16:54,520 INFO deployment_state.py:1211 -- Adding 1000 replicas to deployment 'echo'. component=serve deployment=echo
Traceback (most recent call last):
  File "workloads/single_deployment_1k_noop_replica.py", line 152, in <module>
    main()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "workloads/single_deployment_1k_noop_replica.py", line 119, in main
    all_endpoints = deploy_replicas(num_replicas, max_batch_size)
  File "workloads/single_deployment_1k_noop_replica.py", line 72, in deploy_replicas
    Echo.deploy()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/deployment.py", line 250, in deploy
    _blocking=_blocking,
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/api.py", line 190, in check
    return f(self, *args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/api.py", line 394, in deploy
    self._wait_for_deployment_healthy(name)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/api.py", line 328, in _wait_for_deployment_healthy
    raise RuntimeError(f"Deployment {name} is UNHEALTHY: {status.message}")
RuntimeError: Deployment echo is UNHEALTHY: Failed to update deployment:
'>' not supported between instances of 'NoneType' and 'int'.
(ServeController pid=769) 2022-04-05 15:17:24,967 INFO deployment_state.py:1237 -- Removing 1000 replicas from deployment 'echo'. component=serve deployment=echo
(ServeController pid=769) 2022-04-05 15:17:29,812 INFO http_state.py:113 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:sGYxOM:SERVE_PROXY_ACTOR-node:172.31.74.200-0' on node 'node:172.31.74.200-0' listening on '127.0.0.1:8000'
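For reference, the `'>' not supported between instances of 'NoneType' and 'int'` text in the traceback is the standard Python 3 `TypeError` message for comparing `None` with an integer. The sketch below reproduces that message in isolation and shows one possible guard. The names `compare_unchecked`, `exceeds_limit`, `value`, and `limit` are hypothetical and are not taken from the Serve code base. Where the `None` actually originates in the controller is not established here.

```python
from typing import Optional


def compare_unchecked(value, limit: int) -> bool:
    # Hypothetical stand-in for a comparison that assumes `value` is always set.
    return value > limit


def exceeds_limit(value: Optional[int], limit: int) -> bool:
    # Hypothetical guard: treat a missing value as "does not exceed the limit".
    if value is None:
        return False
    return value > limit


try:
    compare_unchecked(None, 0)
except TypeError as exc:
    # Same message text as in the RuntimeError above.
    message = str(exc)
    print(message)
```

Whether a missing value should count as `False`, raise a clearer error, or fall back to a default depends on what the value represents, so the guard above is only one option.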
Versions / Dependencies
nightly
Reproduction script
.
Anything else
.
Are you willing to submit a PR?
Yes I am willing to submit a PR!
I tried re-running the serve_single_deployment_1k_noop_replica test with the release tool, using the wheel from 21ee7e4, but ran into import errors. I will try again later, once we have the rebased wheel on this commit.
Traceback (most recent call last):
  File "run_release_test.py", line 137, in main
    no_terminate=no_terminate,
  File "/Users/jiaodong/Workspace/ray/release/ray_release/glue.py", line 314, in run_release_test
    raise pipeline_exception
  File "/Users/jiaodong/Workspace/ray/release/ray_release/glue.py", line 204, in run_release_test
    command_runner.prepare_remote_env()
  File "/Users/jiaodong/Workspace/ray/release/ray_release/command_runner/sdk_runner.py", line 58, in prepare_remote_env
    ) from e
ray_release.exception.RemoteEnvSetupError: Error setting up remote environment: No module named 'ray.job_submission'
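A quick way to check whether a given environment can resolve a module such as `ray.job_submission` before launching the release tool is a small lookup with `importlib`. This is only a sketch: `module_available` is a hypothetical helper, not part of `ray_release`, and it only reports whether the module can be found, not why it is missing.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located in the current environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example `ray`) is itself missing.
        return False


if __name__ == "__main__":
    print(module_available("ray.job_submission"))
```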
I'm closing this issue, which was triggered by the nightly run. The error suggests we might be seeing an unhealthy replica but failed to retrieve the correct exception message. I will watch nightly for a few more days this week and re-open this issue if the problem persists.