-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Serve] Add hash function of RunningReplicaInfo #32772
[Serve] Add hash function of RunningReplicaInfo #32772
Conversation
0c64878
to
972a1ac
Compare
972a1ac
to
fae27a1
Compare
d54a3c1
to
78a872c
Compare
size = "small", | ||
srcs = serve_tests_srcs, | ||
tags = ["exclusive", "team:serve"], | ||
deps = [":serve_lib"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test file wasn't running in CI before? 😮
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exactly...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! Is it feasible to add a test which fails without this change?
] | ||
) | ||
) | ||
object.__setattr__(self, "_hash", hash_val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my education what's the benefit of this over self._hash = hash_val
? If not obvious maybe add a code comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh it is a hacky way to set attribute, since we use frozen
for this dataclass class. (Not allow to set attribute as you mentioned)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! Would be good to put this in a code comment
Signed-off-by: Sihan Wang <[email protected]>
78a872c
to
d9902cf
Compare
Signed-off-by: Sihan Wang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Learned some new things 😀
Co-authored-by: Cindy Zhang <[email protected]> Signed-off-by: Sihan Wang <[email protected]>
Linkcheck failure unrelated |
When the longpoll client timeout happens, the all internal objects will be cleaned up. The longpoll client will try to poll again. When this happens, longpoll client will receive another object which having same information but different object id (ActorHandle) from the controller. Then after compute_iterable_delta is called, it will replace the same replica in in_flight_queries, and cause router clean up ongoing request_ref, and keep assigning new request to the replica, which will break the max_concurrent_queries parameter. Related issue number Closes ray-project#32652 Signed-off-by: Sihan Wang <[email protected]>
* [Serve] Add hash function of RunningReplicaInfo (#32772) When the longpoll client timeout happens, the all internal objects will be cleaned up. The longpoll client will try to poll again. When this happens, longpoll client will receive another object which having same information but different object id (ActorHandle) from the controller. Then after compute_iterable_delta is called, it will replace the same replica in in_flight_queries, and cause router clean up ongoing request_ref, and keep assigning new request to the replica, which will break the max_concurrent_queries parameter. Related issue number Closes #32652 Signed-off-by: Sihan Wang <[email protected]> * [Serve] Fix the max_concurrent_queries issue (#33022) For the `hashable` object, __eq__ and __hash__ both need to be provided for correctness. https://docs.python.org/3.9/glossary.html#term-hashable And add tests to make sure the long poll timeout issue won't happen. --------- Signed-off-by: Sihan Wang <[email protected]>
When the longpoll client timeout happens, the all internal objects will be cleaned up. The longpoll client will try to poll again. When this happens, longpoll client will receive another object which having same information but different object id (ActorHandle) from the controller. Then after compute_iterable_delta is called, it will replace the same replica in in_flight_queries, and cause router clean up ongoing request_ref, and keep assigning new request to the replica, which will break the max_concurrent_queries parameter. Related issue number Closes ray-project#32652 Signed-off-by: Edward Oakes <[email protected]>
When the longpoll client timeout happens, the all internal objects will be cleaned up. The longpoll client will try to poll again. When this happens, longpoll client will receive another object which having same information but different object id (ActorHandle) from the controller. Then after compute_iterable_delta is called, it will replace the same replica in in_flight_queries, and cause router clean up ongoing request_ref, and keep assigning new request to the replica, which will break the max_concurrent_queries parameter. Related issue number Closes ray-project#32652
When the longpoll client timeout happens, the all internal objects will be cleaned up. The longpoll client will try to poll again. When this happens, longpoll client will receive another object which having same information but different object id (ActorHandle) from the controller. Then after compute_iterable_delta is called, it will replace the same replica in in_flight_queries, and cause router clean up ongoing request_ref, and keep assigning new request to the replica, which will break the max_concurrent_queries parameter. Related issue number Closes ray-project#32652 Signed-off-by: elliottower <[email protected]>
Why are these changes needed?
When the longpoll client timeout happens, the all internal objects will be cleaned up. The longpoll client will try to poll again. When this happens, longpoll client will receive another object which having same information but different object id (ActorHandle) from the controller.
Then after
compute_iterable_delta
is called, it will replace the same replica inin_flight_queries
, and cause router clean up ongoing request_ref, and keep assigning new request to the replica, which will break themax_concurrent_queries
parameter.Related issue number
Closes #32652
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.