[xray] Hide "append-only log" semantics in global state API. #2852
For the client table, it makes sense to have a single entry for each client. For the object table, we might want to have all "current" clients for this object, so that people can know where the object is stored. For the task table, it might be helpful to provide an option to return a list of entries, so that developers can leverage this information for debugging purposes, e.g., investigating failures and reconstruction for a task. Thoughts?
For the object table, we could have one entry per object, and that entry could include a list of clients or of creation/eviction events. For the task table, we actually don't store a log in the GCS; updates to the task table overwrite the current entry. This is the case because we use the code at lines 440 to 451 in 588c573.
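As a sketch of the first option, a log of creation/eviction events can be folded into one entry per object that lists the clients currently holding a copy. The `(object_id, client_id, is_eviction)` event tuple and the `fold_object_events` helper are hypothetical and do not reflect the actual GCS schema:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical event format: (object_id, client_id, is_eviction).
# This is NOT the real GCS schema; it only illustrates the folding idea.
Event = Tuple[str, str, bool]


def fold_object_events(events: Iterable[Event]) -> Dict[str, List[str]]:
    """Return one entry per object ID: the sorted list of clients that
    currently hold a copy, after replaying the events in log order."""
    locations: Dict[str, Set[str]] = defaultdict(set)
    for object_id, client_id, is_eviction in events:
        if is_eviction:
            locations[object_id].discard(client_id)
        else:
            locations[object_id].add(client_id)
    return {obj: sorted(clients) for obj, clients in locations.items()}


events = [
    ("obj1", "clientA", False),
    ("obj1", "clientB", False),
    ("obj1", "clientA", True),
    ("obj2", "clientB", False),
]
print(fold_object_events(events))  # {'obj1': ['clientB'], 'obj2': ['clientB']}
```

An object whose copies have all been evicted keeps an entry with an empty list, so "no copies left" stays distinguishable from "never seen" (no entry at all).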
I agree that the client table should have one entry per client. The bug that I'm trying to fix is caused by having multiple entries for one client.
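The bug comes from treating each log entry as a separate client. A minimal sketch of hiding the log, assuming entries arrive in log order as dicts with hypothetical `client_id`, `is_insertion`, and `address` keys (not Ray's actual schema), is to keep only the latest entry per client:

```python
from typing import Dict, List


def collapse_client_log(log: List[dict]) -> Dict[str, dict]:
    """Reduce an append-only client log to one entry per client.

    Assumes entries are in log order, so a later entry for the same
    client (e.g. a removal) replaces any earlier one (e.g. the insertion).
    The dict keys used here are hypothetical.
    """
    latest: Dict[str, dict] = {}
    for entry in log:
        latest[entry["client_id"]] = entry
    return latest


log = [
    {"client_id": "a", "is_insertion": True, "address": "10.0.0.1"},
    {"client_id": "b", "is_insertion": True, "address": "10.0.0.2"},
    {"client_id": "a", "is_insertion": False, "address": "10.0.0.1"},
]
clients = collapse_client_log(log)
alive = sorted(cid for cid, e in clients.items() if e["is_insertion"])
print(len(clients), alive)  # 2 ['b']
```

Because the last entry wins, a client that joined and then left appears once, marked as removed, instead of twice.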
Maybe we can create a new class named
OK, I understand now. The However, even with an
The I think it's good to keep around the whole record of the nodes that joined and left the cluster, since that information may be useful for debugging and other purposes. Even if we use a
So we still need to handle the case where a node manager tries to connect to a dead node manager.
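One way to reconcile keeping the full join/leave history with avoiding connections to dead node managers is to derive liveness from that history before dialing. The sketch below assumes log entries are dicts with hypothetical `client_id` and `is_insertion` keys, and that a removed client never rejoins under the same ID; neither `dead_clients` nor `should_connect` is part of Ray:

```python
from typing import Iterable, Set


def dead_clients(log: Iterable[dict]) -> Set[str]:
    """Return IDs of clients with a removal entry anywhere in the full log.

    Assumes (hypothetically) that a removed client never rejoins under the
    same ID, so one removal entry is enough to mark it dead for good.
    """
    return {e["client_id"] for e in log if not e["is_insertion"]}


def should_connect(target_id: str, log: Iterable[dict]) -> bool:
    """Hypothetical guard a node manager could run before dialing a peer."""
    return target_id not in dead_clients(log)


# The full history is kept; liveness is derived from it on demand.
history = [
    {"client_id": "a", "is_insertion": True},
    {"client_id": "b", "is_insertion": True},
    {"client_id": "a", "is_insertion": False},
]
print(should_connect("a", history), should_connect("b", history))  # False True
```

A check like this only narrows the window: a peer can still die between the check and the connection attempt, so the connect path must handle failures regardless.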
Certain global state commands expose unnecessary implementation details.

- `ray.global_state.client_table()` returns a log, which can contain multiple entries for the same "client". This came up in [tune] Trial executor crashes on node removal in xray #2851.
- `ray.global_state.object_table()` returns a list of entries for each object ID; we should probably just have one entry per object ID.
- `ray.global_state.task_table()` returns a list of entries for each task ID, but there should be a single entry per task ID.