-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Workflow] Serialization cleanup #18328
[Workflow] Serialization cleanup #18328
Conversation
…workflow_objectref
@@ -197,7 +229,13 @@ def commit_step(store: workflow_storage.WorkflowStorage, | |||
""" | |||
from ray.workflow.common import Workflow | |||
if isinstance(ret, Workflow): | |||
store.save_subworkflow(ret) | |||
assert not ret.executed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite like the new logic here, but I don't like the old logic as well 😅
The new one use a lot of apis inside workflow_storage which originally we want to hide from up layer.
The old one coupled storage with app layer a lot.
We probably can think about the positioning of workflow_storage
a little bit.
Let's revisit what we have right now (with this PR):
- storage: an abstraction for database
- workflow_storage: contains the schema for workflow and use storage to write
- key for different things
- logic for how to updating things for different cases
- serialization: how to convert objects to bytes + dumps
I feel once we have open in workflow_storage, we'll decouple serialization/workflow_storage (right now, we need key and also serialization.dump_to_storage() to make things work.
Then the next thing is about how to reduce the application-related code in workflow storage. One thing we can do is to move them to the place where they are used like what's doing here.
So basically, we are going to have:
- storage: an abstraction for database
- workflow_storage: contains the schema for workflow build on top of the storage
- serialization: how to convert objects to bytes, like a pickler.
- application logic will be put in the application layer. and use serialization + workflow_storage to support specific functions
Just a pre-review of the PR and I feel the structure now is much better than before. Let me know when it's fully ready for a review. Thanks for the effort here to make things clean! |
@wuisawesome could you try this test
Another thing is that, it looks like quote is added for the key which we might want to remove:
|
Nice catch (this bug was introduced during the upload dedupe PR), but the fix was easy enough, so it's now included here.
hmmm i don't see this issue locally, my paths look like |
@@ -82,6 +84,26 @@ def test_dedupe_serialization_2(workflow_start_regular_shared): | |||
assert get_num_uploads() == 2 | |||
|
|||
|
|||
def test_same_object_many_workflows(workflow_start_regular_shared): | |||
"""Ensure that when we dedupe uploads, we upload the object once per workflow, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means object ref can't be shared across jobs. I think it makes sense right now. Maybe we can think more about how to make it more efficient in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah ideally we could share them across jobs, but we would have to worry a lot more about garbage collection.
I see. I think it's because I'm using Linux. I'm OK with this one for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG. Please fix the test first.
Wait sorry, which test? Or do you mean the print statement? |
Why are these changes needed?
This PR cleans up some workflow serialization code. In particular, it removes the circular dependency between workflow_storage and serialization. Now serialization is a lower level layer, that only interacts with the underlying base storage.
Related issue number
Checks
scripts/format.sh
to lint the changes in this PR.