[Train] TorchCheckpoint: Specifying pickle_protocol in torch.save()
#35615
Signed-off-by: woshiyyya <[email protected]>
Thanks for fixing this! Agree with @matthewdeng, let's use DEFAULT_PROTOCOL here
@woshiyyya Please make sure to also create the cherry-pick PR yourself after this is merged!
Can you fix the conflict and ping me when tests pass?
@matthewdeng Conflicts resolved! Will ping you after CI passes.
Why are these changes needed?
TorchCheckpoint failed to serialize large models (>4 GiB) and threw a serialization error. This PR specifies `pickle_protocol=5` in `torch.save()` to resolve the issue. Details are in #35611.
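As a rough illustration of what selecting a pickle protocol means, the sketch below uses only the standard-library `pickle` module and is not the code changed in this PR. The helper names `dump_with_protocol`, `stream_protocol`, and `save_with_protocol_5` are hypothetical. The last helper assumes PyTorch is installed and relies on the `pickle_protocol` keyword that `torch.save()` accepts.

```python
import io
import pickle


def dump_with_protocol(obj, protocol):
    """Serialize obj into bytes with an explicit pickle protocol."""
    buf = io.BytesIO()
    pickle.dump(obj, buf, protocol=protocol)
    return buf.getvalue()


def stream_protocol(data):
    """Return the protocol version recorded in a pickle stream, or None.

    Streams written with protocol 2 or later begin with the PROTO opcode
    (0x80) followed by one byte holding the protocol number.
    """
    if data[:1] == pickle.PROTO:
        return data[1]
    return None


def save_with_protocol_5(obj):
    """Hypothetical helper: save obj with torch.save() using protocol 5.

    Assumes PyTorch is installed; it is imported lazily so the rest of
    this sketch runs without it.
    """
    import torch

    buf = io.BytesIO()
    torch.save(obj, buf, pickle_protocol=5)
    return buf.getvalue()


# A tiny stand-in for model state; real checkpoints are far larger.
state = {"weight": bytes(16), "step": 3}
blob = dump_with_protocol(state, 5)
assert stream_protocol(blob) == 5
assert pickle.loads(blob) == state
```

Pickle protocol 4 added support for very large objects and protocol 5 (Python 3.8+) added out-of-band buffers, which is why a higher protocol is relevant for payloads above 4 GiB. The exact failure mode in TorchCheckpoint is described in #35611.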
Related issue number
Close #35611
Checks
- Commits are signed off (`git commit -s`) in this PR.
- `scripts/format.sh` has been run to lint the changes in this PR.
- For a method added in Tune, it has been added in `doc/source/tune/api/` under the corresponding `.rst` file.