Retrieving the Trained Model #1094
How can we get our trained model back, as a normal nn.Module, once we have trained it using the pipe object and the GPipe schedule?

Comments
Also interested in this. Did you ever figure it out?
Not as of now.
Sorry for the late reply. In the "Option 2" section you can see that the return object is a [...] or [...].
(Reference: https://pytorch.org/tutorials/beginner/saving_loading_models.html)
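For reference, here is a minimal sketch of the state_dict save/load pattern that tutorial describes, applied to whatever module a pipeline stage holds. The `stage_module` handle and the file name are illustrative stand-ins, not PiPPy API:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the module owned by one pipeline stage;
# with PiPPy this would be the submodule assigned to the local rank.
stage_module = nn.Linear(16, 16)

# Save only the parameters and buffers (the state_dict pattern from the
# referenced tutorial), one file per stage/rank.
torch.save(stage_module.state_dict(), "stage_0.pt")

# Later: rebuild an identically shaped module and load the weights back.
restored = nn.Linear(16, 16)
restored.load_state_dict(torch.load("stage_0.pt"))
restored.eval()
```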
I think the question (at least for me) was whether we could turn the model back into the non-pipelined version for modification and saving.
Hmm, do you mean getting back the full model at the end of training, but before saving the final checkpoint? If we are going to call torch.load later anyway, that would be a good time to glue the model back together, because the only difference at that point is loading from multiple checkpoint files instead of a single one.
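A rough sketch of that "gluing" step, assuming each rank saved its stage as a separate `stage_<i>.pt` file and that the per-stage state_dict keys match the original model's qualified names; both are assumptions to verify for your particular split, not guaranteed behavior:

```python
import glob
import torch
import torch.nn as nn

# Hypothetical original (non-pipelined) model definition.
full_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))

# Merge the per-stage checkpoints into one state_dict for the full model.
merged_state = {}
for path in sorted(glob.glob("stage_*.pt")):
    merged_state.update(torch.load(path, map_location="cpu"))

# strict=True surfaces any missing or unexpected keys after the merge.
full_model.load_state_dict(merged_state, strict=True)
```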
OK, so here is what I want to do: obtain the gradients of each layer from each pipeline-stage rank via the pipe object and send them to the CPUs; apply some modifications to the gradients on the CPU; then bring them back to the corresponding ranks of the pipeline stages and update the model with the modified gradients. Is this possible with PiPPy?
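A hedged sketch of that loop for a single rank, using only generic PyTorch. The `stage_module` handle, the stand-in backward pass, and the clipping step are illustrative assumptions; how you obtain the local stage module from the pipe object depends on the PiPPy version:

```python
import torch
import torch.nn as nn

# Illustrative local stage module and optimizer; in practice these would be
# the submodule PiPPy places on this rank and the optimizer over its params.
device = "cuda" if torch.cuda.is_available() else "cpu"
stage_module = nn.Linear(16, 16).to(device)
optimizer = torch.optim.SGD(stage_module.parameters(), lr=0.01)

# Stand-in for running the pipeline schedule so .grad fields are populated.
stage_module(torch.randn(4, 16, device=device)).sum().backward()

# 1) Pull each parameter's gradient to the CPU.
cpu_grads = {
    name: p.grad.detach().to("cpu")
    for name, p in stage_module.named_parameters()
    if p.grad is not None
}

# 2) Modify the gradients on the CPU (example: element-wise clipping).
for name in cpu_grads:
    cpu_grads[name] = cpu_grads[name].clamp(-1.0, 1.0)

# 3) Copy the modified gradients back onto the parameters' devices.
with torch.no_grad():
    for name, p in stage_module.named_parameters():
        if name in cpu_grads:
            p.grad.copy_(cpu_grads[name].to(p.grad.device))

# 4) Apply the update using the modified gradients.
optimizer.step()
optimizer.zero_grad()
```

The same pattern would run on every rank, each operating only on the parameters of its own stage.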