always_save_nemo not working properly #10481

Closed
aimarz opened this issue Sep 14, 2024 · 3 comments
Labels: bug, stale


aimarz commented Sep 14, 2024

I'm using the NeMo 24.07 container to train LLMs.

I want to save checkpoints in .nemo format, so I set exp_manager.checkpoint_callback_params.always_save_nemo: true. This does save the checkpoint as a .nemo file, but every file gets the same name regardless of the step. Each save therefore overwrites the previous .nemo file, and at the end of training I am left with only the last checkpoint, which defeats the purpose.

I have tried setting postfix: {step}.nemo, as is done for the Megatron checkpoint, but it does not work. It does not look like .format() is ever called on postfix, so I don't know how to make the file name depend on the step number.

Please fix this issue somehow.
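
Until the naming is fixed upstream, one possible stopgap is to copy the freshly written .nemo file to a step-specific name before the next save overwrites it. The sketch below is only an illustration of that idea: `snapshot_nemo` and its `template` argument are hypothetical names, not part of NeMo, and it assumes the training code (or an external script) knows the current step and calls the helper after the .nemo file has been fully written.

```python
import shutil
from pathlib import Path


def snapshot_nemo(nemo_path, step, template="{stem}-step={step}{suffix}"):
    """Copy an existing .nemo file to a step-specific name in the same directory.

    Hypothetical helper, not a NeMo API. The caller must supply the step.
    """
    src = Path(nemo_path)
    if not src.is_file():
        raise FileNotFoundError(f"no .nemo file at {src}")
    # The template is formatted explicitly here, which is the step the
    # issue reports as missing for the postfix setting.
    name = template.format(stem=src.stem, step=step, suffix=src.suffix)
    dst = src.with_name(name)
    # copy2 leaves the original in place, so later saves to the fixed
    # name can still overwrite it without touching earlier snapshots.
    shutil.copy2(src, dst)
    return dst
```

Calling `snapshot_nemo("checkpoints/model.nemo", step=1000)` would leave `model.nemo` untouched and create `model-step=1000.nemo` beside it (the path here is just an example). Copying doubles the disk traffic per save, which can matter for large checkpoints; moving the file instead is an option only if the callback recreates it from scratch on every save.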

aimarz added the bug label on Sep 14, 2024
ericharper (Collaborator) commented

We likely won't be able to update this ourselves but if you'd like to submit a fix, we can help review it.

Thanks!

github-actions bot (Contributor) commented

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions bot added the stale label on Oct 26, 2024
github-actions bot (Contributor) commented Nov 2, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions bot closed this as not planned on Nov 2, 2024