unscale_() has already been called on this optimizer since the last update(). #24849
Comments
cc @muellerzr and @pacman100
Hello @paxvinci, I am running the following example and am unable to reproduce the issue. Command:
Output logs:
I am using the latest transformers and accelerate main branches.
Please share a minimal reproducer so that we can dig deeper if the issue still persists.
I cannot share the JSON file because it contains confidential data. I reinstalled the latest transformers and restarted the training session. If I hit the error again, I'll send an update.
Update: I downloaded the latest version of transformers via pip and started the training again. After a couple of problems caused by BSODs, I restarted the training from checkpoints, but I still receive "Can't find a valid checkpoint at". There is a warning after the creation of the model.
I tried to change from LlamaTokenizer to LLaMATokenizer, but that class does not exist.
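For anyone hitting the "Can't find a valid checkpoint at" message, it may help to look at what each checkpoint folder actually contains before resuming. The following is a minimal sketch, not part of transformers: `inspect_checkpoints` is a hypothetical helper, it assumes checkpoint folders are named `checkpoint-<step>`, and the weight file names in `WEIGHT_FILES` are only illustrative guesses that should be adjusted to whatever your setup saves.

```python
import os
import re

# Assumption: checkpoint folders are named "checkpoint-<step>".
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")

# Assumption: illustrative weight file names, not an exhaustive list.
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors", "adapter_model.bin")


def inspect_checkpoints(output_dir):
    """Map each checkpoint step to the weight files found in its folder."""
    report = {}
    for name in os.listdir(output_dir):
        match = CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path):
            files = set(os.listdir(path))
            report[int(match.group(1))] = [f for f in WEIGHT_FILES if f in files]
    # Sort by step so the most recent checkpoint comes last.
    return dict(sorted(report.items()))
```

A checkpoint whose list comes back empty has none of the expected weight files, which would be one thing to rule out, for example after a crash in the middle of a save.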
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi all,
I'm facing the error in the subject. I saw that this problem has already been solved, but I still get it. This is how I configured the parameters for the trainer.
The strange behaviour is that the problem arises after the end of the first epoch.
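For context, the error message comes from PyTorch's gradient scaler, which allows `unscale_()` to be called only once per optimizer between two `update()` calls. The sketch below is a toy model of that bookkeeping only. `ToyScaler` and `run_steps` are hypothetical names, not PyTorch or transformers code, and the sketch does not claim to show what goes wrong in the Trainer at the epoch boundary.

```python
class ToyScaler:
    """Hypothetical stand-in that mimics only the once-per-update rule."""

    def __init__(self):
        # Names of optimizers that were unscaled since the last update().
        self._unscaled = set()

    def unscale_(self, optimizer_name):
        if optimizer_name in self._unscaled:
            raise RuntimeError(
                "unscale_() has already been called on this optimizer "
                "since the last update()."
            )
        self._unscaled.add(optimizer_name)

    def update(self):
        # Resets the per-optimizer record for the next iteration.
        self._unscaled.clear()


def run_steps(n_steps, call_update=True):
    """Run n_steps toy iterations; skipping update() triggers the error."""
    scaler = ToyScaler()
    for _ in range(n_steps):
        scaler.unscale_("adamw")  # e.g. before gradient clipping
        if call_update:
            scaler.update()
    return n_steps
```

With `call_update=True` the loop completes. With `call_update=False` the second iteration raises the same message as in the title, which suggests looking for a code path where `unscale_()` runs twice or `update()` is skipped.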
System Info
The environment is WSL
Linux 5.15.90.1-microsoft-standard-WSL2 #1 SMP Fri Jan 27 02:56:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
pip list
Who can help?
No response
Reproduction
Expected behavior
The error should not be raised, and training should continue with epoch 2.