Initial loss increased from 10 (v0.3.0) to 60 (v0.4.0)! #678
Comments
This happens not only with the BioMed data. I get the same results with the data you provide.
@Xuekai-Zhu Can you say more on what you mean by the "presence or absence" of that checkpoint? And can you share the code you're using for loading?
I use the following command to run OLMo without modifying the source code.
You can refer directly to the image above. Whether or not a pretrained checkpoint is used, as well as the version of OLMo, changes the initial loss value.
Loosely speaking, v0.3.0 produces correct loss values, while the loss values in v0.4.0 are incorrect.
Since you are building from source, it's possible that you were affected by the bug that was fixed in #680. Could you pull the commit and see if that fixes your issue?
I am seeing your issue locally now, and it is not fixed by #680. I am investigating.
Thank you very much!
Upon further investigation, the instances of bad loss we observed outside of #680 were due to bad setup (a bad container or an incorrect config). In particular, I ran from a checkpoint while passing If you find out what's causing the issue for you in 0.4.0, please let us know. We will also update here if we run into the issue again.
🐛 Describe the bug
The initial loss differs significantly between versions of olmo, and between runs with and without the step-738020 checkpoint. This suggests a potential issue with model initialization or checkpoint handling in version 0.4.0. I believe the following results can be reproduced; this bug has cost me a week.
Task:
Results
olmo v0.4.0 : w/ step-738020 ckpt -- initial loss is 71
olmo v0.4.0 : w/o step-738020 ckpt -- initial loss is 32
olmo v0.3.0 : w/ step-738020 ckpt -- initial loss is 2
olmo v0.3.0 : w/o step-738020 ckpt -- initial loss is 11
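As a rough sanity check on these numbers, an untrained language model that spreads probability evenly over its vocabulary has a cross-entropy loss of ln(V) nats, where V is the vocabulary size. The sketch below compares the four reported values against that baseline. It is not taken from the OLMo code base: the vocabulary size of 50280 is an assumption (use the value from your own config), and the helper names and the tolerance of 1.0 are made up for illustration.

```python
import math


def uniform_baseline(vocab_size: int) -> float:
    """Cross-entropy in nats of a uniform guess over `vocab_size` tokens."""
    return math.log(vocab_size)


def describe_initial_loss(loss: float, vocab_size: int, tolerance: float = 1.0) -> str:
    """Compare a reported initial loss against the uniform-guess baseline.

    `tolerance` is an arbitrary illustrative margin, not a calibrated threshold.
    """
    baseline = uniform_baseline(vocab_size)
    if loss > baseline + tolerance:
        return "above baseline"
    if loss < baseline - tolerance:
        return "below baseline"
    return "near baseline"


# ASSUMPTION: vocabulary size of 50280; substitute the value from your config.
VOCAB_SIZE = 50280

# Initial losses exactly as reported in this issue.
reported = {
    "v0.4.0 w/ ckpt": 71,
    "v0.4.0 w/o ckpt": 32,
    "v0.3.0 w/ ckpt": 2,
    "v0.3.0 w/o ckpt": 11,
}

print(f"uniform baseline: {uniform_baseline(VOCAB_SIZE):.2f} nats")
for label, loss in reported.items():
    print(f"{label}: {loss} -> {describe_initial_loss(loss, VOCAB_SIZE)}")
```

Under that assumption, the v0.3.0 values look like what one would expect (near the baseline without a checkpoint, well below it with trained weights), while both v0.4.0 values sit well above the baseline, meaning the model does worse than uniform guessing. That pattern is consistent with a setup or loading problem but does not by itself identify the cause.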
Versions
Build from source