Add support for GrokAdamW optimizer #32521
Conversation
Thanks for adding this! Is there an associated issue to add this as a feature request?
I'll let @muellerzr and @SunMarc chip in on whether this is something we want to add. One thing to note is that this implementation is under the MIT license. @muellerzr do you know how we typically handle different licensing for integrations like this?
As commented, tests would need to be added; the LOMO PR is a good reference here.
@amyeroberts I do not, best we wait until @LysandreJik is back for that question! (As long as it's not too problematic, I don't see an issue with adding more optimizers)
There is not an associated issue. I am hoping to add it in order to make it more accessible to users. I have updated the license to Apache 2.0 to be compatible with yours, and I added the tests as requested.
Very nice!
What do you guys think about the grokking functions? As it is, I don't see how users can easily pass those in, so I just use the default grokking function (which, honestly, 99.9% of people will do). Otherwise they would need to pass the function as a string which would then be eval'd - fugly. But this seems a good compromise for simplicity's sake, and if they want custom grokking functions they can use the optimizer manually.
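For readers unfamiliar with the mechanism: a grokking-signal function returns a scalar that the optimizer uses to modulate its update. The toy sketch below shows one plausible way a list of such functions could be reduced to a single signal; the function names and the averaging rule are assumptions for illustration, not the actual GrokAdamW internals.

```python
# Toy sketch (NOT the real GrokAdamW code): combining several
# grokking-signal functions into one scalar signal. Names and the
# averaging rule are assumed for illustration only.
from typing import Callable, List, Optional


def combine_grokking_signals(
    signal_fns: Optional[List[Callable[[], Optional[float]]]],
) -> Optional[float]:
    """Average the outputs of the signal functions, skipping any
    that raise or return None; None means 'use the default'."""
    if not signal_fns:
        return None
    values = []
    for fn in signal_fns:
        try:
            value = fn()
        except Exception:
            # A failing signal should not crash the training step.
            continue
        if value is not None:
            values.append(value)
    return sum(values) / len(values) if values else None


# Example: a signal based on the train/validation loss gap.
train_loss, val_loss = 0.5, 1.0
gap_signal = lambda: max(0.0, val_loss - train_loss)
print(combine_grokking_signals([gap_signal]))  # prints 0.5
```

Passing plain callables like `gap_signal` directly (rather than strings to be eval'd) is what makes this hard to expose through `TrainingArguments`, which is serialized to JSON.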
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@ehartford currently they'd need to pass them in via …
Hello, …
Looks great! Thanks for adding and iterating on this ❤️
@ehartford running this I get hit with this error, can you check? Thanks for the contribution anyways.
Please update GrokAdamW
Thanks a ton, working now!
What does this PR do?
Add support for GrokAdamW optimizer
This PR adds support for the GrokAdamW optimizer to the `transformers` library.

Changes Introduced
- Added GrokAdamW as a selectable optimizer in the `Trainer` class.
- Checks for the `grokadamw` package and raises an informative error if it is not already installed.

Motivation
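A check like the one described above is usually implemented with a lazy import, so the dependency is only required when the optimizer is actually selected. A hypothetical sketch of that pattern (not the actual `transformers` code; names and the import path are assumptions):

```python
# Hypothetical sketch of the lazy-import pattern for an optional
# third-party optimizer: import only when selected, and fail with an
# actionable message otherwise. NOT the actual transformers code.
import importlib.util


def get_optimizer_cls(optim_name: str):
    """Return the optimizer class for `optim_name`, importing lazily."""
    if optim_name == "grokadamw":
        # Only require the package when this optimizer is chosen.
        if importlib.util.find_spec("grokadamw") is None:
            raise ImportError(
                "Please install the grokadamw package: `pip install grokadamw`"
            )
        from grokadamw import GrokAdamW  # import path assumed for illustration
        return GrokAdamW
    raise ValueError(f"Unknown optimizer: {optim_name}")
```

This keeps `grokadamw` out of the library's hard dependencies while still giving users a clear install hint at the point of failure.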
The GrokAdamW optimizer enhances training performance and stability for certain models, providing users with more optimization options.
Dependencies
- `grokadamw` (install with `pip install grokadamw`).

Code Changes
Testing
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@muellerzr and @SunMarc