-
Notifications
You must be signed in to change notification settings - Fork 26.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gemma2: eager attention by default #32865
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
config._attn_implementation = "sdpa" | ||
model = Gemma2Model(config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we should check that you can set the attention in the canonical way - rather than overriding the private attribute
config._attn_implementation = "sdpa" | |
model = Gemma2Model(config) | |
model = Gemma2Model(config, attn_implementation="sdpa") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amyeroberts hehe I had the same idea, but this actually doesn't work :D
The API you're suggesting is for .from_pretrained()
, not for __init__()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can pass _attn_implementation
I think no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API you're suggesting is for .from_pretrained(), not for init()
Oh, true!
we can pass _attn_implementation I think no?
I don't think we can for the model init as it just accepts config
as an input arg
config._attn_implementation = "sdpa" | ||
model = Gemma2Model(config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can pass _attn_implementation
I think no?
What does this PR do?
See title :)
We know that SDPA yields inferior modeling results, so we should use
eager
by default. This has been the source of some model quality GH issues, e.g. #32848Slow tests for gemma 2 ran, no regressions