Add Gemma Support #393
Conversation
Hi @TechxGenus, great to see Gemma support. I tested your code and the quantization seems to work, although I have some issues measuring perplexity on the Gemma model series in general. I am getting some odd sizes for the model once saved (6GB shard + 600MB shard):
However, I tested the fused modules and it seems that I get the following error:
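For context (the shard listing and the traceback are not reproduced above), the flow being exercised here is the standard AutoAWQ quantize-then-reload sequence; a minimal sketch, where the paths and quant config are assumptions rather than values taken from this PR:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Paths and quant config are placeholders for illustration.
model_path = "google/gemma-7b-it"
quant_path = "gemma-7b-it-awq"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Quantize and save; the saved checkpoint is what showed the 6GB + 600MB shard split.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# Reload with fused modules enabled; this is the path that raised the error above.
fused_model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
```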
Yes, the quantized model file size is odd. This may be related to Google's design, since Gemma has a very large embedding layer. It otherwise looks correct.
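For a rough sense of scale, assuming Gemma-7B's published config (a 256,000-token vocabulary and hidden size 3,072), the fp16 embedding table, which AWQ leaves unquantized, is on the order of 1.5 GB by itself, which goes a long way toward explaining the shard sizes above:

```python
# Back-of-the-envelope size of Gemma-7B's unquantized fp16 embedding table.
# vocab_size and hidden_size are taken from Gemma-7B's published config.
vocab_size = 256_000
hidden_size = 3_072
bytes_per_param = 2  # fp16

embedding_gb = vocab_size * hidden_size * bytes_per_param / 1024**3
print(f"embedding table ~= {embedding_gb:.2f} GB")  # ~1.46 GB
```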
I reproduced this error when running gemma-7b-it-AWQ, though gemma-2b-AWQ works well. Additionally, I discovered that the latest transformers release seems to change the implementation of model.generate, so the previous fusion layers needed to be modified to work. I tested
I fixed the error, and it should generate results normally now.
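After the fix, a quick end-to-end check along these lines should generate normally with fused layers enabled; a minimal sketch, where the checkpoint path and prompt are placeholders:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "gemma-7b-it-awq"  # placeholder path to the quantized checkpoint

tokenizer = AutoTokenizer.from_pretrained(quant_path)
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)

inputs = tokenizer("Write a short poem about mountains.", return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```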
Excellent work @TechxGenus. Thanks for your contribution.
Add latest Google Gemma model.