Starcoder2 model - bis #29215
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Co-authored-by: Arthur <[email protected]>
In #29228 I mention that static cache is not a blocker for the PR 😉
Thanks! LGTM once we add the docs!
Let's go!
Starcoder2 has been released with the paper [Stacoder-2](https://drive.google.com/file/d/17iGn3c-sYNiLyRSY-A85QOzgzGnGiVI3/view) by BigCode team.
Documentation page about the model is coming soon
This is fairly short. Let's add that the main difference with Mistral is dropout. Since you are the authors, it would be nice to explain how much this influenced training, for example.
Yeah, makes sense. I will take care of that after the official release.
Thank you for the PR @RaymondLi0. cc @loubnabnl and @lvwerra: next time, let's make sure the doc and paper links are fully completed before we merge. We require this for every model / organisation regardless of the release date 😉 Let's make this an exception.
* Copy model
* changes
* misc
* fixes
* add embed and residual dropout (#30)
* misc
* remove rms norm and gated MLP
* remove copied mentions where its not a copy anymore
* remove unused _shape
* copied from mistral instead
* fix copies
* fix copies
* add not doctested
* fix
* fix copyright
* Update docs/source/en/model_doc/starcoder2.md Co-authored-by: Arthur <[email protected]>
* Update src/transformers/models/starcoder2/configuration_starcoder2.py Co-authored-by: Arthur <[email protected]>
* Update src/transformers/models/starcoder2/configuration_starcoder2.py Co-authored-by: Arthur <[email protected]>
* fix doc
* revert some changes
* add fa2 tests
* fix styling nit
* fix
* push dummy docs
---------
Co-authored-by: Joel Lamy-Poirier <[email protected]>
Co-authored-by: younesbelkada <[email protected]>
Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: Arthur <[email protected]>
The Starcoder2 model, adapted from Mistral.
All changes are done through options, so Mistral itself is still supported.
Main changes:
* Embedding and residual dropout
It does not support absolute embeddings, so it can't support Santacoder or Starcoder.
Starcoder2-3B model: https://huggingface.co/bigcode/starcoder2-3b
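For readers unfamiliar with the two dropout sites, here is a minimal pure-Python sketch of the idea, not the code in this PR. It assumes the common placement: dropout once on the embedding output, and dropout on each sublayer output before it is added back to the residual stream. The names `dropout`, `block`, `forward`, `embedding_dropout` and `residual_dropout` are illustrative.

```python
import random


def dropout(values, p, training, rng):
    """Inverted dropout: zero each element with probability p, rescale the rest."""
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]


def block(hidden, sublayer, residual_dropout, training, rng):
    """Residual connection with dropout applied to the sublayer output only."""
    out = dropout(sublayer(hidden), residual_dropout, training, rng)
    return [h + o for h, o in zip(hidden, out)]


def forward(embeddings, sublayers, embedding_dropout, residual_dropout, training, rng):
    """Apply embedding dropout once, then each sublayer with residual dropout."""
    hidden = dropout(embeddings, embedding_dropout, training, rng)
    for sublayer in sublayers:
        hidden = block(hidden, sublayer, residual_dropout, training, rng)
    return hidden


def double(hidden):
    """Stand-in for an attention or MLP sublayer (hypothetical)."""
    return [2.0 * x for x in hidden]
```

With `training=False` both dropout sites are no-ops, so the forward pass reduces to plain residual blocks; setting both probabilities to `0.0` gives the same result in training mode.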
Todo:
* [Core generation] Adds support for static KV cache #27931, [CLeanup] Revert SDPA attention changes that got in the static kv cache PR #29027 (and future changes from Feb. 19) (in a future PR?)
@younesbelkada @ArthurZucker @jlamypoirier