Any plans on 8192 context version? #72
Comments
A simple fine-tune (LoRA is enough) on a stretched RoPE should be sufficient.
@Green-Sky We observed that fine-tuning may still cause performance degradation. It is better to have a native 8192 pretrained model.
Sounds like you are not using RoPE scaling. Some RoPE scaling variants can get away without fine-tuning.
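For readers unfamiliar with the terminology above: "stretched RoPE" / linear position interpolation divides token positions by a scale factor so an extended context fits inside the rotation-angle range the model saw during pretraining. Below is a minimal `jax.numpy` sketch of that idea; the names (`make_rope_cache`, `apply_rope`) and the `scale` argument are illustrative, not OpenLLaMA's actual code.

```python
# Minimal sketch of linear RoPE position interpolation ("stretched RoPE").
# Hypothetical helper names; single convention (GPT-NeoX style half-split).
import jax.numpy as jnp

def make_rope_cache(seq_len, head_dim, base=10000.0, scale=1.0):
    # scale > 1 stretches the trained context window: positions are divided
    # by `scale`, so a model trained on 2048 tokens can address
    # 2048 * scale positions within its original angle range.
    inv_freq = 1.0 / (base ** (jnp.arange(0, head_dim, 2) / head_dim))
    positions = jnp.arange(seq_len) / scale           # linear position interpolation
    angles = jnp.outer(positions, inv_freq)           # (seq_len, head_dim // 2)
    return jnp.cos(angles), jnp.sin(angles)

def apply_rope(x, cos, sin):
    # x: (seq_len, num_heads, head_dim); rotate channel pairs by the cached angles.
    x1, x2 = jnp.split(x, 2, axis=-1)
    cos, sin = cos[:, None, :], sin[:, None, :]
    return jnp.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

# Example: stretch a 2048-token model to 8192 tokens (scale = 4).
cos, sin = make_rope_cache(seq_len=8192, head_dim=64, scale=4.0)
```

Variants such as NTK-aware scaling instead adjust the `base` rather than dividing positions, which is what lets some of them work reasonably well without any fine-tuning.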
You can try LongLLaMA, which is a long-context (8192 and beyond) fine-tune of OpenLLaMA: https://github.com/CStanKonrad/long_llama. It uses a different method than PI (see https://arxiv.org/abs/2307.03170 for details). There is no degradation on short context compared to the original 3B checkpoint, and we are working to release larger models soon.
Thanks! How does it compare to native long-context base models such as StarCoder 8192? BTW, if we want an 8192 version of OpenLLaMA, maybe we need a JAX FlashAttention kernel like this?
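For context on what such a kernel buys you: FlashAttention-style blockwise attention avoids materializing the full (seq_len × seq_len) score matrix, which is what makes 8192-token training memory-heavy. The sketch below shows only the algorithmic idea (online softmax over key/value blocks) in plain `jax.numpy` for a single head with no causal mask; a real FlashAttention kernel additionally fuses this into one GPU/TPU kernel, and the function name here is hypothetical.

```python
# Blockwise attention with an online softmax (the math behind FlashAttention).
# Assumes seq_len is divisible by block_size; single head, no causal mask.
import jax
import jax.numpy as jnp

def blockwise_attention(q, k, v, block_size=512):
    seq_len, head_dim = q.shape
    scale = 1.0 / jnp.sqrt(head_dim)
    k_blocks = k.reshape(-1, block_size, head_dim)
    v_blocks = v.reshape(-1, block_size, head_dim)

    def step(carry, kv):
        o, l, m = carry                       # running output, denominator, row max
        k_blk, v_blk = kv
        s = (q @ k_blk.T) * scale             # (seq_len, block_size) scores for this block
        m_new = jnp.maximum(m, s.max(axis=-1))
        p = jnp.exp(s - m_new[:, None])
        corr = jnp.exp(m - m_new)             # rescale previously accumulated sums
        l = l * corr + p.sum(axis=-1)
        o = o * corr[:, None] + p @ v_blk
        return (o, l, m_new), None

    init = (jnp.zeros((seq_len, head_dim)),
            jnp.zeros((seq_len,)),
            jnp.full((seq_len,), -jnp.inf))
    (o, l, _), _ = jax.lax.scan(step, init, (k_blocks, v_blocks))
    return o / l[:, None]
```

Because only one (seq_len, block_size) score tile lives in memory at a time, activation memory grows linearly rather than quadratically with context length, which is the main obstacle to training at 8192 tokens.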
StarCoderPlus uses StarCoder + the RefinedWeb dataset for training, but with a longer context length. Are there plans to release an OpenLLaMA version with a longer context length, such as 8192?