
Any plans on 8192 context version? #72

Open · imoneoi opened this issue Jul 9, 2023 · 5 comments

Comments

imoneoi commented Jul 9, 2023

StarCoderPlus uses StarCoder + the RefinedWeb dataset for training, but with a longer context length. Are there plans to release a version with a longer context length, such as 8192?

@Green-Sky

A simple fine-tune (LoRA is enough) with a stretched RoPE should be sufficient.
See e.g. ggerganov/llama.cpp#1965
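
For reference, here is a minimal sketch of the "stretched RoPE" idea (linear position interpolation): positions are rescaled so a longer sequence maps into the position range the model was pretrained on. The function names and the 2048 → 8192 numbers are illustrative assumptions, not code from OpenLLaMA or llama.cpp.

```python
# Illustrative sketch of linear RoPE position interpolation ("stretched RoPE").
# Not OpenLLaMA/llama.cpp code; names and numbers are assumptions for the example.
import numpy as np

def rope_angles(head_dim, positions, base=10000.0, scale=1.0):
    # scale < 1 stretches positions: e.g. scale = 2048 / 8192 squeezes 8192
    # positions into the 0..2048 range the model saw during pretraining.
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    angles = np.outer(positions * scale, inv_freq)   # (seq_len, head_dim // 2)
    return np.cos(angles), np.sin(angles)

def apply_rope(x, cos, sin):
    # x: (seq_len, head_dim); rotate each (even, odd) dimension pair.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Query/key vectors at 8192 positions for a model pretrained on 2048.
cos, sin = rope_angles(head_dim=128, positions=np.arange(8192), scale=2048 / 8192)
q = np.random.randn(8192, 128).astype(np.float32)
q_rot = apply_rope(q, cos, sin)
```

A short LoRA fine-tune on long sequences would then adapt the model to the interpolated positions.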


imoneoi commented Jul 9, 2023

@Green-Sky We observed that fine-tuning may still cause performance degradation. A natively pretrained 8192-context model would be better.

@Green-Sky

Sounds like you are not using RoPE scaling. Some RoPE scaling variants can get away without fine-tuning.
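
One variant commonly cited as working without fine-tuning is NTK-aware scaling, which enlarges the RoPE base instead of shrinking the positions, so high-frequency dimensions stay close to their pretrained behavior. A hedged sketch reusing the rotary setup from the snippet above; the alpha formula is the commonly circulated one, not something from this repo:

```python
# Illustrative sketch of NTK-aware RoPE scaling: scale the rotary base rather than
# the positions. Reuses apply_rope from the previous sketch.
import numpy as np

def ntk_inv_freq(head_dim, base=10000.0, alpha=4.0):
    # alpha ~ target_context / pretrained_context, e.g. 8192 / 2048 = 4.
    scaled_base = base * alpha ** (head_dim / (head_dim - 2))
    return 1.0 / (scaled_base ** (np.arange(0, head_dim, 2) / head_dim))

inv_freq = ntk_inv_freq(head_dim=128, alpha=8192 / 2048)
angles = np.outer(np.arange(8192), inv_freq)   # positions are left unscaled
cos, sin = np.cos(angles), np.sin(angles)      # feed into apply_rope as before
```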


syzymon commented Jul 9, 2023

You can try LongLLaMA, which is a long-context (8192 and beyond) finetune of OpenLLaMA: https://github.com/CStanKonrad/long_llama
https://huggingface.co/syzymon/long_llama_3b

It uses a different method than PI (Position Interpolation); see https://arxiv.org/abs/2307.03170 for details. There is no degradation on short context compared to the original 3B checkpoint, and we are working to release larger models soon.
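
For anyone who wants to try it, a minimal sketch of loading the 3B checkpoint with Hugging Face transformers; this assumes the checkpoint's custom modeling code is loaded via trust_remote_code (check the linked repo and model card for the authoritative instructions):

```python
# Hedged sketch: loading the LongLLaMA 3B checkpoint with Hugging Face transformers.
# Assumes the checkpoint ships custom modeling code (hence trust_remote_code=True);
# see the linked repository / model card for the exact, up-to-date usage.
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_3b")
model = AutoModelForCausalLM.from_pretrained(
    "syzymon/long_llama_3b",
    torch_dtype=torch.float32,
    trust_remote_code=True,
)

prompt = "def quicksort(arr):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```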


imoneoi commented Jul 10, 2023

Thanks! How does it compare to natively long-context base models such as StarCoder (8192 context)?

BTW, if we want an 8192 version of OpenLLaMA, maybe we need a JAX FlashAttention kernel like this?
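
For illustration only, a hedged sketch of the kind of memory-efficient (chunked) attention such a kernel provides, written in JAX: queries are processed in blocks so the full seq_len × seq_len score matrix is never materialized at once. This is not the kernel referenced above and not OpenLLaMA code; causal masking is omitted for brevity.

```python
# Hedged sketch of memory-efficient (chunked) attention in JAX, in the spirit of a
# FlashAttention-style kernel for long-context training. Illustrative only; a real
# kernel would also fuse the softmax and apply causal masking.
import jax
import jax.numpy as jnp

def chunked_attention(q, k, v, chunk_size=512):
    # q, k, v: (seq_len, num_heads, head_dim). Queries are processed in chunks so
    # peak memory is O(chunk_size * seq_len) instead of O(seq_len ** 2).
    seq_len, num_heads, head_dim = q.shape
    scale = head_dim ** -0.5

    def attend_chunk(q_chunk):
        # (chunk, heads, dim) x (seq, heads, dim) -> (heads, chunk, seq)
        scores = jnp.einsum("qhd,khd->hqk", q_chunk, k) * scale
        weights = jax.nn.softmax(scores, axis=-1)
        return jnp.einsum("hqk,khd->qhd", weights, v)

    chunks = [attend_chunk(q[i:i + chunk_size]) for i in range(0, seq_len, chunk_size)]
    return jnp.concatenate(chunks, axis=0)

# Example with an 8192-token sequence.
key = jax.random.PRNGKey(0)
q = jax.random.normal(key, (8192, 8, 64))
out = chunked_attention(q, q, q)
print(out.shape)  # (8192, 8, 64)
```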
