Any plans on 8192 context version? #72
Comments
A simple fine-tune (LoRA is enough) on a stretched RoPE should be sufficient.
@Green-Sky We observed that fine-tuning may still cause performance degradation. It is better to have a native 8192 pretrained model.
Sounds like you are not using RoPE scaling. Some RoPE scaling variants can get away without fine-tuning.
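For readers unfamiliar with the terminology above: "stretched RoPE" / linear position interpolation divides token positions by a scale factor so an extended context fits inside the rotation-angle range the model saw during pretraining. Below is a minimal `jax.numpy` sketch of that idea; the names (`make_rope_cache`, `apply_rope`) and the `scale` argument are illustrative, not OpenLLaMA's actual code.

```python
# Minimal sketch of linear RoPE position interpolation ("stretched RoPE").
# Hypothetical helper names; single convention (GPT-NeoX style half-split).
import jax.numpy as jnp

def make_rope_cache(seq_len, head_dim, base=10000.0, scale=1.0):
    # scale > 1 stretches the trained context window: positions are divided
    # by `scale`, so a model trained on 2048 tokens can address
    # 2048 * scale positions within its original angle range.
    inv_freq = 1.0 / (base ** (jnp.arange(0, head_dim, 2) / head_dim))
    positions = jnp.arange(seq_len) / scale           # linear position interpolation
    angles = jnp.outer(positions, inv_freq)           # (seq_len, head_dim // 2)
    return jnp.cos(angles), jnp.sin(angles)

def apply_rope(x, cos, sin):
    # x: (seq_len, num_heads, head_dim); rotate channel pairs by the cached angles.
    x1, x2 = jnp.split(x, 2, axis=-1)
    cos, sin = cos[:, None, :], sin[:, None, :]
    return jnp.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

# Example: stretch a 2048-token model to 8192 tokens (scale = 4).
cos, sin = make_rope_cache(seq_len=8192, head_dim=64, scale=4.0)
```

Variants such as NTK-aware scaling instead adjust the `base` rather than dividing positions, which is what lets some of them work reasonably well without any fine-tuning.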
You can try LongLLaMA, which is a long-context (8192 and beyond) fine-tune of OpenLLaMA: https://github.com/CStanKonrad/long_llama. It uses a different method than PI (see https://arxiv.org/abs/2307.03170 for details). There is no degradation on short context compared to the original 3B checkpoint, and we are working to release larger models soon.
Thanks! How does it compare to native long-context base models such as StarCoder 8192? BTW, if we want an 8192 version of OpenLLaMA, maybe we need a JAX FlashAttention kernel like this?
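For context on what such a kernel buys you: FlashAttention-style blockwise attention avoids materializing the full (seq_len × seq_len) score matrix, which is what makes 8192-token training memory-heavy. The sketch below shows only the algorithmic idea (online softmax over key/value blocks) in plain `jax.numpy` for a single head with no causal mask; a real FlashAttention kernel additionally fuses this into one GPU/TPU kernel, and the function name here is hypothetical.

```python
# Blockwise attention with an online softmax (the math behind FlashAttention).
# Assumes seq_len is divisible by block_size; single head, no causal mask.
import jax
import jax.numpy as jnp

def blockwise_attention(q, k, v, block_size=512):
    seq_len, head_dim = q.shape
    scale = 1.0 / jnp.sqrt(head_dim)
    k_blocks = k.reshape(-1, block_size, head_dim)
    v_blocks = v.reshape(-1, block_size, head_dim)

    def step(carry, kv):
        o, l, m = carry                       # running output, denominator, row max
        k_blk, v_blk = kv
        s = (q @ k_blk.T) * scale             # (seq_len, block_size) scores for this block
        m_new = jnp.maximum(m, s.max(axis=-1))
        p = jnp.exp(s - m_new[:, None])
        corr = jnp.exp(m - m_new)             # rescale previously accumulated sums
        l = l * corr + p.sum(axis=-1)
        o = o * corr[:, None] + p @ v_blk
        return (o, l, m_new), None

    init = (jnp.zeros((seq_len, head_dim)),
            jnp.zeros((seq_len,)),
            jnp.full((seq_len,), -jnp.inf))
    (o, l, _), _ = jax.lax.scan(step, init, (k_blocks, v_blocks))
    return o / l[:, None]
```

Because only one (seq_len, block_size) score tile lives in memory at a time, activation memory grows linearly rather than quadratically with context length, which is the main obstacle to training at 8192 tokens.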
StarCoderPlus uses StarCoder + the RefinedWeb dataset for training, but with a longer context length. Are there plans to release an OpenLLaMA version with a longer context length, such as 8192?