llama : adjust default context size + print warnings #10136

ggerganov · 2024-11-02T10:39:08Z

By default, the examples now use a context size of 4096 instead of the model's training context. The training context is often very large (32k to 128k tokens), which leads to enormous KV cache allocations and failures on regular hardware.
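Here is a minimal Python sketch of how a default context size could be resolved when `-c` is omitted. The function `resolve_n_ctx`, the use of `None` for an omitted `-c`, and the convention that `0` selects the training context are assumptions made for illustration. This is not the llama.cpp implementation.

```python
# Hypothetical default used when -c is omitted (the value this PR adopts for the examples).
DEFAULT_N_CTX = 4096


def resolve_n_ctx(requested, n_ctx_train, default=DEFAULT_N_CTX):
    """Return the context size to allocate.

    requested: value given with -c, or None if the flag was omitted (assumed convention).
    A value of 0 is assumed to mean "use the model's training context".
    """
    if requested is None:
        # No -c given: fall back to a modest default instead of n_ctx_train.
        return default
    if requested == 0:
        # Explicit opt-in to the full training context.
        return n_ctx_train
    # Any other explicit value is used as-is.
    return requested
```

For a model trained with a 131072-token context, `resolve_n_ctx(None, 131072)` yields 4096, while `resolve_n_ctx(0, 131072)` still allows a user to request the full training context.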

Also, print warnings when the specified context size per sequence does not match the model's training context.
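The warning check can be sketched as a small hypothetical Python function. It is not llama.cpp code. It assumes the total context `n_ctx` is split evenly across `n_seq_max` parallel sequences. Only the first message's wording comes from the log quoted in this PR; the overflow message is illustrative.

```python
def context_warnings(n_ctx, n_seq_max, n_ctx_train):
    """Return warning strings comparing the per-sequence context to the training context.

    Assumes the total context is divided evenly among n_seq_max sequences.
    """
    n_ctx_per_seq = n_ctx // n_seq_max
    warnings = []
    if n_ctx_per_seq < n_ctx_train:
        # Wording taken from the log shown in this PR.
        warnings.append(
            f"n_ctx_per_seq ({n_ctx_per_seq}) < n_ctx_train ({n_ctx_train}) "
            "-- the full capacity of the model will not be utilized"
        )
    elif n_ctx_per_seq > n_ctx_train:
        # Illustrative wording: running past the training context may degrade output.
        warnings.append(
            f"n_ctx_per_seq ({n_ctx_per_seq}) > n_ctx_train ({n_ctx_train}) "
            "-- possible training context overflow"
        )
    return warnings
```

With one sequence and the new 4096 default against a 131072-token model, the function returns the single "full capacity" warning. No warning is produced when the per-sequence context equals the training context exactly.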

ggml-ci

ngxson

Thanks! This should prevent me from burning my swapfile whenever I forget to specify -c.

Tested, and it shows the log too:

> ./llama-cli -m ../models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf -cnv -p "You are a helpful assistant"
...
llama_new_context_with_model: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
...

ggerganov · 2024-11-02T12:08:53Z

Is 4096 a good value, or should we go lower?

ngxson · 2024-11-02T13:10:35Z

According to HF Hub statistics, the most used model nowadays is Llama 3 (3.1, 3.2) 8B.

With a context size of 4096, the KV cache takes around 512 MB, which I think is a very reasonable amount.
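The 512 MB figure can be checked with the usual KV cache size formula. For each token, every layer stores one K and one V vector of `n_head_kv * head_dim` elements. The sketch below assumes Llama 3.1 8B's published shape (32 layers, 8 KV heads under grouped-query attention, head dimension 128) and an f16 cache (2 bytes per element). Other cache types or models will change the result.

```python
def kv_cache_bytes(n_ctx, n_layer, n_head_kv, head_dim, bytes_per_elem=2):
    """Approximate KV cache size in bytes.

    The leading factor 2 accounts for storing both K and V;
    bytes_per_elem=2 assumes an f16 cache.
    """
    return 2 * n_ctx * n_layer * n_head_kv * head_dim * bytes_per_elem


# Assumed Llama 3.1 8B shape: 32 layers, 8 KV heads, head_dim 128.
LLAMA31_8B = dict(n_layer=32, n_head_kv=8, head_dim=128)

MIB = 1024 ** 2
GIB = 1024 ** 3
print(kv_cache_bytes(4096, **LLAMA31_8B) / MIB)    # 512.0
print(kv_cache_bytes(131072, **LLAMA31_8B) / GIB)  # 16.0
```

That works out to 128 KiB per token. At the full 131072-token training context, the same formula gives 16 GiB, which is the kind of allocation the new default avoids.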

llama : adjust default context size + print warnings

52d537b

ggml-ci

ggerganov requested review from slaren and ngxson November 2, 2024 10:39

ngxson approved these changes Nov 2, 2024

ggml-ci : add missing gpu-layers + adjust context sizes

b49b9d1

github-actions bot added the devops improvements to build systems and github actions label Nov 2, 2024

slaren approved these changes Nov 2, 2024

ggerganov merged commit 1926d6e into master Nov 2, 2024
60 checks passed

ggerganov deleted the gg/default-ctx branch November 2, 2024 13:18
