
server: add KV cache quantization options #5684

Merged 1 commit into ggerganov:master on Feb 23, 2024

Conversation

AlpinDale (Contributor)

This PR adds the KV cache quantization arguments to the server example.
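As a sketch of what this enables, assuming the server picks up the same cache-type options as the main example (`--cache-type-k` / `--cache-type-v`, default `f16`), the server could be launched with a quantized KV cache roughly like this (model path is illustrative):

```shell
# Hypothetical invocation: start the llama.cpp server with both the K and V
# caches quantized to q8_0 instead of the default f16. Flag names are assumed
# to mirror the main example's --cache-type-k / --cache-type-v options.
./server -m models/llama-2-7b.Q4_K_M.gguf \
    --cache-type-k q8_0 \
    --cache-type-v q8_0
```

Quantizing the KV cache trades a small amount of accuracy for a roughly halved cache memory footprint at `q8_0`, which matters most for long contexts or many parallel slots.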

@AlpinDale AlpinDale changed the title server: add KV cache quantization option server: add KV cache quantization options Feb 23, 2024
@ggerganov ggerganov merged commit fd43d66 into ggerganov:master Feb 23, 2024
55 checks passed
@AlpinDale AlpinDale deleted the server/kv-cache branch February 23, 2024 19:53
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
K-Mistele (Contributor)

Where is the original issue / PR for adding KV quantization to the main binary? I would like to understand the work that was done, as I think there are some more neat things that can be done with KV cache quantization, like this: https://arxiv.org/abs/2401.18079

slaren (Collaborator) commented Apr 30, 2024

@K-Mistele #4312
