
server: add KV cache quantization options #5684

Merged 1 commit into ggerganov:master on Feb 23, 2024

Conversation

AlpinDale (Contributor)

This PR adds the KV cache quantization arguments to the server example.
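As a sketch of what this enables, assuming the server picks up the same cache-type options as the main example (`--cache-type-k` / `--cache-type-v`, default `f16`), the server could be launched with a quantized KV cache roughly like this (model path is illustrative):

```shell
# Hypothetical invocation: start the llama.cpp server with both the K and V
# caches quantized to q8_0 instead of the default f16. Flag names are assumed
# to mirror the main example's --cache-type-k / --cache-type-v options.
./server -m models/llama-2-7b.Q4_K_M.gguf \
    --cache-type-k q8_0 \
    --cache-type-v q8_0
```

Quantizing the KV cache trades a small amount of accuracy for a roughly halved cache memory footprint at `q8_0`, which matters most for long contexts or many parallel slots.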

@AlpinDale AlpinDale changed the title server: add KV cache quantization option server: add KV cache quantization options Feb 23, 2024
@ggerganov ggerganov merged commit fd43d66 into ggerganov:master Feb 23, 2024
55 checks passed
@AlpinDale AlpinDale deleted the server/kv-cache branch February 23, 2024 19:53
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
K-Mistele (Contributor)

Where is the original issue / PR for adding KV quantization to the main binary? I would like to understand the work that was done, as I think there are some more neat things that can be done with KV cache quantization, like this: https://arxiv.org/abs/2401.18079

slaren (Collaborator) commented Apr 30, 2024

@K-Mistele #4312
