DeepSeek-V2-Chat-0628 currently uses excessive VRAM, possibly due to running as MHA instead of MLA.
Discussion here:
https://old.reddit.com/r/LocalLLaMA/comments/1e6ba6a/deepseekv2chat0628_weight_release_1_open_weight/ldtybpo/

Replies: 2 comments 1 reply

-
I think we already support MLA. What makes you think that we use excessive VRAM?
-
KV cache sizes are extremely large. I'm running Q3_K_S (101.7 GB) on an M3 Max with 128 GB of memory and 122 GB allocated as VRAM, and it swaps even at small context sizes (<=2k, and for some reason even at 256). 4k/8k are unusable.
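For a sense of scale, here is a rough back-of-the-envelope sketch of the cache footprint, assuming the hyperparameters from the published DeepSeek-V2 config (60 layers, 128 heads, 128+64-dim keys, 128-dim values, 512-dim compressed KV latent) and an fp16 cache; the exact figures for the 0628 checkpoint and for llama.cpp's actual cache layout may differ:

```python
# Rough KV-cache size estimate for DeepSeek-V2, comparing a naive
# MHA-style cache (full per-head K/V) against MLA's compressed cache.
# Hyperparameters are taken from the published DeepSeek-V2 config;
# adjust them if the 0628 checkpoint differs.

N_LAYERS = 60        # num_hidden_layers
N_HEADS = 128        # num_attention_heads
QK_NOPE_DIM = 128    # qk_nope_head_dim
QK_ROPE_DIM = 64     # qk_rope_head_dim
V_HEAD_DIM = 128     # v_head_dim
KV_LORA_RANK = 512   # dimension of MLA's compressed KV latent
BYTES = 2            # fp16 cache entries

def mha_cache_gib(n_ctx: int) -> float:
    """Full K and V stored per head per layer, i.e. the cost when the
    latent is decompressed into regular K/V heads before caching."""
    per_token = N_LAYERS * N_HEADS * ((QK_NOPE_DIM + QK_ROPE_DIM) + V_HEAD_DIM)
    return per_token * n_ctx * BYTES / 2**30

def mla_cache_mib(n_ctx: int) -> float:
    """Only the compressed latent plus the shared RoPE key per layer."""
    per_token = N_LAYERS * (KV_LORA_RANK + QK_ROPE_DIM)
    return per_token * n_ctx * BYTES / 2**20

for n_ctx in (2048, 4096, 8192):
    print(f"n_ctx={n_ctx:5d}: MHA-style {mha_cache_gib(n_ctx):5.1f} GiB, "
          f"MLA {mla_cache_mib(n_ctx):6.1f} MiB")
```

Under these assumptions, a 4k context costs roughly 19 GiB for the MHA-style cache alone, which on a 128 GB machine already holding a ~102 GB model is enough to push it into swap, whereas the MLA-compressed cache would stay in the hundreds of MiB.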