How can I run a model quantized to w4a16 (4-bit weights and 16-bit activations) with llama.cpp? #10084
Unanswered · xiboliyaxiangjiaojun asked this question in Q&A
It seems that GGUF doesn't support W4A16 quantization. I also failed to convert the model "Meta-Llama-3.1-70B-Instruct-quantized.w4a16" to GGUF using convert_hf_to_gguf.py.
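For reference, this is roughly the conversion command I tried (the local paths are placeholders for where I downloaded the checkpoint):

```sh
# Attempted conversion of the w4a16 checkpoint to GGUF using the script
# shipped in the llama.cpp repo. Paths below are placeholders.
python convert_hf_to_gguf.py ./Meta-Llama-3.1-70B-Instruct-quantized.w4a16 \
    --outfile ./Meta-Llama-3.1-70B-Instruct-w4a16.gguf
```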