How can I run a model quantized to w4a16 (4-bit weights and 16-bit activations) with llama.cpp? #10084
Unanswered · xiboliyaxiangjiaojun asked this question in Q&A
It seems that GGUF doesn't support W4A16 quantization. I also failed to convert the model "Meta-Llama-3.1-70B-Instruct-quantized.w4a16" to GGUF using convert_hf_to_gguf.py.
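For reference, this is roughly the conversion command I tried (the local paths are placeholders for where I downloaded the checkpoint):

```sh
# Attempted conversion of the w4a16 checkpoint to GGUF using the script
# shipped in the llama.cpp repo. Paths below are placeholders.
python convert_hf_to_gguf.py ./Meta-Llama-3.1-70B-Instruct-quantized.w4a16 \
    --outfile ./Meta-Llama-3.1-70B-Instruct-w4a16.gguf
```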