Slight quantization improvement for Q4_K and Q5_K #5361
Merged
Nothing earth-shattering, but I noticed that I had not brought over all the changes I made to Q4_K and Q5_K quantization in my repo, so here it is. We get small PPL improvements. E.g., for Mistral-7B with a context of 512 and an imatrix from wiki.train.raw (QError is defined as PPL(Q)/PPL(fp16) - 1, where PPL(fp16) = 5.6924 for Mistral-7B):