Slight quantization improvement for Q4_K and Q5_K #5361
Merged
Nothing earth-shattering, but I noticed that I had not brought over all the changes I made to Q4_K and Q5_K quantization in my repo, so here it is. We get small PPL improvements. E.g., for Mistral-7B with a context of 512 and an imatrix from wiki.train.raw (QError is defined as PPL(Q)/PPL(fp16) - 1, where PPL(fp16) = 5.6924 for Mistral-7B):