Root mean square of token probability differences as new quantization quality metric #2875

---
For a specific quantization, do you get different values for different tensors? If so, it seems like this could be a good way to automatically determine stuff like the k-quants strategies that try to put more bits in the more important tensors.
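For illustration, a minimal sketch of what that automation could look like; `quantize_single_tensor` and `measure_rms_p` are hypothetical helpers, not existing llama.cpp APIs:

```python
# Hypothetical sketch: estimate each tensor's sensitivity by quantizing it
# alone and measuring the resulting RMS_p against the full-precision model.
# `quantize_single_tensor` and `measure_rms_p` are assumed helpers, not real APIs.

def rank_tensor_sensitivity(base_model, tensor_names, quant_type, eval_tokens):
    scores = {}
    for name in tensor_names:
        # Quantize only this tensor; keep every other tensor at full precision.
        candidate = quantize_single_tensor(base_model, name, quant_type)
        scores[name] = measure_rms_p(base_model, candidate, eval_tokens)
    # The most sensitive tensors (largest RMS_p) are candidates for more bits.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```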

---
Interesting analysis. In contrast to standard perplexity evaluation of a model, where we have information only about the single "correct" token for each context, here we have more information available in the predicted probabilities across the entire vocab. It makes sense that, if utilized correctly, this information can result in a better (faster and more accurate) way to evaluate the change in model quality due to quantization. In some sense, this metric can be interpreted as "how different are two models?" and probably has other applications beyond evaluating the quality of quantum models. For example, the $\mathrm{RMS}_p$ of fine-tuned models with respect to the base model might be a way to compare the amount of "behavior change" as a result of the fine-tuning.
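As a rough sketch of the computation (my assumptions about shapes and averaging, not a confirmed implementation): given logits from the two models for the same token positions, $\mathrm{RMS}_p$ could be computed like this:

```python
import numpy as np

def softmax(logits):
    # Subtract the per-row max before exponentiating for numerical stability.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def rms_p(logits_a, logits_b):
    # logits_a, logits_b: (n_positions, n_vocab) logits from two models
    # evaluated on the same token positions.
    diff = softmax(logits_a) - softmax(logits_b)
    # RMS over all (position, vocab-token) pairs.
    return np.sqrt(np.mean(diff ** 2))
```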

---
Nice to see fewer logits thrown away.

---
I have investigated potential new metrics other than differences in perplexity for judging the quality of a quantization format. The full report can be found here.
The TLDR is that I propose using the root mean square of the differences in token probability ($\mathrm{RMS}_p$) between the quantized and unquantized model as a new metric.
I think this would have the following advantages:
This is what a plot of $\mathrm{RMS}_p$ looks like:
This is the corresponding table:
I chose not to add the uncertainties for perplexity because they would be misleading in this context: both models are evaluated on the same text, so their perplexity errors are very highly correlated, and separate per-model error bars would greatly overstate the uncertainty of the differences between them.
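To illustrate the point numerically (a hedged sketch with illustrative names; perplexity is the exponential of the mean negative log-likelihood, so uncertainties are compared on the log scale):

```python
import numpy as np

def paired_se_of_difference(nll_base, nll_quant):
    # Per-token negative log-likelihoods from both models on the SAME text.
    nll_base = np.asarray(nll_base)
    nll_quant = np.asarray(nll_quant)
    n = len(nll_base)
    # Naive per-model standard errors (what separate error bars would show).
    se_base = nll_base.std(ddof=1) / np.sqrt(n)
    se_quant = nll_quant.std(ddof=1) / np.sqrt(n)
    # Paired standard error of the difference: the correlated part cancels,
    # so se_diff is typically far smaller than se_base or se_quant.
    d = nll_quant - nll_base
    se_diff = d.std(ddof=1) / np.sqrt(n)
    return se_base, se_quant, se_diff
```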
I very much welcome feedback on my idea, particularly from @ggerganov and @ikawrakow, who have spent a lot of time on quantization formats.