What does it mean for a quantization to have half a bit? #1563

xzuyn · 2023-05-22T20:36:11Z

q4_0, q5_0, & q8_0 have 0.5 bits/weight less than the _1 quantizations. How does this work, and what does it mean? I remember seeing an explanation somewhere here, but I can't find it anymore.

KerfuffleV2 (Collaborator) · 2023-05-23T02:28:44Z

> How does this work, and what does it mean? I remember seeing an explanation somewhere here, but I can't find it anymore.

You'll probably get a better/more detailed explanation from someone else.

The simplified version is that the parameters are divided into chunks/groups, and each chunk carries extra metadata about its scale. This means each chunk can be quantized more accurately. Forcing everything into 4 bits with a single shared scale, for example, means that one scale has to cover both the highest and the lowest values in the data.

However, some parts of the data may have a bunch of values in one range while other parts have values in a different range. For example, if one chunk has 100, 101, 102, 103 and another has -100, -101, -102, -103, then you'll lose a huge amount of accuracy trying to make everything fit in 4 bits with one shared range. On the other hand, if you use a scale of 100 for the first chunk with values 0, 1, 2, 3, and a scale of -103 for the second with values 3, 2, 1, 0, you just add the scale to the values and no information is lost, since 4 bits can express the values 0-15. (The scaling factor itself isn't quantized, so it would be a 16-bit or 32-bit float.)

This is just explaining the idea in a very general/simplified way. Anyway, if you add up all the bits used, including the scale value stored for each chunk, and divide by the number of weights, the average can come out to a fractional number, such as an extra half a bit per weight.
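
To make the chunk example concrete, here is a minimal Python sketch of the same toy scheme. The function names are made up for illustration, and this is not how llama.cpp actually stores its blocks; it only reproduces the add-the-scale arithmetic from the example above and compares it with forcing both chunks into one shared 4-bit range.

```python
# Toy illustration only: hypothetical helper names, not llama.cpp code.

def quantize_chunk(chunk, bits=4):
    """Store one per-chunk offset (the "scale" in the example) plus small codes."""
    offset = min(chunk)
    codes = [v - offset for v in chunk]
    if max(codes) >= 2 ** bits:
        raise ValueError("chunk range does not fit in the given number of bits")
    return offset, codes


def dequantize_chunk(offset, codes):
    """Recover the original values by adding the offset back to each code."""
    return [offset + c for c in codes]


def quantize_shared(values, bits=4):
    """Map all values onto one shared range, for comparison."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (2 ** bits - 1)
    return [round((v - lo) / step) for v in values]


chunk_a = [100, 101, 102, 103]
chunk_b = [-100, -101, -102, -103]

# Per-chunk: offsets 100 and -103, codes 0..3, and the round trip is exact.
offset_a, codes_a = quantize_chunk(chunk_a)
offset_b, codes_b = quantize_chunk(chunk_b)
restored_a = dequantize_chunk(offset_a, codes_a)
restored_b = dequantize_chunk(offset_b, codes_b)

# Shared range: every value within a chunk collapses onto the same 4-bit code.
shared_codes = quantize_shared(chunk_a + chunk_b)
```

With one shared range the step between adjacent codes is about 13.7, so 100 through 103 all land on code 15 and -100 through -103 all land on code 0. Real formats typically multiply by a floating-point scale rather than adding an integer offset, so they are lossy, but the benefit of per-chunk metadata is the same.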
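
The averaging can be written out as a small calculation. The numbers below are assumptions that are not stated in this thread: a chunk (block) size of 32 weights, one 16-bit float scale per block for the _0 formats, and one additional 16-bit float per block for the _1 formats. `bits_per_weight` is a hypothetical helper written for this sketch.

```python
# Sketch under assumed block layouts; the block size and metadata sizes
# are assumptions, not values taken from this thread.

def bits_per_weight(weight_bits, block_size, metadata_bits):
    """Average storage per weight, counting the per-block metadata."""
    total_bits = weight_bits * block_size + metadata_bits
    return total_bits / block_size


BLOCK_SIZE = 32   # assumed number of weights per block
FP16_BITS = 16    # assumed size of each per-block float

# _0 style: 4-bit weights plus one float per block.
q4_0 = bits_per_weight(4, BLOCK_SIZE, FP16_BITS)

# _1 style: 4-bit weights plus two floats per block.
q4_1 = bits_per_weight(4, BLOCK_SIZE, 2 * FP16_BITS)

print(q4_0, q4_1, q4_1 - q4_0)
```

Under these assumptions each extra 16-bit value per 32-weight block costs 16 / 32 = 0.5 bits per weight, which is where the half-bit difference between a _0 and a _1 format would come from.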


SlyEcho (Collaborator, Sponsor) · 2023-05-26T07:47:40Z

You can't have something that is half a bit, obviously, except on average.

It's kind of like saying the average family has 2.1 children.
