Would it be possible to quantize a model (to a lower bpw) that has already been quantized? If so, would the result be lower quality than quantizing from the original model, and are there combinations that would lead to better results than others (e.g. 8bpw -> 4bpw vs. 6bpw -> 4bpw)?
Replies: 1 comment
It's not directly supported, no. It expects every linear layer loaded to have an FP16 weight matrix to work on, and while this could be converted on the fly from quantized weights, that functionality just isn't there at the moment.

I could add it, I suppose, but re-quantizing a model would still be less than ideal. FP16 -> 8bpw -> 4bpw would produce a somewhat degraded model compared to FP16 -> 4bpw, and FP16 -> 6bpw -> 4bpw would most likely be worse still. It would also use a bit more memory than converting from FP16.

But it's definitely doable, in principle, with some tweaks to the code.
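For intuition on why chaining quantization steps compounds error, here's a minimal sketch using plain NumPy and naive round-to-nearest quantization with per-tensor max-abs scaling. This is not exllama's actual quantization scheme (which is group-wise with error correction), only an illustration of the principle: the inner quantize/dequantize round-trip plays the role of the "convert on the fly back to FP16" step described above, and the second rounding can only move each weight further from the final 4-bit grid, never closer.

```python
import numpy as np

def quantize_rtn(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric round-to-nearest quantization, returned in dequantized FP form
    so the result can be fed back in as the 'FP16 weight matrix' for a second pass."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit, 127 for 8-bit
    scale = np.abs(w).max() / qmax        # per-tensor scale (real schemes use groups)
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return (q * scale).astype(w.dtype)    # dequantize: the "on the fly" FP16 view

rng = np.random.default_rng(0)
w = rng.normal(size=1_000_000).astype(np.float32)  # stand-in for an FP16 weight matrix

paths = {
    "FP16 -> 4bpw":         quantize_rtn(w, 4),
    "FP16 -> 8bpw -> 4bpw": quantize_rtn(quantize_rtn(w, 8), 4),
    "FP16 -> 6bpw -> 4bpw": quantize_rtn(quantize_rtn(w, 6), 4),
}
for name, wq in paths.items():
    print(f"{name:22s} MSE = {np.mean((w - wq) ** 2):.4e}")
```

On a random Gaussian tensor this typically prints the direct FP16 -> 4bpw path with the lowest MSE, the 8bpw intermediate slightly higher, and the 6bpw intermediate higher still, matching the ordering in the reply: the coarser the intermediate grid, the more weights get nudged across a 4-bit rounding boundary on the second pass.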