
Quantizing an already quantized model (exl2) #351

Answered by turboderp
lufixSch asked this question in Q&A


It's not directly supported, no. The converter expects every linear layer it loads to have an FP16 weight matrix to work on, and while an FP16 matrix could be reconstructed on the fly from quantized weights, that functionality just isn't there at the moment.
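Roughly, "converting on the fly" would mean reconstructing FP16 weights from the stored quantized tensors before handing them to the converter. A minimal sketch of the idea, using a made-up (int8, per-row scale) layout rather than the actual exl2 storage format:

```python
import torch

def dequantize_linear(q_weight: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Reconstruct an FP16 weight matrix from integer weights and per-row scales.
    # The real exl2 format uses grouped, variable-bitrate storage, so this only
    # illustrates the shape of the problem, not the actual layout.
    return (q_weight.float() * scale.unsqueeze(1)).half()

# Fake 4-bit-style quantized [out_features, in_features] matrix with per-row scales
q = torch.randint(-8, 8, (256, 512), dtype=torch.int8)
s = torch.rand(256) * 0.01
w_fp16 = dequantize_linear(q, s)
print(w_fp16.shape, w_fp16.dtype)  # torch.Size([256, 512]) torch.float16
```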

I could add it, I suppose, but re-quantizing a model would still be less than ideal. FP16->8bpw->4bpw would produce a somewhat degraded model compared to FP16->4bpw. FP16->6bpw->4bpw would most likely be worse still. It would also use a bit more memory than converting from FP16.
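To get a feel for why, you can compare the reconstruction error of the two paths with plain round-to-nearest quantization (the actual exl2 quantizer is calibration-based and much more involved, so treat this purely as a toy illustration):

```python
import torch

def rtn_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    # Symmetric round-to-nearest quantize + dequantize with a single tensor-wide scale.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

torch.manual_seed(0)
w = torch.randn(4096, 4096)                      # stand-in for an FP16 weight matrix

direct = rtn_quantize(w, 4)                      # FP16 -> 4 bit
two_step = rtn_quantize(rtn_quantize(w, 8), 4)   # FP16 -> 8 bit -> 4 bit

print("FP16 -> 4-bit MSE:         ", (w - direct).pow(2).mean().item())
print("FP16 -> 8-bit -> 4-bit MSE:", (w - two_step).pow(2).mean().item())
```

With a scheme this simple the difference is small; the point is just that the intermediate step adds its own rounding noise on top of whatever the final quantization introduces anyway.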

But it's definitely doable, in principle, with some tweaks to the code.

Answer selected by lufixSch