add Qwen2 #24
Hello, thank you for your interest in EETQ. The code you modified is for vllm, whose EETQ support is not yet merged (vllm-project/vllm#3614), so I'm unsure how you plan to use it. Could you please clarify? If you want to quantize Qwen2 with EETQ under transformers or TGI, I think you can use it directly within those two frameworks.
I am not using vllm. My change is not related to vllm. I am trying to do this:
The code changes I made here enable that code to function. Qwen2 is not supported because it is not in EETQ_CAUSAL_LM_MODEL_MAP.
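The support gap described above comes down to a missing registry entry. Here is a hypothetical sketch of that pattern; the real EETQ_CAUSAL_LM_MODEL_MAP entries and class names (e.g. `Qwen2EetqForCausalLM`) are illustrative assumptions, not EETQ's actual API:

```python
# Hypothetical sketch: a model type is supported only if it has an
# entry in EETQ_CAUSAL_LM_MODEL_MAP. All names here are illustrative.
EETQ_CAUSAL_LM_MODEL_MAP = {
    "llama": "LlamaEetqForCausalLM",      # example existing entries
    "mistral": "MistralEetqForCausalLM",
}

def resolve_eetq_class(model_type: str) -> str:
    """Look up the EETQ wrapper class name for a model type."""
    if model_type not in EETQ_CAUSAL_LM_MODEL_MAP:
        raise NotImplementedError(f"EETQ does not support {model_type!r}")
    return EETQ_CAUSAL_LM_MODEL_MAP[model_type]

# The proposed change boils down to registering Qwen2:
EETQ_CAUSAL_LM_MODEL_MAP["qwen2"] = "Qwen2EetqForCausalLM"
print(resolve_eetq_class("qwen2"))  # -> Qwen2EetqForCausalLM
```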
If you want to use EETQ to quantize a model and run inference in an existing inference framework such as TGI, transformers, or vllm, you have to customize the quantization for each framework, because the cutlass kernel changes the layout of the quantized weights. The code above is customized for vllm (sorry for the unclear description in the README). If you use it in another framework, it may output wrong tokens.
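The layout issue described above can be illustrated with a toy example. This is not EETQ's or cutlass's actual preprocessing; it only shows why weights packed for one kernel's layout produce wrong results when read by code expecting another layout:

```python
# Toy illustration (not real EETQ/cutlass code): the same logical
# weights can be stored in different physical layouts. A kernel fed
# weights in the wrong layout computes the wrong result.

def interleave(row, group=2):
    """Reorder elements into a hypothetical kernel-friendly layout."""
    n = len(row)
    order = [c for i in range(group) for c in range(i, n, group)]
    return [row[c] for c in order]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

weights = [1, 2, 3, 4]        # logical (row-major) quantized weights
packed = interleave(weights)  # layout a hypothetical kernel expects
acts = [10, 20, 30, 40]

correct = dot(weights, acts)  # what the layer should compute -> 300
wrong = dot(packed, acts)     # mismatched layout -> 290, wrong output
print(correct, wrong)
```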
@ehartford
I want to quantize my model to EETQ format and publish it, so people can download the EETQ-quantized version of my model, just like they do with GPTQ, GGUF, EXL2, etc.
Please add Qwen2 support