[New Model]: We are able to run the phi-3.5 vision instruct model but want to run it with int4 quantization #8463
Comments
Are you using a custom quantized model? I don't see it on HuggingFace.
I am using only the phi-3.5 vision instruct model and want to run it in vLLM with 4-bit quantization. One more doubt I have: can I use an engine configuration for the phi-3.5 model like the save_sharded_state.py example sketched below? (The fragments I pasted were: "Example usage: python save_sharded_state.py", "Then, the model can be loaded with llm = LLM(", "from vllm import LLM, EngineArgs", "parser = FlexibleArgumentParser()", "def main(args):", "if __name__ == \"__main__\":".)
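The fragments above match the skeleton of vLLM's examples/save_sharded_state.py. Below is a minimal sketch of that skeleton, assuming the standard EngineArgs CLI helpers; the --output handling and the actual save step vary by vLLM version and are only placeholders here, not the script's real implementation.

```python
# Sketch reconstructed from the pasted fragments; loosely follows
# vLLM's examples/save_sharded_state.py. The save step is a placeholder
# because its exact API differs between vLLM versions.
"""
Example usage:

python save_sharded_state.py \
    --model /path/to/model \
    --tensor-parallel-size 1 \
    --output /path/to/sharded/output

Then, the model can be loaded with

llm = LLM(
    model="/path/to/sharded/output",
    load_format="sharded_state",
)
"""
import dataclasses

from vllm import LLM, EngineArgs
from vllm.utils import FlexibleArgumentParser

parser = FlexibleArgumentParser()
# Register all engine arguments (model, quantization, tensor-parallel-size, ...)
EngineArgs.add_cli_args(parser)
parser.add_argument("--output", type=str, required=True,
                    help="Directory to write the sharded state to")


def main(args):
    # Build the engine from the parsed CLI arguments.
    engine_args = EngineArgs.from_cli_args(args)
    llm = LLM(**dataclasses.asdict(engine_args))
    # In the real example the workers dump their shards to args.output here;
    # the exact call is version-dependent, so it is left as a placeholder.
    ...


if __name__ == "__main__":
    main(parser.parse_args())
```

The --output flag and the save step are the version-dependent parts; the rest is the standard EngineArgs CLI wiring, which also accepts a quantization argument.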
@Isotr0py are you familiar with this?
I'm not sure what "int4 quantization" exactly means here, because it seems there is no BNB 4-bit quantized Phi3-V model released on HF. (The code given above is using …) If "int4 quantization" just means 4-bit quantization, Phi-3.5-vision-instruct-AWQ with AWQ quantization should work on vLLM.
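A minimal sketch of running a 4-bit AWQ checkpoint through vLLM's offline LLM API; the repo id is a placeholder for whichever Phi-3.5-vision-instruct-AWQ upload is used, and the prompt template and memory settings are assumptions to adjust for your setup.

```python
# Sketch: run an AWQ-quantized Phi-3.5-vision checkpoint with vLLM.
# "your-namespace/Phi-3.5-vision-instruct-AWQ" is a placeholder repo id.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="your-namespace/Phi-3.5-vision-instruct-AWQ",  # placeholder
    quantization="awq",       # checkpoint carries AWQ-quantized weights
    trust_remote_code=True,   # Phi-3.5-vision ships custom model code
    max_model_len=4096,       # keep the KV cache small on modest GPUs
)

# Phi-3 vision style prompt with one image placeholder (assumed template).
prompt = "<|user|>\n<|image_1|>\nDescribe this image.<|end|>\n<|assistant|>\n"
image = Image.open("example.jpg")

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```

Recent vLLM versions can usually detect AWQ from the checkpoint's quantization config, so the explicit quantization="awq" is mostly there for clarity.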
How many GPUs are needed to run the AWQ quantized model?
It costs about 4GB of VRAM to run the 4-bit AWQ quantized Phi-3.5-vision-instruct. BTW, the AWQ model I uploaded is calibrated with the default dataset in … I think vLLM can't run TensorRT currently. (FYI, #5134 (comment))
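For fitting the 4-bit model on a single small GPU, a short sketch of the memory-related engine knobs; the values are illustrative assumptions, not measurements from this thread.

```python
# Sketch: memory-related settings when squeezing the AWQ model onto one small GPU.
from vllm import LLM

llm = LLM(
    model="your-namespace/Phi-3.5-vision-instruct-AWQ",  # placeholder repo id
    quantization="awq",
    trust_remote_code=True,
    max_model_len=2048,           # smaller context -> smaller KV cache
    gpu_memory_utilization=0.85,  # fraction of GPU memory vLLM may claim
    enforce_eager=True,           # skip CUDA graph capture to save some memory
)
```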
ok |
Does this work for you?
The model to consider.
The closest model vllm already supports.
The phi-3.5 vision instruct model; I need a reference for running it with quantization.
What's your difficulty of supporting the model you want?
The existing documentation does not contain any information about quantization for the phi-3.5 vision instruct model.