[Model] Add GLM-4v support #5358
Conversation
This may also be solved by #5018.
Thanks for implementing this model! To improve performance, you should try to use vLLM layers instead of the default PyTorch implementations (see here). Also, can you add a test to ensure the model's consistency with its HuggingFace version (similar to the one for LLaVA)?
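For reference, the kind of consistency check being asked for might look roughly like the following sketch. It is hypothetical and text-only for brevity; a real test for this model would also push an image through both paths, and the loading details below are simplified assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "THUDM/glm-4v-9b"
PROMPTS = ["Describe the capital of France."]
MAX_NEW_TOKENS = 64

tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)

# Greedy generation with the HuggingFace implementation.
hf_model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, trust_remote_code=True).eval().cuda()
hf_outputs = []
for prompt in PROMPTS:
    ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
    out = hf_model.generate(ids, max_new_tokens=MAX_NEW_TOKENS, do_sample=False)
    hf_outputs.append(
        tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True))
del hf_model
torch.cuda.empty_cache()

# Greedy generation with vLLM.
llm = LLM(model=MODEL, trust_remote_code=True, enforce_eager=True)
params = SamplingParams(temperature=0, max_tokens=MAX_NEW_TOKENS)
vllm_outputs = [o.outputs[0].text for o in llm.generate(PROMPTS, params)]

# Under greedy decoding the two outputs should match (or stay very close).
for hf_text, vllm_text in zip(hf_outputs, vllm_outputs):
    assert hf_text.strip() == vllm_text.strip(), (hf_text, vllm_text)
```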
During testing, I noticed that the model output using vLLM is inconsistent with the output from the official GLM-4v code, resulting in a decrease in image understanding capabilities. I have ensured that the parameter settings are consistent (at least numerically), yet this issue persists. Could you please explain what might be causing this discrepancy?
@HuggingAha Thank you for your testing. To assist you better, could you please provide more details such as the input and output, sampling parameters, and any benchmarks you are testing? If it's a generation task, then due to slight differences in logits between the vLLM and HuggingFace implementations (up to the first decimal place), differences may arise after sampling to a certain length. I believe this is normal cumulative error. However, if you find a significant difference in logits or a severe performance drop when measuring benchmarks, it is likely an issue with the implementation. Please feel free to provide more details to help me fix the problem.
I'm getting the following error: File "/workspace/vllm/vllm/model_executor/models/chatglm.py", line 392, in load_weights
There appears to be an issue with loading the model weights: the code is trying to load the weights for glm-4v, but the vision encoder parameters are absent from the checkpoint. If the weights are correct and the configuration is accurate, this could indicate a bug in the code. To assist with troubleshooting, please provide additional details.
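For troubleshooting, one way to check whether the vision encoder weights are actually present in the downloaded checkpoint is to list the keys of its safetensors shards. A minimal sketch, assuming a local copy of THUDM/glm-4v-9b (the path is a placeholder):

```python
import glob
from safetensors import safe_open

# Placeholder path to the locally downloaded glm-4v-9b checkpoint.
CHECKPOINT_DIR = "/path/to/glm-4v-9b"

vision_keys = []
for shard in glob.glob(f"{CHECKPOINT_DIR}/*.safetensors"):
    with safe_open(shard, framework="pt") as f:
        # Collect every weight name belonging to the vision encoder.
        vision_keys += [k for k in f.keys() if k.startswith("transformer.vision.")]

print(f"found {len(vision_keys)} vision weights")
print(vision_keys[:5])
```

If this prints zero vision weights, the checkpoint itself is missing the vision encoder rather than vLLM failing to map it.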
Looking forward to vLLM supporting glm-4v soon.
I also ran into this problem. I just ran the code on vllm==0.5.0 following your usage instructions.
I also get `KeyError: 'transformer.vision.patch_embedding.cls_embedding'` and `KeyError: 'transformer.vision.eoi'`. Please help me solve this problem.
Looking forward to support soon...
Based on my tests, the issue might be due to the misalignment of positional encodings, where the
In order to use the KV cache in vLLM, we have to reserve placeholders in the multimodal embeddings before they are passed to the language model. The placeholders are then filled with the image embeddings in https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/utils.py#L6-L41. So it is necessary to rewrite the code in such a way. Make sure that the embeddings follow the same order as the HF implementation after they are merged. Btw, #5276 has been merged, so you can start making changes accordingly to support a dynamic number of image tokens. You may refer to this guide for more details.
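The placeholder-filling step can be pictured roughly as follows. This is an illustrative sketch, not the actual helper in `vllm/model_executor/models/utils.py`; the function name, shapes, and the toy token id are assumptions.

```python
import torch

def merge_image_embeddings(input_ids: torch.Tensor,
                           inputs_embeds: torch.Tensor,
                           vision_embeds: torch.Tensor,
                           image_token_id: int) -> torch.Tensor:
    """Overwrite the embeddings at placeholder positions with image features."""
    mask = input_ids == image_token_id                      # [num_tokens]
    assert int(mask.sum()) == vision_embeds.shape[0], (
        "number of placeholder tokens must equal number of image feature rows")
    inputs_embeds[mask] = vision_embeds.to(inputs_embeds.dtype)
    return inputs_embeds

# Tiny usage example with made-up shapes; 9 is a hypothetical image token id.
hidden = 8
ids = torch.tensor([1, 2, 9, 9, 9, 3])
embeds = torch.zeros(len(ids), hidden)
image_feats = torch.randn(3, hidden)
merged = merge_image_embeddings(ids, embeds, image_feats, image_token_id=9)
```

The key requirement from the comment above is that, after merging, the sequence of embeddings must be in the same order as in the HF implementation.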
Thanks for your contribution! I met a problem when running with the vLLM server. I start the server using Python:

```bash
python -m vllm.entrypoints.openai.api_server --model THUDM/glm-4v-9b --dtype auto --api-key token-abc123 --trust-remote-code --image-input-type "pixel_values" --image-token-id 151339 --image-input-shape "1,3,1120,1120" --image-feature-size 1602 --disable-image-processor --enforce-eager
```

and then use the official OpenAI Python client to make the request:

```python
from openai import OpenAI
import base64

openai_api_key = "token-abc123"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base
)

query = '描述这张图片'  # "Describe this picture"

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "liucheng.png"

# Getting the base64 string
base64_image = encode_image(image_path)

chat_response = client.chat.completions.create(
    model="THUDM/glm-4v-9b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": query},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{base64_image}"
                }
            }
        ]
    }]
)
print(chat_response)
```

There is an error when using the OpenAI client.
What's the progress so far?
By the way, you can use
Once native GLM-4V supports vLLM deployment, will a finetuned GLM-4V also be supported for vLLM deployment?
This sample code doesn't seem to work anymore.

```python
from PIL import Image
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
from vllm.multimodal.image import ImagePixelData

max_model_len, tp_size = 8192, 1
model_name = "THUDM/glm-4v-9b"
boi_token_id = 151339
eoi_token_id = 151340

llm = LLM(
    model=model_name,
    tensor_parallel_size=tp_size,
    max_model_len=max_model_len,
    trust_remote_code=True,
    enforce_eager=True,
    image_input_type="pixel_values",
    image_token_id=boi_token_id,
    image_input_shape="1,3,1120,1120",
    image_feature_size=1602,
    disable_image_processor=True
)

stop_token_ids = [151329, 151336, 151338]
sampling_params = SamplingParams(temperature=0, max_tokens=1024, stop_token_ids=stop_token_ids)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
query = 'Describe this picture.'
image = Image.open("docs/source/assets/logos/vllm-logo-text-light.png").convert('RGB')

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "image": image, "content": query}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True
)
image_tensor = inputs['images']
input_ids = inputs['input_ids'][0].tolist()
# boi_token_pos, eoi_token_pos = input_ids.index(boi_token_id), input_ids.index(eoi_token_id)
# input_ids = input_ids[:boi_token_pos] + [boi_token_id] * 1602 + input_ids[eoi_token_pos + 1:]

outputs = llm.generate(
    {
        "prompt_token_ids": input_ids,
        "multi_modal_data": ImagePixelData(image_tensor),
    },
    sampling_params=sampling_params
)
print(outputs[0].outputs[0].text)
```
@B-201 Sorry about that. It should work now.
@HuggingAha Fixed. Please see “About custom position_ids”. Thanks for the test, by the way.
Sorry, I didn't understand what you mean. I tested it on the latest commit.
I just edited the sample code:
@songxxzp Thank you for your contribution. Can I install vllm from https://github.com/songxxzp/vllm/tree/glm4v? I got some errors while running pip install -e.
Sorry to bother you, but I was wondering whether this PR will be merged soon.
Does vLLM + GLM-4V support batch inference?
What is the use case of custom
Hi, thanks for your contributions! I just tested the latest sample code and met an error; maybe it's because of a different version of transformers? Can you help?
You can try my PR, which includes a precompiled .whl file and supports bnb 4-bit quantization. #7672
Closing as superseded by #9242
Overview

This PR supports the glm-4v-9b model while maintaining compatibility with `chatglm`.

FIX #5417
FIX #6097

Changes

- Added `vision_config` for `ChatGLMConfig`.
- Added `vllm/model_executor/models/glm4_vision_encoder.py`.
- Added a `vision` module for `ChatGLMModel`, making `ChatGLMForCausalLM` multimodal capable.
- Fixed the logic for `vision_language_config` to ensure proper configuration of vision-language models when `lora_config` is present. (Already fixed by [Model] Add base class for LoRA-supported models #5018.)
- Maintain `position_ids` (glm-4v uses the same position for image tokens).

About custom `position_ids`

glm-4v uses the same position for all image tokens, so we need to maintain `position_ids` in `SequenceData` to support that. A toy illustration follows the list below.

Major Code Changes

- Maintain `position_ids` in `SequenceData` (only when `position_ids` is passed): `vllm/sequence.py`
- Calculate `position_ids` for glm-4v: `vllm/model_executor/models/chatglm.py`
- Use custom `position_ids` in the model runner: `vllm/worker/model_runner.py`
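To make the shared image position concrete, here is the toy illustration referenced above. It is not the code in `chatglm.py` (which also has to handle the boi/eoi tokens), and the exact offsets used by the real implementation may differ.

```python
def build_position_ids(num_prefix_tokens: int,
                       num_image_tokens: int,
                       num_suffix_tokens: int) -> list[int]:
    """Positions for [text prefix][image placeholders][text suffix] where
    every image placeholder shares a single position id."""
    prefix = list(range(num_prefix_tokens))
    image_pos = num_prefix_tokens          # one shared position for the image
    image = [image_pos] * num_image_tokens
    suffix = list(range(image_pos + 1, image_pos + 1 + num_suffix_tokens))
    return prefix + image + suffix

# e.g. 3 text tokens, 4 image placeholders, 2 text tokens:
print(build_position_ids(3, 4, 2))   # [0, 1, 2, 3, 3, 3, 3, 4, 5]
```

Because the position of each token can no longer be derived from its index alone, the custom `position_ids` have to travel with the sequence data through the model runner.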
About the bugfix

Code Changes

The previous code used an `elif` statement that prevented the check for subclasses of `VisionLanguageModelBase` when `lora_config` was set. This has been updated to use an `if` statement to ensure that `vision_language_config` is processed correctly regardless of whether `lora_config` is present. The change is in `_get_model_initialization_kwargs` of `vllm/model_executor/model_loader/loader.py`; a rough before/after sketch follows.
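A minimal before/after sketch of the described pattern; this is not the actual body of `_get_model_initialization_kwargs`, and every name apart from `VisionLanguageModelBase` is a stand-in.

```python
class VisionLanguageModelBase:  # placeholder for vLLM's actual base class
    pass

def kwargs_before(model_class, lora_config, vision_language_config):
    extra_kwargs = {}
    if lora_config:
        extra_kwargs["lora_config"] = lora_config
    elif issubclass(model_class, VisionLanguageModelBase):
        # Never reached when lora_config is set, so the vision config is dropped.
        extra_kwargs["vision_language_config"] = vision_language_config
    return extra_kwargs

def kwargs_after(model_class, lora_config, vision_language_config):
    extra_kwargs = {}
    if lora_config:
        extra_kwargs["lora_config"] = lora_config
    if issubclass(model_class, VisionLanguageModelBase):
        # Checked independently of lora_config.
        extra_kwargs["vision_language_config"] = vision_language_config
    return extra_kwargs
```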
Usage
glm-4-9b-chat
glm-4v-9b
PR Checklist

Thank you for your contribution to vLLM! Before submitting the pull request, please ensure the PR meets the following criteria. This helps vLLM maintain code quality and improve the efficiency of the review process.

PR Title and Classification

Only specific types of PRs will be reviewed. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

- [Bugfix] for bug fixes.
- [CI/Build] for build or continuous integration improvements.
- [Doc] for documentation fixes and improvements.
- [Model] for adding a new model or improving an existing model. The model name should appear in the title.
- [Frontend] for changes to the vLLM frontend (e.g., OpenAI API server, `LLM` class, etc.).
- [Kernel] for changes affecting CUDA kernels or other compute kernels.
- [Core] for changes in the core vLLM logic (e.g., `LLMEngine`, `AsyncLLMEngine`, `Scheduler`, etc.).
- [Hardware][Vendor] for hardware-specific changes. The vendor name should appear in the prefix (e.g., [Hardware][AMD]).
- [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

- Use `format.sh` to format your code.
- Add documentation to `docs/source/` if the PR modifies the user-facing behaviors of vLLM. It helps vLLM users understand and utilize the new features or changes.

Notes for Large Changes

Please keep the changes as concise as possible. For major architectural changes (>500 LOC excluding kernel/data/config/test), we would expect a GitHub issue (RFC) discussing the technical design and justification. Otherwise, we will tag it with rfc-required and the PR might not go through.

What to Expect for the Reviews

The goal of the vLLM team is to be a transparent reviewing machine. We would like to make the review process transparent and efficient and make sure no contributor feels confused or frustrated. However, the vLLM team is small, so we need to prioritize some PRs over others. Here is what you can expect from the review process:

- The reviewer will put an action-required label on the PR if there are changes required. The contributor should address the comments and ping the reviewer to re-review the PR.

Thank You
Finally, thank you for taking the time to read these guidelines and for your interest in contributing to vLLM. Your contributions make vLLM a great tool for everyone!