[Bugfix][Model] Add base class for vision-language models #4809

Merged: 7 commits, May 19, 2024

DarkLight1337
Member

@DarkLight1337 DarkLight1337 commented May 14, 2024

This PR adds a base class VLMBase to avoid importing LlavaForConditionalGeneration in vllm/model_executor/model_loader/loader.py, thus solving #4807.
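
A minimal sketch of the idea, under simplifying assumptions: the loader checks a model class against a lightweight base class, so it does not need to import the concrete LLaVA module. Only the names `VLMBase` and `LlavaForConditionalGeneration` come from this PR; the `needs_vision_config` helper, the constructor argument, and the stand-in classes are hypothetical, and the actual vLLM code may differ.

```python
from typing import Optional, Type


class VLMBase:
    """Marker base class for vision-language models (sketch only)."""

    def __init__(self, vision_language_config: Optional[dict] = None) -> None:
        self.vision_language_config = vision_language_config


class LlavaForConditionalGeneration(VLMBase):
    """Stand-in for the concrete model, which would live in its own module."""


class LlamaForCausalLM:
    """Stand-in for a text-only model."""


def needs_vision_config(model_cls: Type) -> bool:
    # Hypothetical loader helper: it depends only on VLMBase, so the
    # loader never has to import the concrete LLaVA class.
    return issubclass(model_cls, VLMBase)
```

With this layout the dependency points one way: model modules import the base class, and the loader imports only the base class.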

Along the way, I have also ported the improved error handling logic for image_feature_size to the LLaVA model.
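
For illustration only, one plausible shape for such a check: count the image placeholder tokens in the prompt and compare against the number of image feature vectors, raising a descriptive error on mismatch. The function name `check_image_feature_size`, the `image_token_id` parameter, and the error message are assumptions made for this sketch, not the code in this PR.

```python
from typing import List, Sequence


def check_image_feature_size(
    input_ids: Sequence[int],
    vision_embeddings: List[List[float]],
    image_token_id: int,
) -> int:
    """Return the placeholder count, or raise if it does not match."""
    # Number of positions in the prompt reserved for image features.
    num_placeholders = sum(1 for tok in input_ids if tok == image_token_id)
    # Number of feature vectors produced for the image.
    num_features = len(vision_embeddings)
    if num_placeholders != num_features:
        raise ValueError(
            f"Found {num_placeholders} image placeholder tokens but "
            f"{num_features} image feature vectors; check that "
            "image_feature_size matches the vision encoder output."
        )
    return num_placeholders
```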

FIX #4807

@DarkLight1337 DarkLight1337 changed the title [Bugfix][Model] Use ClassVar to indicate vision models [Bugfix][Model] Use ClassVar to indicate vision models and improve error handling when incorrect image_feature_size is passed May 14, 2024
@DarkLight1337 DarkLight1337 changed the title [Bugfix][Model] Use ClassVar to indicate vision models and improve error handling when incorrect image_feature_size is passed [Bugfix][Model] Add base class for vision-language models. May 15, 2024
@DarkLight1337 DarkLight1337 changed the title [Bugfix][Model] Add base class for vision-language models. [Bugfix][Model] Add base class for vision-language models May 15, 2024
Collaborator

@rkooo567 rkooo567 left a comment

I think the change looks good to me. cc @simon-mo

@@ -172,7 +174,7 @@ def forward(
 image_features = image_input
 vision_embeddings = self.multi_modal_projector(image_features)
 inputs_embeds = self.language_model.get_input_embeddings(input_ids)
-_merge_vision_embeddings(
+inputs_embeds = _merge_vision_embeddings(
Collaborator

Is this a bug?

Member Author

No, it is just to make explicit the fact that inputs_embeds is modified.
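
A small sketch of the pattern being discussed, using plain Python lists instead of tensors: the helper overwrites rows in place and also returns the same object, so assigning the result changes nothing at runtime but makes the mutation visible at the call site. The function below is a simplified stand-in written for this example, not the actual `_merge_vision_embeddings` from vLLM, and `image_token_id=99` is an arbitrary value.

```python
from typing import List


def merge_vision_embeddings(
    input_ids: List[int],
    inputs_embeds: List[List[float]],
    vision_embeddings: List[List[float]],
    image_token_id: int,
) -> List[List[float]]:
    """Overwrite placeholder rows in place and return the same list."""
    vision_iter = iter(vision_embeddings)
    for i, tok in enumerate(input_ids):
        if tok == image_token_id:
            # Replace the text embedding at an image position.
            inputs_embeds[i] = next(vision_iter)
    return inputs_embeds


embeds = [[0.0], [0.0], [0.0]]
merged = merge_vision_embeddings([7, 99, 7], embeds, [[1.5]], image_token_id=99)
```

Here `merged` and `embeds` refer to the same list, which is why dropping or keeping the assignment does not change behavior.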

vllm/model_executor/models/vlm_base.py (outdated review thread, resolved)
Member

@ywang96 ywang96 left a comment

LGTM - thanks for the fix!

@DarkLight1337
Member Author

@simon-mo The models-test keeps getting interrupted, causing the CI to fail.

Collaborator

@rkooo567 rkooo567 left a comment

let me retry one more time

@WoosukKwon
Collaborator

@DarkLight1337 @ywang96 @rkooo567 Is this PR ready for merge?

@DarkLight1337
Member Author

> @DarkLight1337 @ywang96 @rkooo567 Is this PR ready for merge?

Yes.

@WoosukKwon WoosukKwon merged commit f68470e into vllm-project:main May 19, 2024
55 checks passed
@DarkLight1337 DarkLight1337 deleted the vlm-tag branch May 20, 2024 02:11
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024
Development

Successfully merging this pull request may close these issues.

[Bug]: ModelRegistry.load_model_cls() circular import error on llama-llava
4 participants