Track progress for VLMs refactoring #33374

zucchini-nlp · 2024-09-08T17:13:56Z

This issue tracks progress on improving how Vision-Language Models (VLMs) are handled and tested. The main goals are to enable and improve generation tests, support other generation techniques such as assisted decoding, and make sure all models pass CI checks.

I have already started working on this and have opened or merged several PRs. This issue should help us track how much work remains until VLMs are standardized from a modeling-code perspective.



zucchini-nlp · 2024-09-08T17:15:11Z

cc @gante 😄

zucchini-nlp added the WIP, Vision, Generation and Multimodal labels Sep 8, 2024

zucchini-nlp changed the title ~~Progress Tracking for VLMs Refactoring~~ Track progress for VLMs refactoring Sep 8, 2024

zucchini-nlp self-assigned this Sep 8, 2024

This was referenced Sep 9, 2024

Fix: Qwen2-VL training on video datasets #33307

Merged

VLMs: patch_size -> num_image_tokens in processing #33424

Open

zucchini-nlp mentioned this issue Sep 17, 2024

VLMs: enable generation tests #33533

Merged
