Add support for GIT model in VQA pipelines #23348
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
I have two remarks here:
Thanks for the review, I updated my changes following your comments. However, I have several doubts about my approach.

Beam search for scores

I use beam scores to provide a score, to follow the "signature" of the pipeline described here. Is that correct? The beam search is so slow that it makes the pipeline test time out: it passes locally, but takes more than 120s, which makes me think I'm not on the right track here.

Tokenizer padding

Also, when I use the pipeline with … This warning is legitimate, but I don't know how to fix it.
Gently ping here @NielsRogge :-)
generate_kwargs = copy.deepcopy(generate_kwargs)
generate_kwargs["return_dict_in_generate"] = True
generate_kwargs["output_scores"] = True
if "num_beams" in generate_kwargs:
    if top_k > generate_kwargs["num_beams"]:
        pass  # raise
elif top_k > 1:
    generate_kwargs["num_beams"] = top_k
else:
    # activate beam search with two beams to compute scores
    generate_kwargs["num_beams"] = 2
generate_kwargs["num_return_sequences"] = top_k
if "max_new_tokens" not in generate_kwargs:
    # defaulting max_new_tokens to 100
    generate_kwargs["max_new_tokens"] = 100
generate_outputs = self.model.generate(**model_inputs, **generate_kwargs)
model_outputs = {
    "sequences_scores": generate_outputs.sequences_scores.reshape(
        (model_inputs["input_ids"].shape[0], top_k)
    ),
    "sequences": generate_outputs.sequences.reshape((model_inputs["input_ids"].shape[0], top_k, -1)),
}
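The kwargs-normalization rules above can be exercised in isolation. Below is a minimal pure-Python sketch, with a hypothetical helper name (`prepare_generate_kwargs`) detached from the pipeline class; note the original leaves the `top_k > num_beams` case as a TODO (`pass  # raise`), while this sketch raises for clarity:

```python
import copy


def prepare_generate_kwargs(generate_kwargs, top_k):
    """Sketch of the defaulting rules: make sure beam search is active so that
    generate() can return per-sequence scores, and set conservative defaults."""
    generate_kwargs = copy.deepcopy(generate_kwargs)
    generate_kwargs["return_dict_in_generate"] = True
    generate_kwargs["output_scores"] = True
    if "num_beams" in generate_kwargs:
        if top_k > generate_kwargs["num_beams"]:
            # the PR code leaves this unhandled; raising makes the constraint explicit
            raise ValueError("top_k cannot exceed num_beams")
    elif top_k > 1:
        generate_kwargs["num_beams"] = top_k
    else:
        # beam search needs at least two beams to produce sequences_scores
        generate_kwargs["num_beams"] = 2
    generate_kwargs["num_return_sequences"] = top_k
    generate_kwargs.setdefault("max_new_tokens", 100)
    return generate_kwargs
```

For example, `prepare_generate_kwargs({}, top_k=1)` yields two beams but a single returned sequence, while a user-supplied `num_beams` is never overridden.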
No need for all of this, I'd just run a generate as done here. No need to check things like num_beams, num_return_sequences, output_scores, etc.

For a generative model, it will just generate an answer, so there's no need for any top-k scores. For a classifier, the top-k scores make sense, as we can for instance take the top 5 predicted classes.
Also, the forward here should only add **generate_kwargs.
if self.model_type == ModelType.CLASSIFIER:
    probs = model_outputs.logits.sigmoid()[0]
    scores, ids = probs.topk(top_k)
    ids = ids.tolist()
    answers = [self.model.config.id2label[_id] for _id in ids]
elif self.model_type == ModelType.GENERATIVE:
    scores = model_outputs["sequences_scores"][0]
    decoded_outputs = self.tokenizer.batch_decode(model_outputs["sequences"][0], skip_special_tokens=False)
    answers = [self.postprocess_git_output_single(o) for o in decoded_outputs]
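The classifier branch above boils down to a top-k selection over per-label probabilities mapped through `id2label`. A dependency-free sketch of that reduction (hypothetical function name, plain Python lists standing in for tensors):

```python
def topk_labels(probs, id2label, top_k):
    """Return the top_k (score, label) pairs, highest score first,
    mirroring probs.topk(top_k) followed by the id2label lookup."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    return [(probs[i], id2label[i]) for i in ranked]


# e.g. a 3-label VQA classifier head after sigmoid
print(topk_labels([0.1, 0.7, 0.2], {0: "no", 1: "yes", 2: "maybe"}, top_k=2))
# → [(0.7, 'yes'), (0.2, 'maybe')]
```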
This looks OK
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
This PR implements support for generative models in the VQA pipeline (more precisely, the GIT model).
Fixes part of #21110
This is my first contribution here, so I am uncertain whether my approach is correct. Please advise me if any modifications are necessary 😃
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. => Add support for BLIP and GIT in image-to-text and VQA pipelines #21110
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@NielsRogge