
Add Blip2 model in VQA pipeline #25532

Merged

Conversation

jpizarrom
Contributor

@jpizarrom jpizarrom commented Aug 16, 2023

What does this PR do?

Add Blip2ForConditionalGeneration model in VisualQuestionAnsweringPipeline.
Fixes part of #21110 and is based on #23348 and #21227.

Who can review?

Hi @NielsRogge, what do you think of this?
Thanks!

TODOs

  • add Blip2 model in VQA pipeline
  • use require_torch_gpu in test
  • use can_generate in vqa pipeline for Blip2ForConditionalGeneration (see the sketch after this list)
  • use float16 in the test_large_model_pt_blip2
  • check if it is necessary to cast the input in torch.float16 inside _forward
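
A minimal sketch of the dispatch this boils down to inside VisualQuestionAnsweringPipeline (a method-level sketch for illustration; the exact structure and generate_kwargs handling are assumptions, not the final diff):

def _forward(self, model_inputs, **generate_kwargs):
    # Generative VQA models such as Blip2ForConditionalGeneration produce the
    # answer with generate(), while classifier-style models such as
    # ViltForQuestionAnswering score a fixed answer vocabulary with a plain
    # forward pass.
    if self.model.can_generate():
        model_outputs = self.model.generate(**model_inputs, **generate_kwargs)
    else:
        model_outputs = self.model(**model_inputs)
    return model_outputs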

@ArthurZucker
Collaborator

cc @amyeroberts and @younesbelkada

@jpizarrom jpizarrom marked this pull request as ready for review August 16, 2023 15:17
@jpizarrom
Contributor Author

jpizarrom commented Aug 26, 2023

Hi @amyeroberts and @younesbelkada, this PR is ready for review. Could you please take a look? Thanks :)

Contributor

@younesbelkada younesbelkada left a comment


Looking great to me, I left one comment to make sure the slow test will not blow up GPU memory in our daily CI runners!
Let's also wait for Amy's and @Narsil's reviews before merging.

@slow
@require_torch
def test_large_model_pt_blip2(self):
    vqa_pipeline = pipeline("visual-question-answering", model="Salesforce/blip2-opt-2.7b")
Contributor

@younesbelkada younesbelkada Aug 28, 2023


Suggested change
vqa_pipeline = pipeline("visual-question-answering", model="Salesforce/blip2-opt-2.7b")
vqa_pipeline = pipeline("visual-question-answering", model="Salesforce/blip2-opt-2.7b", model_kwargs={"torch_dtype": torch.float16}, device=0)

Also make sure to cast the input to torch.float16 inside _forward if needed.
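
A minimal sketch of such a cast (the helper name is hypothetical and the pixel_values key is assumed from BLIP-2's processor output; this is not the exact code of this PR):

import torch

def _maybe_cast_inputs(model_inputs, model):
    # If the model was loaded in float16, match the floating-point image
    # tensor to that dtype before calling forward/generate.
    if model.dtype == torch.float16:
        model_inputs["pixel_values"] = model_inputs["pixel_values"].to(torch.float16)
    return model_inputs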

Contributor Author

Hi @younesbelkada , thanks for your feedback :)
model_kwargs were updated in test_large_model_pt_blip2 as you recommended, but I am not sure how to check whether casting to torch.float16 inside _forward is needed. Could you please give me some hints about what I should check? Thanks

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Contributor

@Narsil Narsil left a comment


LGTM.

I'm not a big fan of using the names of the configs directly to detect if generative or not, I feel like using the ForXXX should be a better hint.

We also used model.can_generate() as a hint in other pipelines.

Pinging @ylacombe who used that flag. (Just FYI no need to do anything).

@jpizarrom jpizarrom force-pushed the add_blip2_model_to_vqa_pipeline branch 2 times, most recently from f4848e1 to 1754398 Compare August 28, 2023 15:49
@jpizarrom
Contributor Author

jpizarrom commented Aug 28, 2023

> LGTM.
>
> I'm not a big fan of using the names of the configs directly to detect if generative or not, I feel like using the ForXXX should be a better hint.
>
> We also used model.can_generate() as a hint in other pipelines.
>
> Pinging @ylacombe who used that flag. (Just FYI no need to do anything).

@Narsil, Thanks a lot for your feedback :)

At the moment model.can_generate() returns False for Blip2ForConditionalGeneration; that is the reason why I was following the proposal of this unmerged PR: https://github.com/huggingface/transformers/pull/23348/files#diff-620bada7977c3d0040ed961581379598e53a9ef02fdbb26c570cac738c279c0eR64

Could it be expected that the can_generate method returns True for Blip2ForConditionalGeneration? If so, we could use it (I will take a look at it).

can_generate returns True for another model (see below); does it make sense to do the same in Blip2ForConditionalGeneration, or could it affect something else?

def can_generate(self) -> bool:
    """
    Returns True. This model can `generate` and must therefore have this property set to True in order to be used
    in the TTS pipeline.
    """
    return True

@jpizarrom jpizarrom force-pushed the add_blip2_model_to_vqa_pipeline branch 3 times, most recently from 9cce838 to ff5db0c Compare August 28, 2023 19:20
@Narsil
Contributor

Narsil commented Aug 29, 2023

I'm not the best person to comment on how can_generate works and what should or shouldn't be done.

The main thing about pipelines:

  • They should be as model agnostic as possible, so that newer models work out of the box.

But the current code is acceptable.

Collaborator

@amyeroberts amyeroberts left a comment


Thanks for adding this!

Before approving, let's get @gante's opinion on the best/canonical way to detect whether the model should generate the answer or not within the pipeline.

Comment on lines 110 to 113
self.assertEqual(
    outputs,
    [{"answer": ANY(str)}],
)
Collaborator

nit: can go on one line

Suggested change
self.assertEqual(
outputs,
[{"answer": ANY(str)}],
)
self.assertEqual(outputs, [{"answer": ANY(str)}])

Member

@gante gante left a comment


Regarding detection of generative models: can_generate() was built precisely to confirm whether the model can safely call generate() (in theory, all models can call it due to the inheritance structure; in practice only a few can use it). This includes pipeline uses 👍

However, I don't think we should overload the function in the class -- see my comment below, I'm going to open a PR with a more general solution :)

cc @amyeroberts @jpizarrom
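
For illustration, a hedged snippet of how a caller (or a pipeline) can rely on this capability flag instead of matching architecture names (not code from this PR):

from transformers import pipeline

vqa = pipeline("visual-question-answering", model="Salesforce/blip2-opt-2.7b")
# can_generate() is the generic capability check exposed by PreTrainedModel.
if vqa.model.can_generate():
    print("answers will be produced with generate()")
else:
    print("answers will be scored with a plain classification forward pass")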

src/transformers/models/blip_2/modeling_blip_2.py (review comment, outdated and resolved)
@gante
Member

gante commented Aug 29, 2023

Generalizable solution here ☝️

@jpizarrom jpizarrom force-pushed the add_blip2_model_to_vqa_pipeline branch from ff5db0c to d309aa4 Compare August 29, 2023 20:55
Contributor

@NielsRogge NielsRogge left a comment


Very clean and minimal PR, I like it!

Collaborator

@amyeroberts amyeroberts left a comment


Thanks for adding this functionality!

@amyeroberts amyeroberts merged commit 09dc995 into huggingface:main Aug 30, 2023
3 checks passed
@NielsRogge
Contributor

NielsRogge commented Aug 30, 2023

Feel free to tweet/linkedin about it @jpizarrom and we'll amplify :)

@jpizarrom jpizarrom deleted the add_blip2_model_to_vqa_pipeline branch August 30, 2023 20:17
parambharat pushed a commit to parambharat/transformers that referenced this pull request Sep 26, 2023
* Add Blip2 model in VQA pipeline

* use require_torch_gpu for test_large_model_pt_blip2

* use can_generate in vqa pipeline

* test Blip2ForConditionalGeneration using float16

* remove custom can_generate from Blip2ForConditionalGeneration
blbadger pushed a commit to blbadger/transformers that referenced this pull request Nov 8, 2023
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 18, 2023
@RainyLayx

I am using the newest version of the transformers library, but it still reports "The model 'Blip2ForConditionalGeneration' is not supported for vqa. Supported models are ['ViltForQuestionAnswering'].". How can I use Blip2 in a pipeline for the VQA task? Is there any test code for BLIP-2 in the pipeline?

@jpizarrom
Contributor Author

Hi @RainyLayx, I was able to run this sample with transformers==4.37.1:

from transformers import pipeline
import requests
from PIL import Image

vqa_pipeline = pipeline("visual-question-answering", model="Salesforce/blip2-opt-2.7b")

image = Image.open(requests.get("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png", stream=True).raw)
question = "Question: Is there a parrot? Answer:"

print(vqa_pipeline(image, question, top_k=1))
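
If GPU memory is a concern, the same pipeline can also be created in half precision, mirroring the kwargs suggested for the slow test earlier in this thread (a hedged variant; it assumes a CUDA device is available):

import torch
from transformers import pipeline

# Load BLIP-2 in float16 and place it on the first CUDA device.
vqa_pipeline = pipeline(
    "visual-question-answering",
    model="Salesforce/blip2-opt-2.7b",
    model_kwargs={"torch_dtype": torch.float16},
    device=0,
)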
