Add support for GIT model in VQA pipelines #23348
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
I have two remarks here:
Thanks for the review, I updated my changes following your comments. However, I have several doubts about my approach.

Beam search for scores

I use beam scores to provide a score, to follow the "signature" of the pipeline described here. Is that correct? The beam search is so slow that it makes the pipeline test time out: it passes locally, but takes more than 120s, which makes me think I'm not on the right track here.

Tokenizer padding

Also, when I use the pipeline with … This warning is legitimate, but I don't know how to fix it.
Gently ping here @NielsRogge :-)
generate_kwargs = copy.deepcopy(generate_kwargs)
generate_kwargs["return_dict_in_generate"] = True
generate_kwargs["output_scores"] = True
if "num_beams" in generate_kwargs:
    if top_k > generate_kwargs["num_beams"]:
        pass  # raise
elif top_k > 1:
    generate_kwargs["num_beams"] = top_k
else:
    # activate beam search with two beams to compute scores
    generate_kwargs["num_beams"] = 2
generate_kwargs["num_return_sequences"] = top_k
if "max_new_tokens" not in generate_kwargs:
    # defaulting max_new_tokens to 100
    generate_kwargs["max_new_tokens"] = 100
generate_outputs = self.model.generate(**model_inputs, **generate_kwargs)
model_outputs = {
    "sequences_scores": generate_outputs.sequences_scores.reshape(
        (model_inputs["input_ids"].shape[0], top_k)
    ),
    "sequences": generate_outputs.sequences.reshape((model_inputs["input_ids"].shape[0], top_k, -1)),
}
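The kwargs-normalization rules above can be exercised in isolation. Below is a minimal pure-Python sketch, with a hypothetical helper name (`prepare_generate_kwargs`) detached from the pipeline class; note the original leaves the `top_k > num_beams` case as a TODO (`pass  # raise`), while this sketch raises for clarity:

```python
import copy


def prepare_generate_kwargs(generate_kwargs, top_k):
    """Sketch of the defaulting rules: make sure beam search is active so that
    generate() can return per-sequence scores, and set conservative defaults."""
    generate_kwargs = copy.deepcopy(generate_kwargs)
    generate_kwargs["return_dict_in_generate"] = True
    generate_kwargs["output_scores"] = True
    if "num_beams" in generate_kwargs:
        if top_k > generate_kwargs["num_beams"]:
            # the PR code leaves this unhandled; raising makes the constraint explicit
            raise ValueError("top_k cannot exceed num_beams")
    elif top_k > 1:
        generate_kwargs["num_beams"] = top_k
    else:
        # beam search needs at least two beams to produce sequences_scores
        generate_kwargs["num_beams"] = 2
    generate_kwargs["num_return_sequences"] = top_k
    generate_kwargs.setdefault("max_new_tokens", 100)
    return generate_kwargs
```

For example, `prepare_generate_kwargs({}, top_k=1)` yields two beams but a single returned sequence, while a user-supplied `num_beams` is never overridden.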
No need for all of this, I'd just run a generate as done here. No need to check things like num_beams, num_return_sequences, output_scores, etc.

For a generative model, it will just generate an answer, so there's no need for any top-k scores. For a classifier, the top-k scores make sense, as we can for instance take the top 5 predicted classes.
Also, the forward here should only add **generate_kwargs.
if self.model_type == ModelType.CLASSIFIER:
    probs = model_outputs.logits.sigmoid()[0]
    scores, ids = probs.topk(top_k)
    ids = ids.tolist()
    answers = [self.model.config.id2label[_id] for _id in ids]
elif self.model_type == ModelType.GENERATIVE:
    scores = model_outputs["sequences_scores"][0]
    decoded_outputs = self.tokenizer.batch_decode(model_outputs["sequences"][0], skip_special_tokens=False)
    answers = [self.postprocess_git_output_single(o) for o in decoded_outputs]
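The classifier branch above boils down to a top-k selection over per-label probabilities mapped through `id2label`. A dependency-free sketch of that reduction (hypothetical function name, plain Python lists standing in for tensors):

```python
def topk_labels(probs, id2label, top_k):
    """Return the top_k (score, label) pairs, highest score first,
    mirroring probs.topk(top_k) followed by the id2label lookup."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    return [(probs[i], id2label[i]) for i in ranked]


# e.g. a 3-label VQA classifier head after sigmoid
print(topk_labels([0.1, 0.7, 0.2], {0: "no", 1: "yes", 2: "maybe"}, top_k=2))
# → [(0.7, 'yes'), (0.2, 'maybe')]
```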
This looks OK
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
This PR implements support for generative models in the VQA pipeline (more precisely, the GIT model).
Fixes part of #21110
This is my first contribution here, so I am uncertain whether my approach is correct. Please advise me if any modifications are necessary 😃
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. => Add support for BLIP and GIT in image-to-text and VQA pipelines #21110
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@NielsRogge