
Make special image tokens attribute of tokenizer #31967

Closed

Conversation

@zucchini-nlp (Member) commented on Jul 15, 2024

What does this PR do?

This PR adds the three most commonly used special tokens in VLMs as attributes of the tokenizer, following on #31534 (comment). Now we don't have to pass the special tokens as init kwargs or hardcode them in processing code; we can simply call self.tokenizer.image_token.

The PR only adds the attribute in the tokenizer code; all the changes for specific VLMs will go in another PR. I believe it's only the Idefics models that use special tokens within processing, and that will require BC handling.

Other models will benefit from this feature after VLM processor refactor is merged.
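The intended usage can be sketched as follows. This is an illustrative sketch only: the mixin and attribute names (image_token, boi_token, eoi_token) are assumptions for this example, not necessarily the exact names added by the PR.

```python
# Illustrative sketch: attribute names below are assumptions, not
# necessarily what the PR adds to the tokenizer.

class SpecialImageTokensMixin:
    """Expose commonly used VLM special tokens as tokenizer attributes."""

    def __init__(self, image_token=None, boi_token=None, eoi_token=None, **kwargs):
        self.image_token = image_token  # placeholder token for image patches
        self.boi_token = boi_token      # beginning-of-image marker
        self.eoi_token = eoi_token      # end-of-image marker
        super().__init__(**kwargs)


class ToyTokenizer(SpecialImageTokensMixin):
    """Minimal stand-in for a real tokenizer class."""
    pass


tok = ToyTokenizer(image_token="<image>")
# Processing code can read the token from the tokenizer instead of
# hardcoding it or threading it through init kwargs:
prompt = f"{tok.image_token}\nWhat is in this picture?"
```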

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@amyeroberts (Collaborator)

Thanks for opening this PR @zucchini-nlp!

I'm not sure this is something we want to add as a property for all tokenizers: they predominantly work with text. In this regard, it makes sense to talk about "end of sentence", as pretty much all tokenizers work with text in sentence structures. However, image tokens aren't a general property of text data (what does it mean for a BERT tokenizer to have an image token?).

I would instead try to restructure this so that these attributes are only added to certain tokenizers, e.g. bundled as a mixin which we can selectively add.

@leloykun (Contributor) commented on Jul 15, 2024

I also suggest adding an image_token_ids attribute that contains the ids of all the VQVAE image tokens (excluding the start & end marker tokens)
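A rough sketch of what such an attribute could contain, assuming the VQVAE codebook tokens occupy their own entries in the vocab and the boi/eoi markers are separate tokens. All names, token strings, and the toy vocabulary below are hypothetical:

```python
# Hypothetical sketch: assumes codebook tokens are named "<img_0>" ...
# "<img_N>" and that the start/end markers are distinct tokens.

def build_image_token_ids(vocab, n_codes, boi="<boi>", eoi="<eoi>"):
    """Collect ids of the VQVAE codebook tokens, excluding boi/eoi markers."""
    ids = [vocab[f"<img_{i}>"] for i in range(n_codes)]
    assert vocab[boi] not in ids and vocab[eoi] not in ids
    return ids

# Toy vocabulary standing in for a real tokenizer's get_vocab():
vocab = {"<boi>": 0, "<eoi>": 1, **{f"<img_{i}>": 2 + i for i in range(8)}}
image_token_ids = build_image_token_ids(vocab, n_codes=8)
```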

@zucchini-nlp (Member, Author)

@amyeroberts I see what you mean, but going your way will require some indicator that the tokenizer is loaded from a VLM, because we load the same tokenizer classes as for LMs. I can give it a try: add an indicator through the config file and, depending on that, add the new special-token mixin. Will ping you when it's ready

@leloykun in case it goes into its own VLMSpecialTokensMixin, I can add image_token_ids, sure

@amyeroberts (Collaborator)

@zucchini-nlp OK - I'm open to other solutions - a mixin might not be a good idea! I just want to avoid leakiness in our abstractions.

@leloykun (Contributor)

Thanks for this @zucchini-nlp btw!

Currently I have to do something like this:
[screenshot of a manual workaround]
which is a PITA

@ArthurZucker (Collaborator) left a comment


Hey! I think we should be careful, as @amyeroberts says here.
IMO the best would be to have a VlmTokenizer class, which we wrap around the tokenizer class used. We would have a list of SPECIAL_TOKENS_ATTRIBUTES that can be passed to it, taken from the config, as allowed or accessible or just special tokens.

We could have getters and setters inferred automatically, and have a way to correctly set the id of the token so that it does not conflict.

One current limitation is that setting the id of the token does not actually check its existence in the vocab, so you get edge cases. Good time to fix them!
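The vocab check Arthur describes could look roughly like this. The class, method names, and the toy tokenizer below are hypothetical, not the transformers API:

```python
# Hypothetical sketch of a VlmTokenizer wrapper whose token-id getter
# validates that the special token actually exists in the vocabulary,
# instead of silently returning a stale or conflicting id.

class VlmTokenizer:
    def __init__(self, tokenizer, image_token="<image>"):
        self._tokenizer = tokenizer
        self.image_token = image_token

    @property
    def image_token_id(self):
        vocab = self._tokenizer.get_vocab()
        if self.image_token not in vocab:
            raise ValueError(
                f"{self.image_token!r} is not in the vocabulary; add it "
                "as a special token before using it."
            )
        return vocab[self.image_token]


class ToyTokenizer:
    """Minimal stand-in for a real tokenizer."""
    def get_vocab(self):
        return {"<image>": 32000, "hello": 1}


wrapped = VlmTokenizer(ToyTokenizer())
```

With the toy vocab above, `wrapped.image_token_id` returns 32000, while wrapping with an unknown token (e.g. `image_token="<video>"`) raises a ValueError instead of yielding an invalid id.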


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Aug 24, 2024
5 participants