🚨 fix(SigLip): remove spurious exclusion of first vision output token #30952
Conversation
There's another PR which suggested using the pooled output rather than averaging the patch tokens: #30373. The reason I went with the latter was this paper (a follow-up paper by the ViT authors showing that it works better than the CLS token). However, averaging patch tokens might work better for ViTs pre-trained with (self-)supervised image classification. For contrastive models like CLIP and SigLIP, perhaps it makes more sense to use the pooled output. I'm happy to change it, although we would need to make it backwards compatible.
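For illustration, here is a minimal sketch of the two pooling options being discussed; the tensor names and shapes below are assumptions for the example, not the library's exact code:

```python
import torch

# Hypothetical outputs from a SigLIP-style vision tower (no [CLS] token).
batch, num_patches, hidden = 2, 196, 768
last_hidden_state = torch.randn(batch, num_patches, hidden)  # all patch tokens
pooler_output = torch.randn(batch, hidden)                   # output of the MAP (attention-pooling) head

# Option A: average every patch token (what this PR keeps, minus the off-by-one).
pooled_by_mean = last_hidden_state.mean(dim=1)               # (batch, hidden)

# Option B: reuse the model's own pooled output instead of re-pooling.
pooled_by_head = pooler_output                                # (batch, hidden)

# Either tensor could then feed a classification head, e.g. nn.Linear(hidden, num_labels).
```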
Thanks for fixing this!
Only thing we need to do is run the model slow tests. Pushing the following should trigger the necessary CI run:
git commit --allow-empty -m "[run-slow] siglip"
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@transmissions11 Any update on this?
As this is a breaking change, I'm wondering whether we should just go for the pooled output anyway? It would, however, break any existing fine-tunes, so we might need to add a flag for backwards compatibility.
@NielsRogge It might be breaking, but the previous logic was wrong -- there's no reason to exclude the first token. We don't keep backwards compatibility for bugs, so I'm not sure it makes sense to have backwards compatibility here for the token. If we did add the pooled logic, then yes, we should have a flag. I'd do this in a separate PR.
…huggingface#30952) fix(SigLip): remove spurious exclusion of first vision output token in classifier
Looks like the first token is spuriously excluded from the average pooling because this code was copy-pasted from modeling_clip, which does prepend a [CLS] token. However, SigLIP has no such special token, so we are needlessly excluding a real patch token from the average. P.S. Why not use the pooled_output from the model's MAP head instead of averaging the last hidden states?
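A rough sketch of the fix described above (simplified and hypothetical, not the exact transformers source):

```python
import torch

# `last_hidden_state` from a SigLIP vision tower, which, unlike CLIP,
# prepends no [CLS] token: every position is a real patch token.
last_hidden_state = torch.randn(2, 196, 768)  # (batch, num_patches, hidden)

# Before: the first token is dropped, a habit inherited from the CLIP
# classifier, where index 0 really is a [CLS] token.
pooled_buggy = last_hidden_state[:, 1:, :].mean(dim=1)

# After: average over all tokens, since none of them is special.
pooled_fixed = last_hidden_state.mean(dim=1)
```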