Add training support for SigLIP #31495
Conversation
@aliencaocao Could you rebase to include the upstream changes on main? This should fix the failures on the CI runs.
Thanks for adding!
The tests in test_modeling_siglip.py will also need to be updated so the training tests are no longer skipped
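For context, a minimal sketch of what such a training test exercises, assuming the return_loss flag this PR wires into SiglipModel.forward (the real test file relies on the shared ModelTesterMixin machinery rather than this standalone snippet):

```python
import torch
from transformers import SiglipConfig, SiglipModel

# Smoke test in the spirit of the common training tests: one forward pass
# with the contrastive loss enabled, then check that gradients flow.
config = SiglipConfig()
model = SiglipModel(config)
model.train()

input_ids = torch.randint(0, config.text_config.vocab_size, (2, 16))
pixel_values = torch.rand(
    2, 3, config.vision_config.image_size, config.vision_config.image_size
)

loss = model(input_ids=input_ids, pixel_values=pixel_values, return_loss=True).loss
loss.backward()

# every trainable parameter should receive a gradient
assert all(p.grad is not None for p in model.parameters() if p.requires_grad)
```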
[experimental] enable GC training tests as it has worked for my own data
Added the training tests and also enabled the gradient checkpointing tests. I note that CLIP had issues with GC, but I have used it with SigLIP myself and did not find any issues with convergence/accuracy on a single RTX 3080 Ti with fp16 training and grad accum=16. Will let the tests run and see how it goes.
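For reference, a rough sketch of that setup (gradient checkpointing, fp16 mixed precision, gradient accumulation of 16); the random tensors and checkpoint name are placeholders and a CUDA device is assumed, so this is illustrative rather than a recipe from the PR itself:

```python
import torch
from transformers import SiglipModel

# Illustrative only: random tensors stand in for a real image-text batch.
model = SiglipModel.from_pretrained("google/siglip-base-patch16-224").cuda()
model.gradient_checkpointing_enable()  # the GC path the enabled tests exercise
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
scaler = torch.cuda.amp.GradScaler()
accum_steps = 16  # gradient accumulation as described above

optimizer.zero_grad()
for step in range(accum_steps):
    input_ids = torch.randint(0, model.config.text_config.vocab_size, (4, 64), device="cuda")
    pixel_values = torch.rand(4, 3, 224, 224, device="cuda")
    with torch.autocast("cuda", dtype=torch.float16):  # fp16 training
        loss = model(input_ids=input_ids, pixel_values=pixel_values, return_loss=True).loss
    scaler.scale(loss / accum_steps).backward()

scaler.step(optimizer)
scaler.update()
```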
@amyeroberts It seems you need to enable the slow tests?
Thanks for the continued work on this!
It shouldn't be necessary for the slow tests to be enabled to test training for this model. I've added the run-slow label nevertheless. If you push a commit with the message [run_slow] siglip, this will trigger a run of the slow tests for this model (which I'll have to approve to set off).
[run_slow] siglip
Add skip reason for training tests for SiglipTextModel
# Conflicts: # tests/models/siglip/test_modeling_siglip.py
@amyeroberts now that the GC tests are properly skipped, shall we move forward with this?
Thanks for adding!
What does this PR do?
Add the sigmoid contrastive loss function of SigLIP from https://github.com/google-research/big_vision/blob/01edb81a4716f93a48be43b3a4af14e29cdb3a7f/big_vision/trainers/proj/image_text/siglip.py#L287
This will allow training/finetuning SigLIP models.
Already verified to work on my own dataset.
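For readers skimming the diff, here is a standalone sketch of the sigmoid contrastive loss being added, following the big_vision reference linked above; the function name and signature are illustrative, not the exact code in modeling_siglip.py:

```python
import torch
import torch.nn.functional as F

def siglip_loss(image_embeds: torch.Tensor, text_embeds: torch.Tensor,
                logit_scale: torch.Tensor, logit_bias: torch.Tensor) -> torch.Tensor:
    """Pairwise sigmoid contrastive loss over a batch of matched image-text pairs."""
    # normalized embeddings -> similarity logits, scaled and shifted
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    logits = text_embeds @ image_embeds.t() * logit_scale.exp() + logit_bias

    # +1 on the diagonal (matching pairs), -1 everywhere else (negatives)
    labels = 2 * torch.eye(logits.size(0), device=logits.device) - 1

    # -log sigmoid(label * logit), summed over pairs, averaged over the batch
    return -F.logsigmoid(labels * logits).sum(dim=-1).mean()
```

Unlike the softmax-based CLIP loss, each image-text pair contributes an independent binary term, which is what makes the loss easy to compute without any cross-device communication.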
I saw the note on using torch.distributed for the loss function and open_clip's implementation, but I'm not sure why it is needed. I ran my training with both DDP and FSDP with full sharding and it seems to work just fine, also getting the expected speedup and the ability to set a larger batch size. The only issue is #31034 when using FSDP, but I don't think it's SigLIP-specific. Nonetheless, I updated the docs to mention that torch.distributed is not used, in case that matters to some users. Not sure if a training test is needed.
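To make the torch.distributed point concrete: the big_vision and open_clip implementations gather negatives across ranks, whereas the loss added here only sees the per-device batch under DDP. A hypothetical helper for users who do want cross-device negatives (not part of this PR) might look like:

```python
import torch
import torch.distributed as dist

def gather_features(features: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: collect image/text embeddings from every DDP rank so the
    sigmoid loss also sees cross-device negatives. Not what this PR implements."""
    if not (dist.is_available() and dist.is_initialized()):
        return features
    gathered = [torch.zeros_like(features) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, features)
    # all_gather returns non-differentiable copies; keep the local chunk in the
    # autograd graph by putting the original tensor back at this rank's position.
    gathered[dist.get_rank()] = features
    return torch.cat(gathered, dim=0)
```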
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@amyeroberts