
Refactor IJEPA to use timm. #1612

Merged
merged 6 commits into from
Jul 30, 2024
Conversation

radiradev
Contributor

@radiradev radiradev commented Jul 26, 2024

Changes

This PR addresses #1367. I have refactored IJEPA to use timm, staying close to the original implementation, and I have also added type annotations. There might be some structural changes needed; for instance, the apply_masks function should probably be moved to utils. Any suggestions on how to improve this are welcome.

As with the MAE timm implementation, I have created a separate file instead of directly replacing the torchvision implementation. I assume that once this is benchmarked, the plan would be to replace it completely.

How was it tested?

Unit tests for the predictor, encoder, and backbone classes.


codecov bot commented Jul 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.61%. Comparing base (78f59fc) to head (9708f36).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1612      +/-   ##
==========================================
+ Coverage   85.49%   85.61%   +0.11%     
==========================================
  Files         147      148       +1     
  Lines        6281     6333      +52     
==========================================
+ Hits         5370     5422      +52     
  Misses        911      911              


@guarin
Contributor

guarin commented Jul 26, 2024

Hi! Thanks a lot for this extensive PR, it looks really well made!

Looking at the code, I see many parallels to our MaskedVisionTransformer implementation. I didn't have time yet to go through the full PR, but I have a suspicion that IJEPABackboneTIMM and IJEPAEncoderTIMM are compatible with our implementation of the MaskedVisionTransformer (see here and here). Do you think it would be possible to use MaskedVisionTransformer directly instead of IJEPABackboneTIMM, or am I missing something? If possible, this would simplify the code a lot.

What I imagine is something like this:

target_encoder = MaskedVisionTransformer()
context_encoder = MaskedVisionTransformer()
predictor = IJEPAPredictorTIMM()

# This encodes all patches.
target = target_encoder.encode(images)
# This encodes only the unmasked context patches.
context = context_encoder.encode(images, idx_keep=idx_keep)

predictions = predictor(context, masks_x, masks)
target = get_targets_at_masks(target, masks_x, masks)
loss(predictions, target)
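For illustration, the gathering step that a helper such as get_targets_at_masks would perform (selecting the target encoder's token embeddings at the masked patch positions) can be sketched with plain NumPy. This is a conceptual sketch, not the lightly implementation; the function name gather_tokens is hypothetical.

```python
import numpy as np

def gather_tokens(tokens: np.ndarray, idx: np.ndarray) -> np.ndarray:
    """Select token embeddings at the given patch indices.

    tokens: (batch, num_patches, dim) array of patch embeddings.
    idx:    (batch, num_idx) integer array of patch indices per sample.
    Returns an array of shape (batch, num_idx, dim).
    """
    # Broadcast the index over the embedding dimension and gather along
    # the patch axis.
    return np.take_along_axis(tokens, idx[:, :, None], axis=1)

# Toy example: 1 image, 4 patches, 2-dimensional embeddings.
tokens = np.arange(8, dtype=float).reshape(1, 4, 2)
idx_keep = np.array([[0, 2]])  # keep patches 0 and 2 (the "context")
context = gather_tokens(tokens, idx_keep)
assert context.shape == (1, 2, 2)
```

The same gather, applied with the target mask indices instead of `idx_keep`, yields the prediction targets that the loss compares against.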

@radiradev
Contributor Author

Hi @guarin. I think you are correct, and we can reuse the MaskedVisionTransformer, making this PR a lot smaller. I have updated it with your suggested changes and also added drop_path_rate for IJepaPredictor.

Contributor

@guarin guarin left a comment


Awesome, thanks so much!

@guarin guarin mentioned this pull request Jul 29, 2024
@guarin
Contributor

guarin commented Jul 29, 2024

@guarin guarin enabled auto-merge (squash) July 30, 2024 11:39
@guarin guarin merged commit 1bde34e into lightly-ai:master Jul 30, 2024
10 checks passed
@radiradev radiradev deleted the ijepa_timm branch July 30, 2024 13:52
@guarin guarin mentioned this pull request Aug 16, 2024