
MAE using pretrained VIT #196

Open
Songloading opened this issue Jan 26, 2022 · 3 comments

@Songloading

Hi there,

I am currently trying to fine-tune an MAE based on a pretrained ViT from timm. However, when I do:

import timm
import torch.nn as nn
from vit_pytorch import MAE

v = timm.create_model('vit_base_patch16_224', pretrained=True)
num_ftrs = v.head.in_features
v.head = nn.Linear(num_ftrs, 2)  # replace the classification head for 2 classes
model = MAE(
    encoder = v,
    masking_ratio = 0.75,   # the paper recommended 75% masked patches
    decoder_dim = 512,      # paper showed good results with just 512
    decoder_depth = 6       # anywhere from 1 to 8
)

I got "AttributeError: 'VisionTransformer' object has no attribute 'pos_embedding'"
It seems that timm model is not compatible with the MAE implementation. Can this be easily fixed or I will have to change the internal implementation of MAE?
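
For context, the MAE wrapper here appears to read attributes such as pos_embedding (and to_patch_embedding) from its encoder; those exist on this repo's ViT class but not on timm's VisionTransformer, which names its positional embedding pos_embed. Below is a minimal sketch of the usage the wrapper expects, adapted from the vit-pytorch README with illustrative hyperparameters, and it does not use a timm checkpoint:

import torch
from vit_pytorch import ViT, MAE

# encoder built with this repo's ViT, which exposes the attributes MAE expects
v = ViT(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 6,
    heads = 8,
    mlp_dim = 2048
)

mae = MAE(
    encoder = v,
    masking_ratio = 0.75,   # the paper recommended 75% masked patches
    decoder_dim = 512,      # paper showed good results with just 512
    decoder_depth = 6       # anywhere from 1 to 8
)

images = torch.randn(8, 3, 256, 256)
loss = mae(images)   # reconstruction loss on the masked patches
loss.backward()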

@lucidrains
Owner

@Songloading i have no idea! i'm not really familiar with timm - perhaps you can ask Ross about it?

@Songloading
Author

@lucidrains OK. Any ideas on what other pretrained ViT I could use besides their implementation, or on a pretrained MAE?

@mw9385

mw9385 commented Jul 17, 2023

Did you solve the issue? @Songloading
