Thanks for the wonderful paper and repo.

I was able to reproduce MaskCLIP and MaskCLIP+ with ViT-B/16 + R101 on the Pascal Context dataset; the resulting mAP is 25.45 and 29.48 respectively.
However, when I switch the model to ViT-B/32 or ViT-L/14, the results are poor (less than half of the ViT-B/16 score), and the qualitative results show that the predicted dense labels are generally a mess.
What I did was (rough sketches of both steps are included after this list):

- convert the backbone weights and extract the text embeddings for ViT-B/32 and ViT-L/14
- create a config based on the ViT-B/16 one, with these modifications:
  - change the patch size to 32 for ViT-B/32
  - change the patch size to 14, embed_dims to 1024, and num_layers to 24 for ViT-L/14
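For reference, this is roughly how I extracted the text embeddings for the other CLIP variants. It is a minimal sketch using the official OpenAI `clip` package rather than the repo's own script; the class list and output path are placeholders.

```python
# Minimal sketch of the text-embedding extraction step, assuming the official
# OpenAI `clip` package; class names and the output path are placeholders.
import torch
import clip

model, _ = clip.load("ViT-L/14", device="cpu")  # or "ViT-B/32"

# Placeholder subset of the 59 Pascal Context class names.
class_names = ["aeroplane", "bicycle", "bird"]
prompts = [f"a photo of a {name}." for name in class_names]

with torch.no_grad():
    tokens = clip.tokenize(prompts)
    text_embeddings = model.encode_text(tokens).float()
    # L2-normalize, as is standard for CLIP text embeddings.
    text_embeddings /= text_embeddings.norm(dim=-1, keepdim=True)

# Note: ViT-L/14 produces 768-d text embeddings, while ViT-B/32 and ViT-B/16
# produce 512-d embeddings.
torch.save(text_embeddings, "text_embeddings_vit_l14.pth")  # placeholder path
```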
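And this is roughly what my ViT-L/14 config modification looks like, written against an mmsegmentation-style config. The base-config filename and the decode_head fields are placeholders from my local setup, not necessarily the repo's exact names; only patch_size, embed_dims, and num_layers are the changes listed above.

```python
# Sketch of the config derived from the ViT-B/16 one for ViT-L/14.
_base_ = './maskclip_vit16_520x520_pascal_context_59.py'  # placeholder name

model = dict(
    backbone=dict(
        patch_size=14,    # 32 for the ViT-B/32 config, 14 for ViT-L/14
        embed_dims=1024,  # 768 for the ViT-B variants, 1024 for ViT-L/14
        num_layers=24,    # 12 for the ViT-B variants, 24 for ViT-L/14
    ),
    decode_head=dict(
        in_channels=1024,  # assumption from my setup: kept consistent with embed_dims
    ),
)
```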
Is there anything I've done wrong or misunderstood? Do you have any suggestions as to why the results are so bad?
Thanks in advance.