
Replacing Sliding Window CLIP ViT feature with Dense Perpixel CLIP feature without retraining #4

Open
PhucNDA opened this issue Jan 11, 2024 · 3 comments

Comments

@PhucNDA

PhucNDA commented Jan 11, 2024

Hi, thanks for this significant work.

The current version applies a sliding-window CLIP ViT to each (3, 1120, 1120) input image to produce a (1408, 80, 80) feature map. I want to extend it by using per-pixel CLIP features (the same as in OSM), similar to LSeg, and then sample (pool) them down to the same feature-map shape, leaving the rest of the model unchanged. Is this possible without retraining the whole model? I ask because I see some positional encodings in the model. My intuition is that sampling from dense CLIP features might yield better results.

Looking forward to your response.
PhucNDA.
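A minimal NumPy sketch of the two feature paths discussed above, under stated assumptions: a ViT with patch size 14 (1120 / 80 = 14) run on non-overlapping 224 x 224 windows, so a 1120 x 1120 image splits into 5 x 5 windows that each yield a 16 x 16 patch grid and stitch into the 80 x 80 map. The repository's actual window size, stride, and overlap may differ. `fake_window_encoder` is a hypothetical stand-in for the CLIP visual encoder (it only averages pixels per patch), and `pool_dense_features` illustrates the proposed alternative of average-pooling a dense per-pixel feature map down to the same grid.

```python
import numpy as np

PATCH = 14    # assumed ViT patch size: 1120 / 80 = 14 pixels per output cell
WINDOW = 224  # assumed window size; stride equals WINDOW, so no overlap


def fake_window_encoder(window: np.ndarray, channels: int) -> np.ndarray:
    """Hypothetical stand-in for a CLIP ViT: (3, 224, 224) -> (C, 16, 16).

    It only averages each 14x14 patch over pixels and RGB, then repeats
    that value across `channels`; a real encoder would return embeddings.
    """
    _, h, w = window.shape
    gh, gw = h // PATCH, w // PATCH
    patches = window.reshape(3, gh, PATCH, gw, PATCH).mean(axis=(0, 2, 4))
    return np.broadcast_to(patches, (channels, gh, gw)).copy()


def sliding_window_features(image: np.ndarray, channels: int) -> np.ndarray:
    """Encode (3, H, W) in non-overlapping windows and stitch the patch grids.

    Assumes H and W are multiples of WINDOW.
    """
    _, h, w = image.shape
    g = WINDOW // PATCH  # patch-grid size per window (16)
    out = np.zeros((channels, h // PATCH, w // PATCH), dtype=image.dtype)
    for y in range(0, h, WINDOW):
        for x in range(0, w, WINDOW):
            window = image[:, y:y + WINDOW, x:x + WINDOW]
            gy, gx = y // PATCH, x // PATCH
            out[:, gy:gy + g, gx:gx + g] = fake_window_encoder(window, channels)
    return out


def pool_dense_features(dense: np.ndarray, cell: int = PATCH) -> np.ndarray:
    """Average-pool (C, H, W) per-pixel features over cell x cell blocks."""
    c, h, w = dense.shape
    return dense.reshape(c, h // cell, cell, w // cell, cell).mean(axis=(2, 4))


image = np.random.rand(3, 1120, 1120).astype(np.float32)
print(sliding_window_features(image, channels=4).shape)  # (4, 80, 80)
dense = np.random.rand(4, 1120, 1120).astype(np.float32)  # hypothetical per-pixel features
print(pool_dense_features(dense).shape)  # (4, 80, 80)
```

Average pooling is only one way to "sample" the dense features; strided subsampling would also produce an 80 x 80 grid.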

@cornettoyu
Collaborator

Hi,

I'm not sure I fully understand your question. Do you mean that, instead of using sliding-window CLIP, you would like to extract CLIP features at a resolution of 224 x 224 (or 336 x 336) and then interpolate the feature map to the target resolution?

In my experience, that should perform worse than the sliding-window approach. Besides, if you change any module, I would expect a significant performance drop without fine-tuning, since the feature distribution would change.
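The single-pass alternative described above (one forward pass at 224 x 224, then interpolation to the target resolution) can be sketched in NumPy as below, assuming patch size 14 so the encoder returns a 16 x 16 grid. `bilinear_resize` is a hand-written helper for illustration, not an API from this repo; it uses the half-pixel-center convention (as in PyTorch's `align_corners=False`). Upsampling 16 x 16 to 80 x 80 spreads each patch embedding over a 5 x 5 area, which offers one plausible intuition for why it may lose detail compared with a sliding-window map whose 80 x 80 cells each come from their own patch token.

```python
import numpy as np


def _source_coords(out_size: int, in_size: int):
    """Map output indices to input coordinates using half-pixel centers."""
    scale = in_size / out_size
    src = (np.arange(out_size) + 0.5) * scale - 0.5
    src = np.clip(src, 0, in_size - 1)  # clamp at the borders
    i0 = np.floor(src).astype(int)
    i1 = np.minimum(i0 + 1, in_size - 1)
    frac = src - i0  # weight of the i1 neighbour
    return i0, i1, frac


def bilinear_resize(feats: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinearly resize a (C, H, W) feature map to (C, out_h, out_w)."""
    _, h, w = feats.shape
    y0, y1, fy = _source_coords(out_h, h)
    x0, x1, fx = _source_coords(out_w, w)
    # Interpolate along rows first...
    rows = feats[:, y0, :] * (1 - fy)[None, :, None] + feats[:, y1, :] * fy[None, :, None]
    # ...then along columns.
    return rows[:, :, x0] * (1 - fx)[None, None, :] + rows[:, :, x1] * fx[None, None, :]


grid = np.random.rand(4, 16, 16)  # one 224x224 pass with patch 14 -> 16x16 grid
print(bilinear_resize(grid, 80, 80).shape)  # (4, 80, 80)
```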

Best,

@PhucNDA
Author

PhucNDA commented Jan 29, 2024

Hi @cornettoyu
Have you tried discarding the [CLS] token from the beginning of the embeddings during training? Would it significantly affect the final performance?
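One common reading of "discarding the [CLS] token" is to drop it from the ViT's output sequence and keep only the patch tokens. A minimal sketch, assuming the usual CLIP ViT layout (batch, 1 + h*w, C) with [CLS] at index 0; whether this repo uses exactly that layout is not confirmed here, and `drop_cls_and_reshape` is a hypothetical helper:

```python
import numpy as np


def drop_cls_and_reshape(tokens: np.ndarray, grid_h: int, grid_w: int) -> np.ndarray:
    """(B, 1 + grid_h*grid_w, C) tokens -> (B, C, grid_h, grid_w) patch grid."""
    b, n, c = tokens.shape
    if n != 1 + grid_h * grid_w:
        raise ValueError(f"expected {1 + grid_h * grid_w} tokens, got {n}")
    patch_tokens = tokens[:, 1:, :]  # discard the [CLS] token at index 0
    grid = patch_tokens.reshape(b, grid_h, grid_w, c)  # row-major patch order
    return grid.transpose(0, 3, 1, 2)  # channels-first, like (C, 80, 80)


tokens = np.random.rand(2, 1 + 16 * 16, 8)
print(drop_cls_and_reshape(tokens, 16, 16).shape)  # (2, 8, 16, 16)
```

If the token were instead removed before the transformer blocks, the matching slot of the positional embedding would typically be dropped as well.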

@cornettoyu
Collaborator


Sorry for the late reply. No, we have not tried discarding the [CLS] token, but I suppose it should not affect the performance.
