Request for more experimental results to compare with other architectures. #8

Open
Luciennnnnnn opened this issue Oct 16, 2021 · 1 comment

Comments

@Luciennnnnnn

Hi!
This work is very interesting, but I think it would benefit from more results, like those in "Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight", where the authors replace local self-attention with depth-wise convolution in Swin Transformer. Since you take this a step further with a simpler architecture than Swin Transformer, I wonder whether ConvMixer can reach similar performance on object detection and semantic segmentation.

@BradKML

BradKML commented Oct 21, 2021

This sounds like a good idea, but it requires standard benchmarks and model zoos.
