
Pre-trained weights? #2

Open
hzhang57 opened this issue May 1, 2021 · 11 comments

hzhang57 commented May 1, 2021

Hi, I want to extend the model to my own task. Will you release the pre-trained weights?

danczs (Owner) commented May 1, 2021

Because of our institution's policy, we cannot release the pre-trained models directly. We plan to find some GPU servers outside, but that will take time, so we are afraid the models will not be released soon.

hzhang57 (Author) commented

Hi, I trained a model with the provided code on ImageNet-1k only, using 4x 2080 Ti GPUs (batch size 100), and finally reached around 82.0% top-1. I uploaded this temporary alternative to Google Drive to help anyone who needs it: https://drive.google.com/drive/folders/18GpH1SeVOsq3_2QGTA5Z_3O1UFtKugEu?usp=sharing
I also suspect the model has further potential if pre-trained on ImageNet-21k.
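For anyone picking this checkpoint up, here is a minimal loading sketch. It assumes (unverified here) that the repo registers visformer_small with timm the way the DeiT codebase does, and that the checkpoint stores its weights under a 'model' key:

```python
# Loading sketch for the shared checkpoint. Assumptions (not confirmed by
# this thread): the repo's models.py registers `visformer_small` via timm's
# @register_model, and the file follows the DeiT-style 'model' key layout.
import torch
from timm.models import create_model
import models  # the Visformer repo's model definitions

model = create_model('visformer_small', num_classes=1000)
ckpt = torch.load('visformer_small.pth', map_location='cpu')
state_dict = ckpt.get('model', ckpt)  # also tolerate a raw state dict
model.load_state_dict(state_dict)
model.eval()
```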

danczs (Owner) commented May 11, 2021

That's great! I will add it to the README for anyone who needs it. Thanks a lot!

amaarora commented

> Hi, I trained a model

Assuming this is Visformer-small?

hzhang57 (Author) commented

Yes, I trained the Visformer-small at 224×224: visformer_small

developer0hye (Contributor) commented Oct 1, 2021

@danczs @amaarora

Thanks for sharing your work! I really love the architecture and the experiments you did; they showed me how convolutional layers can improve the performance of transformer models.

I trained Visformer-tiny at 224×224. If I upload the pretrained weights, would that help other researchers? The model's best top-1 accuracy reached 78.3%, and the weights saved at the last epoch reached 78.1%.

danczs (Owner) commented Oct 1, 2021

Thanks for your attention! At the moment only the Visformer-small weights are available, so I think the tiny weights would help someone. By the way, for the tiny model, setting '--drop-path 0.0' can slightly improve performance.
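For context on that flag: "drop path" is stochastic depth, which randomly zeroes a sample's residual branch during training. A minimal sketch of the standard technique, in the style of timm's DropPath (not the repo's exact code):

```python
# Stochastic depth ("drop path"): with probability drop_prob, a sample's
# residual branch is zeroed; survivors are rescaled by 1/keep_prob.
# Passing --drop-path 0.0 disables this regularization entirely, which can
# suit smaller models such as Visformer-tiny.
import torch

def drop_path(x: torch.Tensor, drop_prob: float, training: bool) -> torch.Tensor:
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = 1.0 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # one Bernoulli draw per sample
    mask = x.new_empty(shape).bernoulli_(keep_prob)
    return x * mask / keep_prob
```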

developer0hye (Contributor) commented

@danczs

I trained the model with the command below, with '--drop-path' set to 0.

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model visformer_tiny --batch-size 256 --drop-path 0.0 --data-path /path/to/imagenet --output_dir /path/to/save

Please check my weights and share the link in the README!

https://drive.google.com/file/d/1LLBGbj7-ok1fDvvMCab-Fn5T3cjTzOKB/view?usp=sharing
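For anyone verifying these weights: since the code builds on the DeiT codebase, main.py presumably supports evaluation via --eval and --resume (an assumption; check the actual flags in main.py), e.g.:

python main.py --eval --model visformer_tiny --resume /path/to/visformer_tiny.pth --data-path /path/to/imagenet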

danczs (Owner) commented Oct 1, 2021

I have added it. Thanks for sharing!
In addition, we will slightly update the model in the next few days so that Visformer can use AMP (automatic mixed precision). At that point the old weights may not work well; we will test them and report the results here.
Thanks!

developer0hye (Contributor) commented

@danczs Okay! Thanks!

danczs (Owner) commented Oct 12, 2021

After a slight adjustment to the model, Visformer can now use AMP. During inference, the old weights can use AMP as well; see the README for details.
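For reference, a minimal AMP inference sketch using the standard torch.cuda.amp.autocast API; this illustrates generic PyTorch mixed precision, not the repo's specific changes, and again assumes the models are timm-registered:

```python
# Mixed-precision inference with PyTorch AMP: autocast runs eligible ops
# in float16 on CUDA. See the repo's README for its own AMP setup.
import torch
from timm.models import create_model
import models  # Visformer repo's model definitions (assumed timm-registered)

model = create_model('visformer_small').cuda().eval()
images = torch.randn(8, 3, 224, 224, device='cuda')  # dummy batch
with torch.no_grad(), torch.cuda.amp.autocast():
    logits = model(images)
print(logits.dtype)  # typically torch.float16 under autocast
```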
