This work was originally titled "Transformer in Convolutional Neural Networks".
This repository exactly follows the code and training settings of PVT (Pyramid Vision Transformer).
Method | Input Size | #Params | #FLOPs | Acc@1 (%) | Pretrained Models |
---|---|---|---|---|---|
HAT-Net-Tiny | 224 x 224 | 12.7M | 2.0G | 79.8 | Google / Github |
HAT-Net-Small | 224 x 224 | 25.7M | 4.3G | 82.6 | Google / Github |
HAT-Net-Medium | 224 x 224 | 42.9M | 8.3G | 84.0 | Google / Github |
HAT-Net-Large | 224 x 224 | 63.1M | 11.5G | 84.2 | Google / Github |
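For reference, below is a minimal inference sketch in PyTorch. Since the repository follows PVT, whose models are registered with `timm`, the sketch assumes a builder registered as `hat_net_small` and a local checkpoint file `hat_net_small.pth`; both names are assumptions, so adjust them to match the actual model file and the downloaded weights.

```python
import torch
import timm  # the repo builds on PVT, whose models are registered with timm

# 'hat_net_small' is an assumed registered name; check the repository's
# model file for the exact names decorated with @register_model.
model = timm.create_model('hat_net_small', pretrained=False, num_classes=1000)

# Load a downloaded checkpoint (the filename is a placeholder).
checkpoint = torch.load('hat_net_small.pth', map_location='cpu')
# Some releases nest the weights under a 'model' key; unwrap if present.
state_dict = checkpoint.get('model', checkpoint)
model.load_state_dict(state_dict)
model.eval()

# Sanity check: the parameter count should match the table above
# (~25.7M for HAT-Net-Small).
num_params = sum(p.numel() for p in model.parameters())
print(f'#Params: {num_params / 1e6:.1f}M')

# Classify a dummy 224 x 224 image.
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.argmax(dim=1))  # predicted ImageNet-1k class index
```

The parameter count printed here can be checked against the #Params column of the table to confirm that the intended variant was built.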
If you use the code or models provided here in a publication, please consider citing:
@article{liu2024vision,
  title={Vision Transformers with Hierarchical Attention},
  author={Liu, Yun and Wu, Yu-Huan and Sun, Guolei and Zhang, Le and Chhatkuli, Ajad and Van Gool, Luc},
  journal={Machine Intelligence Research},
  volume={21},
  pages={670--683},
  year={2024},
  publisher={Springer}
}

@article{liu2021transformer,
  title={Transformer in Convolutional Neural Networks},
  author={Liu, Yun and Sun, Guolei and Qiu, Yu and Zhang, Le and Chhatkuli, Ajad and Van Gool, Luc},
  journal={arXiv preprint arXiv:2106.03180},
  year={2021}
}