iTPN

Integrally Pre-Trained Transformer Pyramid Networks

Abstract

In this paper, we present an integral pre-training framework based on masked image modeling (MIM). We advocate for pre-training the backbone and neck jointly so that the transfer gap between MIM and downstream recognition tasks is minimal. We make two technical contributions. First, we unify the reconstruction and recognition necks by inserting a feature pyramid into the pre-training stage. Second, we complement masked image modeling (MIM) with masked feature modeling (MFM) that offers multi-stage supervision to the feature pyramid. The pre-trained models, termed integrally pre-trained transformer pyramid networks (iTPNs), serve as powerful foundation models for visual recognition. In particular, the base/large-level iTPN achieves an 86.2%/87.8% top-1 accuracy on ImageNet-1K, a 53.2%/55.6% box AP on COCO object detection with 1x training schedule using Mask R-CNN, and a 54.7%/57.7% mIoU on ADE20K semantic segmentation using UPerHead -- all these results set new records. Our work inspires the community to work on unifying upstream pre-training and downstream fine-tuning tasks. Code and the pre-trained models will be released at https://github.com/sunsmarterjie/iTPN.

How to use it?

Train/Test Command

Prepare your dataset according to the docs.

Train:

python tools/train.py configs/itpn/itpn-pixel_hivit-base-p16_8xb512-amp-coslr-800e_in1k.py
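
The config file name itself describes the training setup. The sketch below decodes it, assuming the usual OpenMMLab naming convention (`8xb512` read as 8 GPUs with 512 samples per GPU, `amp` as automatic mixed precision, `800e` as 800 epochs, `in1k` as ImageNet-1k). `parse_config_name` is a hypothetical helper written for illustration and is not part of the repository. The config file remains the authoritative source for these values.

```python
import re


def parse_config_name(name: str) -> dict:
    """Extract schedule hints from an iTPN config file name.

    Field meanings are an assumption based on the OpenMMLab naming
    convention; verify them against the config file before relying on them.
    """
    # Drop any directory prefix and the ".py" extension.
    stem = name.rsplit("/", 1)[-1]
    if stem.endswith(".py"):
        stem = stem[:-3]

    # "_8xb512-" -> 8 GPUs, 512 samples per GPU (assumed meaning).
    gpus, per_gpu = map(int, re.search(r"_(\d+)xb(\d+)-", stem).groups())
    # "-800e_" -> 800 epochs (assumed meaning).
    epochs = int(re.search(r"-(\d+)e_", stem).group(1))

    return {
        "gpus": gpus,
        "batch_per_gpu": per_gpu,
        "total_batch": gpus * per_gpu,
        "epochs": epochs,
        "amp": "-amp-" in stem,
        "dataset": stem.rsplit("_", 1)[-1],
    }


if __name__ == "__main__":
    cfg = "configs/itpn/itpn-pixel_hivit-base-p16_8xb512-amp-coslr-800e_in1k.py"
    print(parse_config_name(cfg))
```

Under these assumptions, the config used in the train command above corresponds to a total batch size of 4096 (8 x 512), which is worth keeping in mind when training on a different number of GPUs.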

Models and results

Pretrained models

| Model | Params (M) | FLOPs (G) | Config | Download |
| :--- | :---: | :---: | :---: | :---: |
| `itpn-clip-b_hivit-base-p16_8xb256-amp-coslr-800e_in1k` | 233.00 | 18.47 | config | N/A |
| `itpn-pixel_hivit-base-p16_8xb512-amp-coslr-800e_in1k` | 103.00 | 18.47 | config | N/A |
| `itpn-pixel_hivit-large-p16_8xb512-amp-coslr-800e_in1k` | 314.00 | 63.98 | config | N/A |

Citation

@article{tian2022integrally,
  title={Integrally Pre-Trained Transformer Pyramid Networks},
  author={Tian, Yunjie and Xie, Lingxi and Wang, Zhaozhi and Wei, Longhui and Zhang, Xiaopeng and Jiao, Jianbin and Wang, Yaowei and Tian, Qi and Ye, Qixiang},
  journal={arXiv preprint arXiv:2211.12735},
  year={2022}
}
