Our semantic segmentation code is developed on top of MMSegmentation v0.29.1.
For more details, please refer to our paper *CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention*.
- Libraries (Python3.6-based):

```bash
pip install mmcv-full==1.6.2 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.12/index.html
pip install yapf==0.40.1
pip install mmsegmentation==0.29.1
```
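As a quick, optional sanity check (a minimal sketch that assumes nothing beyond the packages installed above), you can verify the pinned versions from Python:

```python
# Optional sanity check: confirm the installed versions match the pins above.
import torch
import mmcv
import mmseg

print(torch.__version__)  # expected: 1.12.x (the CUDA 11.3 build referenced above)
print(mmcv.__version__)   # expected: 1.6.2
print(mmseg.__version__)  # expected: 0.29.1
```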
- Prepare the ADE20K dataset according to the guidelines in MMSegmentation v0.12.0.
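If you follow the MMSegmentation guidelines, the prepared dataset typically ends up in the layout below (shown only for orientation; these are the MMSegmentation default paths, and the actual root can live anywhere as long as `data_root` points at it, see the later steps):

```
data/ade/ADEChallengeData2016/
├── annotations/
│   ├── training/
│   └── validation/
└── images/
    ├── training/
    └── validation/
```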
- Prepare pretrained CrossFormer models:
```python
import torch
ckpt = torch.load("crossformer-s.pth")  ## load the classification checkpoint
torch.save(ckpt["model"], "backbone-crossformer-s.pth")  ## only the model weights are needed
```
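Optionally, you can double-check the stripped checkpoint before training (a small sketch; the filename is simply the one saved in the step above):

```python
# Optional: verify the stripped file is a flat state dict of parameter tensors,
# which is what gets passed to the training script as <PRETRAIN_MODEL>.
import torch

state_dict = torch.load("backbone-crossformer-s.pth", map_location="cpu")
print(type(state_dict))                 # an (Ordered)Dict mapping parameter names to tensors
print(len(state_dict), "parameter tensors")
```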
- Modify `data_root` in `configs/_base_/datasets/ade20k.py` and `configs/_base_/datasets/ade20k_swin.py` to your path to the ADE20K dataset.
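For reference, the line to edit looks roughly like the sketch below; the path shown is the MMSegmentation default and is only an assumption, so substitute your own location:

```python
# configs/_base_/datasets/ade20k.py and configs/_base_/datasets/ade20k_swin.py
data_root = 'data/ade/ADEChallengeData2016'  # change this to your local ADE20K path
```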
- Training
```bash
## Use a config from the Results tables below as <CONFIG_FILE>
./dist_train.sh <CONFIG_FILE> <GPUS> <PRETRAIN_MODEL>
## e.g. train the fpn_crossformer_b model with 8 GPUs
./dist_train.sh configs/crossformer/fpn_crossformer_b_ade20k_40k.py 8 path/to/backbone-crossformer-s.pth
```
- Inference
```bash
./dist_test.sh <CONFIG_FILE> <GPUS> <SEG_CHECKPOINT_FILE>
## e.g. evaluate a semantic segmentation model by mIoU
./dist_test.sh configs/crossformer/fpn_crossformer_b_ade20k_40k.py 8 path/to/ckpt
```
Notes: We use single-scale testing by default; you can enable multi-scale testing or flip testing manually by following the instructions in `configs/_base_/datasets/ade20k[_swin].py`.
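For orientation, the test-time augmentation block in those configs typically looks like the sketch below (illustrative values based on the standard MMSegmentation ADE20K pipeline, not the exact contents of this repository's configs):

```python
# Sketch of the test-time augmentation settings in configs/_base_/datasets/ade20k[_swin].py.
# Multi-scale testing is controlled by img_ratios, flip testing by flip.
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 512),
        img_ratios=[0.75, 1.0, 1.25],  # several ratios -> multi-scale testing
        flip=True,                     # True -> flip testing
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize',
                 mean=[123.675, 116.28, 103.53],
                 std=[58.395, 57.12, 57.375],
                 to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ]),
]
```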
**Semantic FPN:**

| Backbone | Iterations | Params | FLOPs | IoU | config | Models |
| --- | --- | --- | --- | --- | --- | --- |
| PVT-M | 80K | 48.0M | 219.0G | 41.6 | - | - |
| CrossFormer-S | 80K | 34.3M | 209.8G | 46.4 | config | Google Drive/BaiduCloud, key: sn5h |
| PVT-L | 80K | 65.1M | 283.0G | 42.1 | - | - |
| Swin-S | 80K | 53.2M | 274.0G | 45.2 | - | - |
| CrossFormer-B | 80K | 55.6M | 320.1G | 48.0 | config | Google Drive/BaiduCloud, key: joi5 |
| CrossFormer-L | 80K | 95.4M | 482.7G | 49.1 | config | Google Drive/BaiduCloud, key: 6v5d |
**UperNet:**

| Backbone | Iterations | Params | FLOPs | IoU | MS IoU | config | Models |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-101 | 160K | 86.0M | 1029.0G | 44.9 | - | - | - |
| Swin-T | 160K | 60.0M | 945.0G | 44.5 | 45.8 | - | - |
| CrossFormer-S | 160K | 62.3M | 979.5G | 47.6 | 48.4 | config | Google Drive/BaiduCloud, key: wesb |
| Swin-S | 160K | 81.0M | 1038.0G | 47.6 | 49.5 | - | - |
| CrossFormer-B | 160K | 83.6M | 1089.7G | 49.7 | 50.6 | config | Google Drive/BaiduCloud, key: j061 |
| Swin-B | 160K | 121.0M | 1088.0G | 48.1 | 49.7 | - | - |
| CrossFormer-L | 160K | 125.5M | 1257.8G | 50.4 | 51.4 | config | Google Drive/BaiduCloud, key: 17ks |
Notes:
- MS IoU means IoU with multi-scale testing.
- Models are trained on ADE20K. Backbones are initialized with weights pre-trained on ImageNet-1K.
- For Semantic FPN, models are trained for 80K iterations with batch size 16. For UperNet, models are trained for 160K iterations.
- More detailed training settings can be found in corresponding configs.
- More results can be seen in our paper.
Use `get_flops.py` to calculate FLOPs and #parameters of the specified model:
```bash
python get_flops.py <CONFIG_FILE> --shape <height> <width>
## e.g. get FLOPs and #params of fpn_crossformer_b with input image size [1024, 1024]
python get_flops.py configs/crossformer/fpn_crossformer_b_ade20k_40k.py --shape 1024 1024
```
Notes: The default input image size is [1024, 1024]. To calculate FLOPs for a different input image size, change `<height> <width>` in the command above and change `img_size` in `crossformer_factory.py` accordingly.
```bibtex
@inproceedings{wang2021crossformer,
  title     = {CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention},
  author    = {Wang, Wenxiao and Yao, Lu and Chen, Long and Lin, Binbin and Cai, Deng and He, Xiaofei and Liu, Wei},
  booktitle = {International Conference on Learning Representations, {ICLR}},
  url       = {https://openreview.net/forum?id=_PHymLIxuI},
  year      = {2022}
}

@article{wang2023crossformer++,
  title   = {Crossformer++: A versatile vision transformer hinging on cross-scale attention},
  author  = {Wang, Wenxiao and Chen, Wei and Qiu, Qibo and Chen, Long and Wu, Boxi and Lin, Binbin and He, Xiaofei and Liu, Wei},
  journal = {arXiv preprint arXiv:2303.06908},
  year    = {2023}
}
```