Conditional Convolutions for Instance Segmentation;
Zhi Tian, Chunhua Shen and Hao Chen;
In: Proc. European Conference on Computer Vision (ECCV), 2020.
arXiv preprint arXiv:2003.05664
First, follow the default instructions to install the project, then follow datasets/README.md to set up the datasets (e.g., MS-COCO).
To run the demo, use the following commands:
wget https://cloudstor.aarnet.edu.au/plus/s/M8nNxSR5iNP4qyO/download -O CondInst_MS_R_101_3x_sem.pth
python demo/demo.py \
--config-file configs/CondInst/MS_R_101_3x_sem.yaml \
--input input1.jpg input2.jpg \
--opts MODEL.WEIGHTS CondInst_MS_R_101_3x_sem.pth
For training on COCO, run:
OMP_NUM_THREADS=1 python tools/train_net.py \
--config-file configs/CondInst/MS_R_50_1x.yaml \
--num-gpus 8 \
OUTPUT_DIR training_dir/CondInst_MS_R_50_1x
For evaluation on COCO, run:
OMP_NUM_THREADS=1 python tools/train_net.py \
--config-file configs/CondInst/MS_R_50_1x.yaml \
--eval-only \
--num-gpus 8 \
OUTPUT_DIR training_dir/CondInst_MS_R_50_1x \
MODEL.WEIGHTS training_dir/CondInst_MS_R_50_1x/model_final.pth
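The training and evaluation commands above take trailing KEY VALUE pairs (such as OUTPUT_DIR and MODEL.WEIGHTS) as config overrides. When launching several runs, it can help to assemble the command line programmatically. The following is a minimal sketch, not part of the project: `build_train_command` is a hypothetical helper, and it only uses the flags and keys that already appear in the commands above.

```python
import shlex


def build_train_command(config_file, num_gpus, overrides, eval_only=False):
    """Assemble the tools/train_net.py command line shown in this README.

    `overrides` maps config keys to values; they are appended as trailing
    KEY VALUE pairs, in the same way as OUTPUT_DIR and MODEL.WEIGHTS.
    This helper is hypothetical and only builds a string; it runs nothing.
    """
    args = ["python", "tools/train_net.py", "--config-file", config_file]
    if eval_only:
        args.append("--eval-only")
    args += ["--num-gpus", str(num_gpus)]
    for key, value in overrides.items():
        args += [key, str(value)]
    # shlex.join quotes any argument that needs it (Python 3.8+).
    return "OMP_NUM_THREADS=1 " + shlex.join(args)


cmd = build_train_command(
    "configs/CondInst/MS_R_50_1x.yaml",
    num_gpus=8,
    overrides={"OUTPUT_DIR": "training_dir/CondInst_MS_R_50_1x"},
)
print(cmd)
```

Passing `eval_only=True` and adding a `MODEL.WEIGHTS` entry to `overrides` reproduces the evaluation command in the same way.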
COCO Instance Segmentation Baselines with CondInst
Name | inf. time | box AP | mask AP | download |
---|---|---|---|---|
CondInst_MS_R_50_1x | 14 FPS | 39.7 | 35.7 | model |
CondInst_MS_R_50_3x | 14 FPS | 41.9 | 37.5 | model |
CondInst_MS_R_101_3x | 11 FPS | 43.3 | 38.6 | model |
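The "inf. time" column is reported in frames per second. To compare against latency budgets, the values can be converted to milliseconds per image. A small sketch using only the numbers from the table above:

```python
# Inference speeds (FPS) copied from the table above.
fps = {
    "CondInst_MS_R_50_1x": 14,
    "CondInst_MS_R_50_3x": 14,
    "CondInst_MS_R_101_3x": 11,
}


def latency_ms(frames_per_second):
    """Convert throughput in FPS to per-image latency in milliseconds."""
    return 1000.0 / frames_per_second


for name, value in fps.items():
    print(f"{name}: {latency_ms(value):.1f} ms per image")
```

Because the table rounds FPS to whole numbers, the derived latencies are approximate.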
With an auxiliary semantic segmentation task (set MODEL.CONDINST.MASK_BRANCH.SEMANTIC_LOSS_ON = True to enable it):
Name | inf. time | box AP | mask AP | mask AP (test-dev) | download |
---|---|---|---|---|---|
CondInst_MS_R_50_3x_sem | 14 FPS | 42.6 | 38.2 | 38.7 | model |
CondInst_MS_R_101_3x_sem | 11 FPS | 44.6 | 39.8 | 40.1 | model |
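To see what the auxiliary task contributes, the rows above can be compared with the matching 3x baselines from the first table. The sketch below only does the subtraction on the published numbers; the dictionary layout is an illustrative choice.

```python
# (box AP, mask AP) copied from the baseline and "_sem" tables above.
baseline = {"R_50_3x": (41.9, 37.5), "R_101_3x": (43.3, 38.6)}
with_sem = {"R_50_3x": (42.6, 38.2), "R_101_3x": (44.6, 39.8)}

gains = {}
for name, (box_ap, mask_ap) in baseline.items():
    sem_box_ap, sem_mask_ap = with_sem[name]
    # Round to one decimal to hide floating-point noise in the difference.
    gains[name] = (round(sem_box_ap - box_ap, 1), round(sem_mask_ap - mask_ap, 1))

for name, (box_gain, mask_gain) in gains.items():
    print(f"{name}: box AP +{box_gain}, mask AP +{mask_gain}")
```

The inference speed is listed as unchanged (14 FPS and 11 FPS) in both tables.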
With BiFPN:
Name | inf. time | box AP | mask AP | download |
---|---|---|---|---|
CondInst_MS_R_50_BiFPN_1x | 13 FPS | 42.5 | 37.3 | model |
CondInst_MS_R_50_BiFPN_3x | 13 FPS | 44.3 | 38.9 | model |
CondInst_MS_R_50_BiFPN_3x_sem | 13 FPS | 44.7 | 39.4 | model |
CondInst_MS_R_101_BiFPN_3x | 10 FPS | 45.3 | 39.6 | model |
CondInst_MS_R_101_BiFPN_3x_sem | 10 FPS | 45.7 | 40.2 | model |
Disclaimer:
- All models are trained with multi-scale data augmentation. Inference time is measured on a single NVIDIA 1080Ti with batch size 1.
- The final mask's resolution is 1/4 of the input image (i.e., MODEL.CONDINST.MASK_OUT_STRIDE = 4), which is enough on MS-COCO and differs from our original paper, where we used MODEL.CONDINST.MASK_OUT_STRIDE = 2. If you want higher-resolution masks, reduce this value.
- This is a reimplementation, so the numbers differ slightly from our original paper (within 0.1% in mask AP).
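To illustrate the effect of MODEL.CONDINST.MASK_OUT_STRIDE on mask resolution, the sketch below computes the approximate mask size for a given input size. The 800x1216 input is a hypothetical example, and ceiling division is an assumption; the actual implementation may pad or round differently.

```python
import math


def mask_size(height, width, mask_out_stride):
    """Approximate (height, width) of the predicted mask.

    Assumes the mask is the input size divided by the output stride,
    rounded up. This is an illustration, not the project's code.
    """
    return (
        math.ceil(height / mask_out_stride),
        math.ceil(width / mask_out_stride),
    )


# Hypothetical 800x1216 input image.
print(mask_size(800, 1216, 4))  # stride used in this repository
print(mask_size(800, 1216, 2))  # stride used in the paper
```

Halving the stride doubles each mask dimension, so the mask has four times as many pixels.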
If you use CondInst in your research or wish to refer to the baseline results, please use the following BibTeX entries.
@inproceedings{tian2020conditional,
title = {Conditional Convolutions for Instance Segmentation},
author = {Tian, Zhi and Shen, Chunhua and Chen, Hao},
booktitle = {Proc. Eur. Conf. Computer Vision (ECCV)},
year = {2020}
}