ViTMatte🐒

Boosting Image Matting with Pretrained Plain Vision Transformers

Jingfeng Yao¹, Xinggang Wang¹ 📧, Shusheng Yang¹, Baoyuan Wang²

¹ School of EIC, HUST, ² Xiaobing.AI

(📧) corresponding author.

News

May 24th, 2024: ViTMatte has been brought to The Foundry's Nuke. Here is a bilibili tutorial. Thanks a lot!
Oct 19th, 2023: ViTMatte has been accepted by Information Fusion (IF=18.6)!
Sep 21st, 2023: ViTMatte is now available in 🤗 HuggingFace Transformers! Many thanks to Niels!
June 12th, 2023: We released a Google Colab demo. Try ViTMatte online!
June 9th, 2023: Many thanks to Lucas for creating ViT and tweeting about our ViTMatte paper!
June 8th, 2023: Matte Anything is released! If you like ViTMatte, you may also like Matte Anything.
May 27th, 2023: We released the pretrained weights of ViTMatte!
May 25th, 2023: We released the code of ViTMatte. The pretrained models are coming soon!
May 24th, 2023: We released our paper on arXiv.

Introduction

Plain vision Transformers can also do image matting with the simple ViTMatte framework!

Recently, plain vision Transformers (ViTs) have shown impressive performance on various computer vision tasks, thanks to their strong modeling capacity and large-scale pretraining. However, they have not yet conquered the problem of image matting. We hypothesize that image matting could also be boosted by ViTs and present a new efficient and robust ViT-based matting system, named ViTMatte. Our method uses (i) a hybrid attention mechanism combined with a convolution neck, which helps ViTs achieve an excellent performance-computation trade-off in matting tasks, and (ii) a detail capture module, which consists only of simple lightweight convolutions and complements the detailed information required by matting. To the best of our knowledge, ViTMatte is the first work to unleash the potential of ViTs on image matting with concise adaptation. It inherits many superior properties from ViTs, including various pretraining strategies, concise architecture design, and flexible inference strategies. We evaluate ViTMatte on Composition-1k and Distinctions-646, the most commonly used benchmarks for image matting. Our method achieves state-of-the-art performance and outperforms prior matting works by a large margin.
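For readers new to the task: matting estimates a per-pixel opacity (alpha) for the foreground, which is what lets a subject be placed onto a new background. The sketch below illustrates the standard compositing equation I = αF + (1 − α)B on a tiny grayscale example. It is not part of the ViTMatte code; the function name `composite` and the toy values are hypothetical.

```python
def composite(alpha, foreground, background):
    """Blend two grayscale images with a per-pixel alpha matte.

    All three inputs are equally sized nested lists. Alpha values lie in
    [0, 1], where 1 means pure foreground and 0 means pure background.
    """
    return [
        [a * f + (1.0 - a) * b for a, f, b in zip(a_row, f_row, b_row)]
        for a_row, f_row, b_row in zip(alpha, foreground, background)
    ]


# Toy 1x3 example (hypothetical values): opaque, half-transparent, transparent.
alpha = [[1.0, 0.5, 0.0]]
foreground = [[200.0, 200.0, 200.0]]
background = [[0.0, 100.0, 50.0]]
blended = composite(alpha, foreground, background)
print(blended)
```

A matting model such as ViTMatte predicts the `alpha` input of this equation; the foreground and background images here are stand-ins.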

Get Started

Demo

You can matte the demo image with its corresponding trimap by running:

python run_one_image.py \
    --model vitmatte-s \
    --checkpoint-dir path/to/checkpoint

The demo results will be saved in ./demo. You can also try your own image and trimap with the same script.

You can also try ViTMatte online in the Google Colab demo announced in News above. It is a simple demo that shows what ViTMatte can do.
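If you only have a rough binary mask for your own image, a trimap can be derived from it by marking a band around the mask boundary as unknown. The following is a minimal sketch under stated assumptions: the 0 / 128 / 255 encoding (background / unknown / foreground) is a common convention that this README does not confirm, so check it against the demo trimap, and `trimap_from_mask` is a hypothetical helper, not a function from this repository.

```python
FOREGROUND, UNKNOWN, BACKGROUND = 255, 128, 0  # assumed trimap encoding


def trimap_from_mask(mask, band=1):
    """Turn a binary mask (nested lists of 0/1) into a trimap.

    A pixel is marked UNKNOWN when any pixel within `band` steps of it
    (including diagonals) has a different mask value, i.e. it lies near
    the foreground/background boundary. Other pixels keep their label.
    """
    height, width = len(mask), len(mask[0])
    trimap = []
    for y in range(height):
        row = []
        for x in range(width):
            near_boundary = any(
                mask[ny][nx] != mask[y][x]
                for ny in range(max(0, y - band), min(height, y + band + 1))
                for nx in range(max(0, x - band), min(width, x + band + 1))
            )
            if near_boundary:
                row.append(UNKNOWN)
            else:
                row.append(FOREGROUND if mask[y][x] else BACKGROUND)
        trimap.append(row)
    return trimap


# Hypothetical 1x6 mask: background on the left, foreground on the right.
mask = [[0, 0, 0, 1, 1, 1]]
trimap = trimap_from_mask(mask, band=1)
print(trimap)
```

A wider `band` gives the model more freedom around fine structures such as hair, at the cost of a larger region to estimate. For real images, a morphological dilation and erosion from an image library would do the same job much faster.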

Results

Quantitative Results on Composition-1k

Model	SAD	MSE	Grad	Conn	checkpoints
ViTMatte-S	21.46	3.3	7.24	16.21	GoogleDrive
ViTMatte-B	20.33	3.0	6.74	14.78	GoogleDrive

Quantitative Results on Distinctions-646

Model	SAD	MSE	Grad	Conn	checkpoints
ViTMatte-S	21.22	2.1	8.78	17.55	GoogleDrive
ViTMatte-B	17.05	1.5	7.03	12.95	GoogleDrive
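To help read the SAD and MSE columns, here is a rough sketch of how these two metrics are typically computed in matting benchmarks: the error is accumulated only over the unknown region of the trimap. The README does not spell out the evaluation protocol, so the unknown-region restriction, the trimap value 128, and the helper name `sad_and_mse` are assumptions; Grad and Conn are omitted because they are more involved.

```python
UNKNOWN = 128  # assumed trimap value for the unknown region


def sad_and_mse(pred, target, trimap):
    """Compute SAD and MSE over the unknown region of a trimap.

    `pred` and `target` are alpha mattes as nested lists with values in
    [0, 1]; `trimap` has the same shape. Returns raw (unscaled) values.
    """
    abs_sum = 0.0
    sq_sum = 0.0
    count = 0
    for p_row, t_row, m_row in zip(pred, target, trimap):
        for p, t, m in zip(p_row, t_row, m_row):
            if m == UNKNOWN:
                diff = p - t
                abs_sum += abs(diff)
                sq_sum += diff * diff
                count += 1
    mse = sq_sum / count if count else 0.0
    return abs_sum, mse


# Toy 1x4 example (hypothetical values); only the two middle pixels are unknown.
pred = [[0.0, 0.5, 1.0, 1.0]]
target = [[0.0, 1.0, 0.5, 0.0]]
trimap = [[0, 128, 128, 255]]
sad, mse = sad_and_mse(pred, target, trimap)
print(sad, mse)
```

Note that the last pixel is wrong in `pred` but does not count, because the trimap marks it as known foreground. Matting papers commonly report SAD divided by 1000 and MSE multiplied by 1000; whether the tables above follow exactly that scaling is not stated here.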

Citation

@article{yao2024vitmatte,
  title={ViTMatte: Boosting image matting with pre-trained plain vision transformers},
  author={Yao, Jingfeng and Wang, Xinggang and Yang, Shusheng and Wang, Baoyuan},
  journal={Information Fusion},
  volume={103},
  pages={102091},
  year={2024},
  publisher={Elsevier}
}
