This folder contains the PyTorch code for our BMVC 2021 paper MVT: Multi-view Vision Transformer for 3D Object Recognition by Shuo Chen, Tan Yu, and Ping Li.
If you use this code for a paper, please cite:
```bibtex
@inproceedings{Chen2021MVT,
  author    = {Shuo Chen and Tan Yu and Ping Li},
  title     = {{MVT:} Multi-view Vision Transformer for 3D Object Recognition},
  booktitle = {{BMVC}},
  year      = {2021},
}
```
We have also developed an MLP-based architecture for view-based 3D object recognition. Check out our paper R2-MLP: Round-Roll MLP for Multi-View 3D Object Recognition and the accompanying code repository for more information.
- PyTorch 1.7.0+
Download the ModelNet40 dataset (20-view setting) and extract it into the current folder:

```bash
wget https://data.airc.aist.go.jp/kanezaki.asako/data/modelnet40v2png_ori4.tar
tar -xvf modelnet40v2png_ori4.tar
```
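After extraction, a quick way to sanity-check the data is to count how many views each object has. The sketch below assumes the view index is the last underscore-separated field of each filename (e.g. something like `airplane_0001_001.png`); that naming convention is an assumption, so adjust the parsing if your extracted tree differs.

```python
import os
from collections import Counter

# Sanity check: count the .png views per object after extraction.
# Assumption: the trailing "_xxx" in each filename is the view index,
# so stripping it groups the views of one object together.
root = "modelnet40v2png_ori4"
views_per_object = Counter()
for dirpath, _, filenames in os.walk(root):
    for name in filenames:
        if name.endswith(".png"):
            object_id = os.path.join(dirpath, name.rsplit("_", 1)[0])  # drop the view index
            views_per_object[object_id] += 1

print("objects found:", len(views_per_object))
print("view-count distribution:", Counter(views_per_object.values()))
```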
Download the DeiT-small model pretrained on ImageNet 2012 from the DeiT model zoo.
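If you want to confirm the checkpoint downloaded correctly before training, a minimal inspection like the following works. The filename is a placeholder for whatever you saved, and wrapping the weights under a `"model"` key is the usual DeiT checkpoint layout but is treated as an assumption here.

```python
import torch

# Inspect the pretrained checkpoint. The filename below is a placeholder;
# DeiT checkpoints usually wrap the weights under a "model" key (assumption).
ckpt = torch.load("deit_small_patch16_224.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)
print(len(state_dict), "tensors")
for k in list(state_dict)[:5]:
    print(k, tuple(state_dict[k].shape))
```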
Train the model on 2 V100 GPUs:

```bash
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --master_addr 127.0.0.2 --master_port 23918 \
    --nproc_per_node=2 --use_env main.py --model deit_small_patch16_224 --epochs 100 --batch-size 8 \
    --lr 0.001 --data-set M10 --view-num 20 --output_dir outputs --num_workers 4
```
The training log is available here for your reference.
Note: if you change `--view-num`, please remember to change line 316 of `timm/models/vision_transformer.py` accordingly:

```python
x = x.reshape(B//20, N*20, C)
```
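The hardcoded 20 in that line is simply the view count. The shape-only sketch below (illustrative values, not a drop-in patch for the file) shows what the reshape does and how it generalises to another `--view-num`; how the views are batched upstream is not shown here.

```python
import torch

# Shape-only illustration of the reshape above, with the view count as a variable.
# Illustrative values: 8 objects x 20 views, 197 tokens, embed dim 384 (DeiT-small).
view_num = 20                        # keep in sync with --view-num
B, N, C = 8 * view_num, 197, 384     # B counts single-view images, not objects
x = torch.randn(B, N, C)
x = x.reshape(B // view_num, N * view_num, C)  # group each object's view tokens
print(x.shape)                       # torch.Size([8, 3940, 384])
```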
Run the following command for evaluation (make sure `--model` matches the architecture you trained):

```bash
CUDA_VISIBLE_DEVICES=0 python main.py --eval --model=deit_tiny_patch16_224 --resume=trained/model/path.pth \
    --data-set=M10 --num_workers=4 --view-num=20 --batch-size=8
```
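If, beyond the accuracy printed by the evaluation run, you also want mean per-class accuracy (the second number commonly reported on ModelNet), a small helper like this can compute it from (prediction, label) pairs you collect yourself. It is not part of the released code.

```python
from collections import defaultdict

# Compute overall accuracy and mean per-class accuracy from (pred, label) pairs.
# Not part of main.py; intended for post-processing your own prediction dump.
def accuracy_summary(pairs):
    per_class = defaultdict(lambda: [0, 0])          # label -> [correct, total]
    for pred, label in pairs:
        per_class[label][0] += int(pred == label)
        per_class[label][1] += 1
    overall = sum(c for c, _ in per_class.values()) / sum(t for _, t in per_class.values())
    mean_class = sum(c / t for c, t in per_class.values()) / len(per_class)
    return overall, mean_class

print(accuracy_summary([(0, 0), (1, 1), (1, 0), (2, 2)]))  # (0.75, 0.833...)
```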
This repo is based on DeiT and SOS. We thank the authors for their work.