YOLOS with EVA-02

Introduction

You Only Look at One Sequence (YOLOS) (paper, code) is a series of object detection models based on the vanilla Vision Transformer, with the fewest possible modifications, region priors, and target-task-specific inductive biases. With pre-training on ImageNet-1k and fine-tuning on COCO, the transfer-learning performance of YOLOS on COCO object detection serves as a challenging benchmark for evaluating different (label-supervised or self-supervised) pre-training strategies for ViT.

EVA-02 (paper, code) is a series of visual pre-training models based on the ViT architecture and the Masked Image Modeling (MIM) pre-training strategy. After fine-tuning, EVA-02 models outperform previous models on various downstream tasks such as image classification and object detection.

This project applies the YOLOS method to EVA-02 models and evaluates them on the VOC2007 dataset to probe the transferability of EVA-02 pre-trained weights.

Results

| Model | Params | Pre-train Epochs | Init Weight | Fine-tune Epochs | Eval Size | YOLOS Checkpoint / Log | AP @ VOC2007 test |
| --- | --- | --- | --- | --- | --- | --- | --- |
| VOC-YOLOS-Ti | 6M | 300 | DeiT-tiny | 300 | 512 | checkpoint / Log | 23.9 |
| VOC-YOLOS-S | 23M | 300 | DeiT-small | 150 | 512 | checkpoint / Log | 31.1 |
| VOC-YOLOS-EVA-Ti | 6M | 240+100 | EVA02-tiny | 300 | 512 | checkpoint / Log | 31.9 |
| VOC-YOLOS-EVA-S | 23M | 240+100 | EVA02-small | 150 | 512 | checkpoint / Log | 42.0 |

Notes:

  • The Pre-train Epochs of VOC-YOLOS-EVA is 240+100, i.e. 240 MIM pre-training epochs followed by 100 epochs of IN-1K fine-tuning; in other words, we use the IN-1K fine-tuned EVA-02 weights as the initial checkpoint. We do not initialize directly from the MIM pre-trained weights because VOC2007 is small, which makes it hard to train starting from an MIM model that has never been fine-tuned on a recognition task. Our experiments bear this out: for the Tiny model the difference between the two initializations is small, but for the Small model, training from MIM weights performs poorly.
  • For EVA models, we interpolate the patch_embed kernel from 14x14 to 16x16 (see the sketch after this list). This is useful for object detection, instance segmentation, and semantic segmentation tasks.
  • The comparison of these results may not be entirely fair, as the EVA models use more data (IN-21K) during pre-training.
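
As an illustration, here is a minimal sketch of this kind of kernel interpolation. It assumes the timm/EVA-02 parameter name patch_embed.proj.weight; the actual implementation in this repo may differ.

```python
import torch.nn.functional as F

def interpolate_patch_embed(state_dict, new_size=16):
    # Resize the patch embedding conv kernel, e.g. from 14x14 to 16x16.
    # "patch_embed.proj.weight" is the usual timm/EVA-02 key; adjust it
    # if your checkpoint names it differently.
    key = "patch_embed.proj.weight"
    w = state_dict[key]  # shape: (embed_dim, in_chans, 14, 14)
    w = F.interpolate(w.float(), size=(new_size, new_size),
                      mode="bicubic", align_corners=False)
    state_dict[key] = w
    return state_dict
```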

Partial finetune results

  • Tiny models

| Model | Params (adjustable) | Init Weight | Finetune type | Fully adjustable layers | Log | AP @ VOC2007 test |
| --- | --- | --- | --- | --- | --- | --- |
| VOC-YOLOS-Ti | 6M | DeiT-tiny | full | 12 | Log | 23.9 |
| VOC-YOLOS-EVA-Ti | 6M | eva02_Ti_pt_in21k_ft_in1k_p14 | full | 12 | Log | 31.9 |
| VOC-YOLOS-Ti-0 | 3.7M | DeiT-tiny | ffn | 0 | Log | 9.4 |
| VOC-YOLOS-EVA-Ti-0 | 3.7M | eva02_Ti_pt_in21k_ft_in1k_p14 | ffn | 0 | Log | 10.9 |
| VOC-YOLOS-EVA-Ti-1 | 4.2M | eva02_Ti_pt_in21k_ft_in1k_p14 | ffn | 1 | Log | 11.6 |
| VOC-YOLOS-EVA-Ti-2 | 4.6M | eva02_Ti_pt_in21k_ft_in1k_p14 | ffn | 2 | Log | 12.4 |
| VOC-YOLOS-EVA-Ti-3 | 5.1M | eva02_Ti_pt_in21k_ft_in1k_p14 | ffn | 3 | Log | 16.2 |
  • Small models

| Model | Params (adjustable) | Init Weight | Finetune type | Fully adjustable layers | Log | AP @ VOC2007 test |
| --- | --- | --- | --- | --- | --- | --- |
| VOC-YOLOS-S | 23M | DeiT-small | full | 12 | Log | 31.1 |
| VOC-YOLOS-EVA-S | 23M | eva02_S_pt_in21k_ft_in1k_p14 | full | 12 | Log | 42.0 |
| VOC-YOLOS-S-0 | 15M | DeiT-small | ffn | 0 | Log | 12.2 |
| VOC-YOLOS-EVA-S-0 | 15M | eva02_S_pt_in21k_ft_in1k_p14 | ffn | 0 | Log | 21.0 |
| VOC-YOLOS-EVA-S-MIM-0 | 15M | eva02_S_pt_in21k_p14 | ffn | 0 | Log | 23.0 |
| VOC-YOLOS-EVA-S-1 | 17M | eva02_S_pt_in21k_ft_in1k_p14 | ffn | 1 | Log | 23.4 |
| VOC-YOLOS-EVA-S-MIM-1 | 17M | eva02_S_pt_in21k_p14 | ffn | 1 | Log | 24.0 |
| VOC-YOLOS-EVA-S-2 | 18M | eva02_S_pt_in21k_ft_in1k_p14 | ffn | 2 | Log | 25.8 |
| VOC-YOLOS-EVA-S-MIM-2 | 18M | eva02_S_pt_in21k_p14 | ffn | 2 | Log | 30.7 |
| VOC-YOLOS-EVA-S-3 | 20M | eva02_S_pt_in21k_ft_in1k_p14 | ffn | 3 | Log | 32.6 |
| VOC-YOLOS-EVA-S-MIM-3 | 20M | eva02_S_pt_in21k_p14 | ffn | 3 | Log | 35.4 |
| VOC-YOLOS-EVA-S-attn-0 | 8M | eva02_S_pt_in21k_p14 | attn | 0 | checkpoint / Log | 42.4 |
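
For reference, here is a hypothetical sketch of what the ffn/attn partial fine-tuning above could look like. It assumes timm-style module names (model.blocks[i].mlp, model.blocks[i].attn) and that everything outside the transformer blocks (e.g. the detection head) stays trainable; the repo's actual --use_partial_finetune logic may differ.

```python
import torch.nn as nn

def freeze_for_partial_finetune(model: nn.Module,
                                finetune_type: str = "ffn",
                                full_layers: int = 0) -> None:
    # Train only the chosen sublayer type ("ffn" -> mlp, "attn" -> attn)
    # in most blocks, while the last `full_layers` blocks remain fully
    # trainable. Parameters outside `model.blocks` are left untouched.
    keep = {"ffn": "mlp", "attn": "attn"}[finetune_type]
    n_blocks = len(model.blocks)
    for name, param in model.named_parameters():
        if not name.startswith("blocks."):
            continue  # e.g. detection head, patch_embed: leave as-is
        block_idx = int(name.split(".")[1])
        if block_idx >= n_blocks - full_layers:
            continue  # the last `full_layers` blocks stay fully trainable
        param.requires_grad = f".{keep}." in name

# Example: freeze_for_partial_finetune(backbone, "ffn", full_layers=3)
```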


Requirements

Please refer to the YOLOS requirements here to build the environment.

In addition, you need to install timm and einops for the EVA models:

pip install timm einops

Data preparation

We use the VOC2007 trainval split for training and the VOC2007 test split for evaluation.

Download and extract Pascal VOC 2007 images and annotations:

# Download the data.
cd $HOME/data
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
# Extract the data.
tar -xvf VOCtrainval_06-Nov-2007.tar
tar -xvf VOCtest_06-Nov-2007.tar

Now you should see the VOCdevkit folder.

Then run voc2coco.py to convert VOC annotations to COCO format.

python voc2coco.py /path/to/VOCdevkit

Now you should see voc_train.json and voc_val.json.

We expect the dataset directory structure to be the following:

path/to/dataset/
  annotations/
  	voc_train.json
  	voc_val.json
  images/
  	train/	# VOC 2007 trainval images
  	val/	# VOC 2007 test images
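
If voc2coco.py only writes the two JSON files (an assumption; check the script's actual output), one way to arrange everything into the layout above is a small Python script like the following, using the official trainval/test image lists:

```python
import shutil
from pathlib import Path

# Hypothetical paths: adjust to where you extracted the tars and where
# you want the dataset to live.
devkit = Path.home() / "data" / "VOCdevkit" / "VOC2007"
dataset = Path("/path/to/dataset")

for sub in ["annotations", "images/train", "images/val"]:
    (dataset / sub).mkdir(parents=True, exist_ok=True)

# JSON files produced by voc2coco.py (assumed to be in the current directory).
shutil.copy("voc_train.json", dataset / "annotations")
shutil.copy("voc_val.json", dataset / "annotations")

# VOC2007 trainval images -> images/train, VOC2007 test images -> images/val.
for split, dest in [("trainval", "train"), ("test", "val")]:
    ids = (devkit / "ImageSets" / "Main" / f"{split}.txt").read_text().split()
    for image_id in ids:
        shutil.copy(devkit / "JPEGImages" / f"{image_id}.jpg",
                    dataset / "images" / dest)
```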

Training

Before fine-tuning on VOC2007, you need to download the pre-trained weights.

To train the original VOC-YOLOS-Ti model on VOC2007, run this command:

python -m torch.distributed.launch --nproc_per_node=3 --use_env main.py --coco_path /path/to/dataset --dataset_file voc --batch_size 2 --lr 2.5e-5 --epochs 300 --backbone_name tiny --pre_trained path/to/deit-tiny.pth --eval_size 512 --init_pe_size 608 800 --output_dir /output/path/box_model

To train the original VOC-YOLOS-S model on VOC2007, run this command:

python -m torch.distributed.launch --nproc_per_node=3 --use_env main.py --coco_path /path/to/dataset --dataset_file voc --batch_size 1 --lr 2.5e-5 --epochs 150 --backbone_name small --pre_trained path/to/deit-small-300epoch.pth --eval_size 512 --init_pe_size 512 864 --mid_pe_size 512 864 --output_dir /output/path/box_model

To train the VOC-YOLOS-EVA-Ti model on VOC2007, run this command:

python -m torch.distributed.launch --nproc_per_node=3 --use_env main.py --coco_path /path/to/dataset --dataset_file voc --batch_size 2 --lr 2.5e-5 --epochs 300 --model_name eva --backbone_name tiny --pre_trained path/to/eva02_Ti_pt_in21k_ft_in1k_p14.pt --eval_size 512 --init_pe_size 608 800 --output_dir /output/path/box_model

To train the VOC-YOLOS-EVA-S model on VOC2007, run this command:

python -m torch.distributed.launch --nproc_per_node=3 --use_env main.py --coco_path /path/to/dataset --dataset_file voc --batch_size 1 --lr 2.5e-5 --epochs 150 --model_name eva --backbone_name small --pre_trained path/to/eva02_S_pt_in21k_ft_in1k_p14.pt --eval_size 512 --init_pe_size 608 800 --output_dir /output/path/box_model

To apply attention partial fine-tuning to the MIM-initialized VOC-YOLOS-EVA-S model (VOC-YOLOS-EVA-S-attn-0 above) on VOC2007, run this command:

python -m torch.distributed.launch --nproc_per_node=3 --use_env main.py --coco_path /path/to/dataset --dataset_file voc --batch_size 1 --lr 2.5e-5 --epochs 150 --model_name eva --backbone_name small_mim --use_partial_finetune --partial_finetune_type attn --pre_trained path/to/eva02_S_pt_in21k_p14.pt --eval_size 512 --init_pe_size 608 800 --output_dir /output/path/box_model

Evaluation

To evaluate VOC-YOLOS-Ti model on VOC2007 test, run:

python -m torch.distributed.launch --nproc_per_node=3 --use_env main.py --coco_path /path/to/dataset --dataset_file voc --batch_size 2 --backbone_name tiny --eval --eval_size 512 --init_pe_size 608 800 --resume path/to/voc_yolos_ti.pth

To evaluate VOC-YOLOS-S model on VOC2007 test, run:

python -m torch.distributed.launch --nproc_per_node=3 --use_env main.py --coco_path /path/to/dataset --dataset_file voc --batch_size 1 --backbone_name small --eval --eval_size 512 --init_pe_size 512 864 --mid_pe_size 512 864 --resume path/to/voc_yolos_s.pth

To evaluate VOC-YOLOS-EVA-Ti model on VOC2007 test, run:

python -m torch.distributed.launch --nproc_per_node=3 --use_env main.py --coco_path /path/to/dataset --dataset_file voc --batch_size 2 --model_name eva --backbone_name tiny --eval --eval_size 512 --init_pe_size 608 800 --resume path/to/voc_yolos_eva_ti.pth

To evaluate VOC-YOLOS-EVA-S model on VOC2007 test, run:

python -m torch.distributed.launch --nproc_per_node=3 --use_env main.py --coco_path /path/to/dataset --dataset_file voc --batch_size 1 --model_name eva --backbone_name small --eval --eval_size 512 --init_pe_size 608 800 --resume path/to/voc_yolos_eva_s.pth

To evaluate VOC-YOLOS-EVA-S-attn-0 model on VOC2007 test, run:

python -m torch.distributed.launch --nproc_per_node=3 --use_env main.py --coco_path /path/to/dataset --dataset_file voc --batch_size 1 --model_name eva --backbone_name small_mim --eval --eval_size 512 --init_pe_size 608 800 --resume path/to/voc_eva_s_mim_frozen_attn_0.pth
