EfficientViT for Image Classification

This codebase implements image classification with EfficientViT.

Model Zoo

Model            Data         Input    Acc@1  Acc@5  #FLOPs (M)  #Params  Throughput (images/s)  Link
EfficientViT-M0  ImageNet-1k  224x224  63.2   85.2   79          2.3M     27644                   model/log/onnx
EfficientViT-M1  ImageNet-1k  224x224  68.4   88.7   167         3.0M     20093                   model/log/onnx
EfficientViT-M2  ImageNet-1k  224x224  70.8   90.2   201         4.2M     18218                   model/log/onnx
EfficientViT-M3  ImageNet-1k  224x224  73.4   91.4   263         6.9M     16644                   model/log/onnx
EfficientViT-M4  ImageNet-1k  224x224  74.3   91.8   299         8.8M     15914                   model/log/onnx
EfficientViT-M5  ImageNet-1k  224x224  77.1   93.4   522         12.4M    10621                   model/log/onnx

Get Started

Install requirements

Run the following command to install the dependencies:

pip install -r requirements.txt

Data preparation

We need to prepare the ImageNet-1k dataset from http://www.image-net.org/.

  • ImageNet-1k

ImageNet-1k contains 1.28M training images and 50K validation images. The images should be stored as individual files:

ImageNet/
├── train
│   ├── n01440764
│   │   ├── n01440764_10026.JPEG
│   │   ├── n01440764_10027.JPEG
...
├── val
│   ├── n01440764
│   │   ├── ILSVRC2012_val_00000293.JPEG
...
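
This is the standard torchvision ImageFolder layout, so the extracted data can be sanity-checked with a short script (a minimal sketch, not part of this repo; adjust the paths to your setup):

# Sanity-check the extracted ImageNet layout via torchvision's ImageFolder.
from torchvision import datasets

train_set = datasets.ImageFolder("ImageNet/train")
val_set = datasets.ImageFolder("ImageNet/val")

# Expect ~1.28M training images, 50K validation images, 1000 classes each.
print(len(train_set), len(val_set), len(train_set.classes))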

Our code also supports storing the training and validation sets as *.tar archives:

ImageNet/
├── train.tar
│   ├── n01440764
│   │   ├── n01440764_10026.JPEG
...
└── val.tar
    ├── n01440764
    │   ├── ILSVRC2012_val_00000293.JPEG
...
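
As the tree above shows, the archives keep the same class-folder structure internally. For reference, here is a minimal sketch of reading images straight from such an archive with Python's built-in tarfile module (illustrative only, not the repo's actual data loader):

# Iterate over images inside train.tar without extracting it to disk.
import tarfile
from io import BytesIO
from PIL import Image

with tarfile.open("ImageNet/train.tar") as tar:
    for member in tar:
        if not member.isfile() or not member.name.endswith(".JPEG"):
            continue
        wnid = member.name.split("/")[0]  # class id, e.g. "n01440764"
        img = Image.open(BytesIO(tar.extractfile(member).read())).convert("RGB")
        print(wnid, img.size)
        break  # remove this to scan the whole archive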

Evaluation

Before evaluation, download the pre-trained models from the model zoo above.

Run the following command to evaluate a pre-trained EfficientViT-M4 on ImageNet val with a single GPU:

python main.py --eval --model EfficientViT_M4 --resume ./efficientvit_m4.pth --data-path $PATH_TO_IMAGENET

This should give

* Acc@1 74.266 Acc@5 91.788 loss 1.242
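
The reported Acc@1/Acc@5 are standard top-k accuracies: a prediction counts as correct for Acc@k if the true label is among the model's k highest-scoring classes. A self-contained sketch of the computation (independent of this repo's metric code):

# Top-k accuracy as reported above.
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, k: int = 1) -> float:
    # logits: (batch, num_classes), targets: (batch,)
    topk = logits.topk(k, dim=1).indices              # (batch, k)
    correct = (topk == targets.unsqueeze(1)).any(dim=1)
    return correct.float().mean().item()

logits = torch.randn(8, 1000)
targets = torch.randint(0, 1000, (8,))
print(topk_accuracy(logits, targets, k=1), topk_accuracy(logits, targets, k=5))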

Here are the command lines for evaluating other pre-trained models:

EfficientViT-M0

python main.py --eval --model EfficientViT_M0 --resume ./efficientvit_m0.pth --data-path $PATH_TO_IMAGENET

giving

* Acc@1 63.296 Acc@5 85.150 loss 1.741

EfficientViT-M1

python main.py --eval --model EfficientViT_M1 --resume ./efficientvit_m1.pth --data-path $PATH_TO_IMAGENET

giving

* Acc@1 68.356 Acc@5 88.672 loss 1.513

EfficientViT-M2

python main.py --eval --model EfficientViT_M2 --resume ./efficientvit_m2.pth --data-path $PATH_TO_IMAGENET

giving

* Acc@1 70.786 Acc@5 90.150 loss 1.442

EfficientViT-M3

python main.py --eval --model EfficientViT_M3 --resume ./efficientvit_m3.pth --data-path $PATH_TO_IMAGENET

giving

* Acc@1 73.390 Acc@5 91.350 loss 1.285

EfficientViT-M5

python main.py --eval --model EfficientViT_M5 --resume ./efficientvit_m5.pth --data-path $PATH_TO_IMAGENET

giving

* Acc@1 77.124 Acc@5 93.360 loss 1.127

Training

To train an EfficientViT-M4 model on a single node with 8 GPUs for 300 epochs, with distributed evaluation, run:

python -m torch.distributed.launch --nproc_per_node=8 --master_port 12345 --use_env main.py --model EfficientViT_M4 --data-path $PATH_TO_IMAGENET --dist-eval

EfficientViT-M0

To train an EfficientViT-M0 model on a single node with 8 GPUs for 300 epochs, with distributed evaluation, run:

python -m torch.distributed.launch --nproc_per_node=8 --master_port 12345 --use_env main.py --model EfficientViT_M0 --data-path $PATH_TO_IMAGENET --dist-eval

EfficientViT-M1

To train an EfficientViT-M1 model on a single node with 8 GPUs for 300 epochs, with distributed evaluation, run:

python -m torch.distributed.launch --nproc_per_node=8 --master_port 12345 --use_env main.py --model EfficientViT_M1 --data-path $PATH_TO_IMAGENET --dist-eval

EfficientViT-M2

To train an EfficientViT-M2 model on a single node with 8 GPUs for 300 epochs, with distributed evaluation, run:

python -m torch.distributed.launch --nproc_per_node=8 --master_port 12345 --use_env main.py --model EfficientViT_M2 --data-path $PATH_TO_IMAGENET --dist-eval

EfficientViT-M3

To train an EfficientViT-M3 model on a single node with 8 GPUs for 300 epochs, with distributed evaluation, run:

python -m torch.distributed.launch --nproc_per_node=8 --master_port 12345 --use_env main.py --model EfficientViT_M3 --data-path $PATH_TO_IMAGENET --dist-eval

EfficientViT-M5

To train an EfficientViT-M5 model on a single node with 8 GPUs for 300 epochs, with distributed evaluation, run:

python -m torch.distributed.launch --nproc_per_node=8 --master_port 12345 --use_env main.py --model EfficientViT_M5 --data-path $PATH_TO_IMAGENET --dist-eval
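
Note: torch.distributed.launch is deprecated in recent PyTorch releases. Since the commands above pass --use_env (i.e. main.py reads LOCAL_RANK from the environment), an equivalent torchrun invocation should also work, for example:

torchrun --nproc_per_node=8 --master_port 12345 main.py --model EfficientViT_M4 --data-path $PATH_TO_IMAGENET --dist-eval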

Speed test

Run the following command to compare the throughput of the models on GPU/CPU:

python speed_test.py

which should give

EfficientViT_M0 cuda:0 27643.941865437002 images/s @ batch size 2048
EfficientViT_M1 cuda:0 20093.286204638334 images/s @ batch size 2048
EfficientViT_M2 cuda:0 18218.347390415714 images/s @ batch size 2048
EfficientViT_M3 cuda:0 16643.905520424512 images/s @ batch size 2048
EfficientViT_M4 cuda:0 15914.449955135608 images/s @ batch size 2048
EfficientViT_M5 cuda:0 10620.868156518267 images/s @ batch size 2048
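
These values correspond to the Throughput column of the model zoo (images/s at batch size 2048 on a single GPU); absolute numbers depend on your hardware. For reference, a generic throughput measurement for any torch model follows the same pattern (a minimal sketch, not speed_test.py itself; reduce batch_size if you run out of GPU memory):

# Measure inference throughput (images/s) of a model on a CUDA device.
import time
import torch

@torch.no_grad()
def throughput(model: torch.nn.Module, batch_size: int = 2048, steps: int = 30) -> float:
    model = model.cuda().eval()
    images = torch.randn(batch_size, 3, 224, 224, device="cuda")
    for _ in range(10):  # warm-up so timings exclude CUDA initialization
        model(images)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(steps):
        model(images)
    torch.cuda.synchronize()  # wait for all queued kernels before stopping the clock
    return steps * batch_size / (time.time() - start)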

Acknowledgements

We sincerely thank the authors of Swin Transformer, LeViT, pytorch-image-models, and PyTorch for their awesome codebases.

License