# Training TinyViT

In this document, we introduce how to pretrain TinyViT with the proposed fast pretraining distillation.

Note: if GPU memory is not enough to fit the batch size, you can use gradient accumulation by adding the argument `--accumulation-steps <acc_steps>`. For example, the accumulated batch size per GPU is 128 (= 32 × 4) when passing `--batch-size 32 --accumulation-steps 4`.
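For reference, gradient accumulation simply sums gradients over several small forward/backward passes before each optimizer step. Below is a minimal, hypothetical PyTorch sketch with toy stand-ins for the model, optimizer, and data loader; it is not the repository's actual training loop:

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for the real model/optimizer/loader built in main.py.
model = torch.nn.Linear(16, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loader = [(torch.randn(32, 16), torch.randint(0, 10, (32,))) for _ in range(8)]

accumulation_steps = 4  # as in --accumulation-steps 4
optimizer.zero_grad()
for step, (images, targets) in enumerate(loader):
    # Scale the loss so the summed gradients match one large batch (32 x 4 = 128).
    loss = F.cross_entropy(model(images), targets) / accumulation_steps
    loss.backward()  # gradients accumulate in .grad across micro-batches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()  # one optimizer update per 4 micro-batches
        optimizer.zero_grad()
```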

## Pretrain the model on ImageNet-22k with the proposed fast pretraining distillation

Before training with the proposed fast pretraining distillation, we first need to store the teacher's sparse soft labels, following the tutorial on saving teacher logits.

Assume that the teacher sparse soft labels are stored in `./teacher_logits/` and the IN-22k dataset in `./ImageNet-22k`.

We use 4 nodes (8 GPUs per node) to pretrain the model on IN-22k with distillation from the stored soft labels.

```bash
python -m torch.distributed.launch --master_addr=$MASTER_ADDR --nproc_per_node 8 --nnodes=4 --node_rank=$NODE_RANK main.py --cfg configs/22k_distill/tiny_vit_21m_22k_distill.yaml --data-path ./ImageNet-22k --batch-size 128 --output ./output --opts DISTILL.TEACHER_LOGITS_PATH ./teacher_logits/
```

where `$NODE_RANK` is the rank of the current node (0 to 3) and `$MASTER_ADDR` is the IP address of the master node.
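The stored labels are sparse: for each image, only the teacher's top-k probabilities and their class indices are kept. Below is a conceptual sketch of a distillation loss over such sparse soft labels, assuming this storage format and ignoring the probability mass outside the top-k entries; the repository's exact implementation may differ:

```python
import torch
import torch.nn.functional as F

def sparse_distill_loss(student_logits, topk_indices, topk_probs, temperature=1.0):
    """Cross-entropy against stored sparse teacher soft labels (conceptual sketch)."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Gather the student's log-probabilities at the teacher's top-k classes.
    log_p_at_topk = log_p_student.gather(dim=-1, index=topk_indices)
    return -(topk_probs * log_p_at_topk).sum(dim=-1).mean()

# Example: batch of 2 samples, 21841 IN-22k classes, k = 10 stored entries.
student_logits = torch.randn(2, 21841)
topk_indices = torch.randint(0, 21841, (2, 10))
topk_probs = torch.softmax(torch.randn(2, 10), dim=-1)
loss = sparse_distill_loss(student_logits, topk_indices, topk_probs)
```

Storing only the top-k entries is what makes the distillation "fast": the teacher is run once offline, and training reads a few floats per image instead of re-running the teacher every epoch.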

## Finetune on ImageNet-1k

- Finetune the pretrained model from IN-22k to IN-1k

After pretraining on IN-22k, the model can be finetuned on IN-1k with the following command.

```bash
python -m torch.distributed.launch --nproc_per_node 8 main.py --cfg configs/22kto1k/tiny_vit_21m_22kto1k.yaml --data-path ./ImageNet --batch-size 128 --pretrained ./checkpoints/tiny_vit_21m_22k_distill.pth --output ./output
```

where `tiny_vit_21m_22k_distill.pth` is the checkpoint of TinyViT-21M pretrained on the IN-22k dataset.
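Conceptually, `--pretrained` loads the IN-22k backbone weights while the classifier is re-created for 1000 classes. A generic, hypothetical sketch of that step using a toy module (the real logic lives in the repository's checkpoint-loading utilities):

```python
import torch
import torch.nn as nn

# Toy stand-in for TinyViT: a backbone plus a classification head.
class ToyNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.backbone = nn.Linear(8, 8)
        self.head = nn.Linear(8, num_classes)

# Simulate an IN-22k checkpoint (21841 classes).
state_dict = ToyNet(num_classes=21841).state_dict()

# Drop the 22k classifier so a fresh 1000-way head is used for IN-1k.
state_dict = {k: v for k, v in state_dict.items() if not k.startswith("head.")}

model = ToyNet(num_classes=1000)
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(missing)  # ['head.weight', 'head.bias'] are re-initialized for IN-1k
```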

- Finetune with higher resolution

To obtain better accuracy, we finetune the model progressively at higher resolutions (224 → 384 → 512); a conceptual sketch of resolution transfer follows the two commands below.

Finetune with higher resolution from 224 to 384:

```bash
python -m torch.distributed.launch --nproc_per_node 8 main.py --cfg configs/higher_resolution/tiny_vit_21m_224to384.yaml --data-path ./ImageNet --batch-size 32 --pretrained ./checkpoints/tiny_vit_21m_22kto1k_distill.pth --output ./output --accumulation-steps 4
```

Finetune with higher resolution from 384 to 512:

```bash
python -m torch.distributed.launch --nproc_per_node 8 main.py --cfg configs/higher_resolution/tiny_vit_21m_384to512.yaml --data-path ./ImageNet --batch-size 32 --pretrained ./checkpoints/tiny_vit_21m_22kto1k_384_distill.pth --output ./output --accumulation-steps 4
```

Note that the stages chain: the 384 run starts from the 22k→1k distilled checkpoint, and the 512 run starts from the resulting 384 checkpoint.
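Why does resolution transfer work at all? In plain ViTs, the standard recipe is to bicubically interpolate the positional-embedding grid to the new token grid. TinyViT's own resolution handling is driven by its YAML configs (it does not use absolute positional embeddings), so the sketch below is only a generic illustration of the idea:

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed, old_hw, new_hw):
    # pos_embed: (1, old_h * old_w, dim) -> (1, new_h * new_w, dim)
    (old_h, old_w), (new_h, new_w) = old_hw, new_hw
    dim = pos_embed.shape[-1]
    grid = pos_embed.reshape(1, old_h, old_w, dim).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=(new_h, new_w), mode="bicubic", align_corners=False)
    return grid.permute(0, 2, 3, 1).reshape(1, new_h * new_w, dim)

# 224px / 16px patches = 14x14 tokens -> 384px / 16px patches = 24x24 tokens.
pe_224 = torch.randn(1, 14 * 14, 192)
pe_384 = resize_pos_embed(pe_224, (14, 14), (24, 24))
print(pe_384.shape)  # torch.Size([1, 576, 192])
```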

## Train the model from scratch on ImageNet-1k

Here is the command to train TinyViT from scratch on ImageNet-1k.

```bash
python -m torch.distributed.launch --nproc_per_node 8 main.py --cfg configs/1k/tiny_vit_21m.yaml --data-path ./ImageNet --batch-size 128 --output ./output
```