# MAE / ConvMAE
This is a PaddlePaddle/GPU re-implementation of the papers [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) and [ConvMAE: Masked Convolution Meets Masked Autoencoders](https://arxiv.org/abs/2205.03892).

PaddlePaddle 2.4 is required for some of the features used here. For installation instructions, refer to installation.md.
Prepare ImageNet-1K (ILSVRC2012) in the following directory layout:

```
dataset/
└── ILSVRC2012
    ├── train
    └── val
```
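Each split directory contains one subfolder per class, in the standard ImageFolder layout. Before launching a multi-node job, a quick sanity check of the layout can save a failed run; the snippet below is a minimal sketch (the path is the one used in the commands that follow):

```python
# Verify the expected ImageNet layout before launching training.
import os

IMAGENET_DIR = "./dataset/ILSVRC2012"

for split in ("train", "val"):
    path = os.path.join(IMAGENET_DIR, split)
    assert os.path.isdir(path), f"missing split directory: {path}"
    # Count class subfolders; ILSVRC2012 should have 1000 per split.
    n_classes = sum(
        os.path.isdir(os.path.join(path, d)) for d in os.listdir(path)
    )
    print(f"{split}: {n_classes} class folders")
```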
To pretrain ViT-Base with multi-node distributed training, run the following on 4 nodes with 8 GPUs each:

```shell
unset PADDLE_TRAINER_ENDPOINTS
export PADDLE_NNODES=4
export PADDLE_MASTER="10.67.228.16:12538"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PADDLE_JOB_ID=MAE

# If you use a single node:
#   batch_size 64,  ACCUM_ITER=8, effective batch size: 4096
#   batch_size 256, ACCUM_ITER=2, effective batch size: 4096

# 4 nodes for pretrain
ACCUM_ITER=1
IMAGENET_DIR=./dataset/ILSVRC2012/
python -m paddle.distributed.launch \
    --nnodes=$PADDLE_NNODES \
    --master=$PADDLE_MASTER \
    --devices=$CUDA_VISIBLE_DEVICES \
    main_pretrain.py \
    --accum_iter $ACCUM_ITER \
    --batch_size 128 \
    --model mae_vit_base_patch16 \
    --norm_pix_loss \
    --mask_ratio 0.75 \
    --epochs 1600 \
    --warmup_epochs 40 \
    --blr 1.5e-4 --weight_decay 0.05 \
    --data_path ${IMAGENET_DIR}
```
- Here the effective batch size is 128 (`batch_size` per GPU) * 4 (nodes) * 8 (GPUs per node) = 4096. If memory or the number of GPUs is limited, use `--accum_iter` to maintain the effective batch size, which is `batch_size` (per GPU) * nodes * 8 (GPUs per node) * `accum_iter`; see the worked example below.
- `blr` is the base learning rate. The actual `lr` is computed by the linear scaling rule: `lr` = `blr` * effective batch size / 256.
- Here we use `--norm_pix_loss` as the target for better representation learning. To train a baseline model (e.g., for visualization), use pixel-based reconstruction and turn off `--norm_pix_loss`.
- Training time is ~56h on 32 A100 (40GB) GPUs (1600 epochs).
- To train ViT-Large or ViT-Huge, set `--model mae_vit_large_patch16` or `--model mae_vit_huge_patch14`.
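As a back-of-the-envelope check of the two rules above, here is the arithmetic for the 4-node pretraining command (all numbers come from the script; nothing repo-specific is assumed):

```python
# Effective batch size and actual lr for the 4-node pretraining command.
batch_size = 128      # per GPU
nodes = 4
gpus_per_node = 8
accum_iter = 1        # raise this when fewer GPUs are available
blr = 1.5e-4          # base learning rate

effective_batch_size = batch_size * nodes * gpus_per_node * accum_iter
lr = blr * effective_batch_size / 256  # linear scaling rule

print(effective_batch_size)  # 4096
print(lr)                    # 0.0024
```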
| | ViT-Base | ViT-Large | ViT-Huge |
|---|---|---|---|
| epochs | 1600 | 1600 | 1600 |
| pre-trained checkpoint | download | - | - |
To finetune ViT-Base with multi-node distributed training, run the following on 4 nodes with 8 GPUs each:

```shell
unset PADDLE_TRAINER_ENDPOINTS
export PADDLE_NNODES=4
export PADDLE_MASTER="10.67.123.16:12538"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PADDLE_JOB_ID=MAE

# batch_size 32, ACCUM_ITER=4, effective batch size: 1024
# batch_size 128, ACCUM_ITER=1, effective batch size: 1024

# 4 nodes finetune setting
ACCUM_ITER=1
PRETRAIN_CHKPT='output_dir/checkpoint-1599.pd'
IMAGENET_DIR=./dataset/ILSVRC2012/
python -m paddle.distributed.launch \
    --nnodes=$PADDLE_NNODES \
    --master=$PADDLE_MASTER \
    --devices=$CUDA_VISIBLE_DEVICES \
    main_finetune.py \
    --accum_iter $ACCUM_ITER \
    --batch_size 128 \
    --model vit_base_patch16 \
    --finetune ${PRETRAIN_CHKPT} \
    --epochs 100 \
    --blr 5e-4 --layer_decay 0.65 \
    --weight_decay 0.05 --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
    --dist_eval --data_path ${IMAGENET_DIR}
```
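The recipe above applies layer-wise learning-rate decay via `--layer_decay 0.65`: parameter groups closer to the input get smaller learning rates. A minimal sketch of the implied multipliers, assuming a 12-block ViT-Base (the helper name is illustrative, not this repo's API):

```python
# Layer-wise lr decay: group 0 is the patch embedding, groups
# 1..num_layers are the transformer blocks, and the classifier head
# (the last group) keeps the full learning rate.
def layerwise_lr_scales(num_layers=12, layer_decay=0.65):
    return [
        layer_decay ** (num_layers + 1 - i)
        for i in range(num_layers + 2)
    ]

scales = layerwise_lr_scales()
print(scales[0], scales[-1])  # ~0.0037 for the embedding, 1.0 for the head
```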
Finetuning results (ImageNet-1K top-1 accuracy, %):

| | ViT-Base | ViT-Large | ViT-Huge |
|---|---|---|---|
| official (PyTorch/GPU) | 83.664 | 85.952 | 86.928 |
| official rerun (PyTorch/GPU) | 83.36 | - | - |
| this repo (Paddle/GPU) | 83.34 | - | - |
| checkpoint | download | - | - |
To run linear probing on ViT-Base with multi-node distributed training, run the following on 4 nodes with 8 GPUs each:

```shell
unset PADDLE_TRAINER_ENDPOINTS
export PADDLE_NNODES=4
export PADDLE_MASTER="10.67.123.16:12538"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PADDLE_JOB_ID=MAE

# batch_size 512, ACCUM_ITER=4, effective batch size: 16384

# 4 nodes linprobe setting
ACCUM_ITER=1
PRETRAIN_CHKPT='output_dir/checkpoint-1599.pd'
IMAGENET_DIR=./dataset/ILSVRC2012/
python -m paddle.distributed.launch \
    --nnodes=$PADDLE_NNODES \
    --master=$PADDLE_MASTER \
    --devices=$CUDA_VISIBLE_DEVICES \
    main_linprobe.py \
    --accum_iter $ACCUM_ITER \
    --batch_size 512 \
    --model vit_base_patch16 \
    --cls_token \
    --finetune ${PRETRAIN_CHKPT} \
    --epochs 90 \
    --blr 0.1 \
    --weight_decay 0.0 \
    --dist_eval --data_path ${IMAGENET_DIR}
```
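Linear probing freezes the pre-trained encoder and trains only a classifier on the frozen features; the MAE recipe additionally inserts a BatchNorm layer without learnable affine parameters before the linear head. A minimal sketch of that setup (names are illustrative, not this repo's API):

```python
import paddle.nn as nn

def build_linear_probe_head(backbone, feat_dim=768, num_classes=1000):
    # Freeze every backbone parameter; only the head below is trained.
    for p in backbone.parameters():
        p.stop_gradient = True
    return nn.Sequential(
        # BatchNorm with no learnable scale/shift, as in the MAE recipe.
        nn.BatchNorm1D(feat_dim, weight_attr=False, bias_attr=False),
        nn.Linear(feat_dim, num_classes),
    )
```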
Linear probing results (ImageNet-1K top-1 accuracy, %):

| | ViT-Base | ViT-Large | ViT-Huge |
|---|---|---|---|
| official (PyTorch/GPU) | 67.8 | 76.0 | 77.2 |
| official rerun (PyTorch/GPU) | 68.05 | - | - |
| this repo (Paddle/GPU) | 68.08 | - | - |
| checkpoint | download | - | - |
To pretrain ConvMAE-Base with multi-node distributed training, run the following on 3 nodes with 8 GPUs each:
```shell
unset PADDLE_TRAINER_ENDPOINTS
export PADDLE_NNODES=3
export PADDLE_MASTER="10.67.228.16:12538"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PADDLE_JOB_ID=ConvMAE

# 3 nodes for pretrain
ACCUM_ITER=1
IMAGENET_DIR=./dataset/ILSVRC2012/
python -m paddle.distributed.launch \
    --nnodes=$PADDLE_NNODES \
    --master=$PADDLE_MASTER \
    --devices=$CUDA_VISIBLE_DEVICES \
    main_pretrain.py \
    --accum_iter $ACCUM_ITER \
    --batch_size 128 \
    --model convmae_convvit_base_patch16 \
    --norm_pix_loss \
    --mask_ratio 0.75 \
    --epochs 1600 \
    --warmup_epochs 40 \
    --blr 1.5e-4 --weight_decay 0.05 \
    --data_path ${IMAGENET_DIR}
```
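ConvMAE's central change relative to MAE is masked convolution: the random mask is generated at the final stage's resolution, upsampled to each convolutional stage, and applied around every conv block so that masked regions cannot leak into visible features. A minimal sketch of that masking step (the function is illustrative, not the repo's API):

```python
import paddle.nn.functional as F

def masked_conv_stage(x, mask, conv_block):
    # x: [N, C, H, W] feature map; mask: [N, 1, h, w], binary, 1 = visible,
    # defined at the final-stage resolution.
    m = F.interpolate(mask, size=x.shape[2:], mode="nearest")
    # Zero masked regions both before and after the convolution.
    return conv_block(x * m) * m
```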
To finetune ConvViT-Base with multi-node distributed training, run the following on 4 nodes with 8 GPUs each:

```shell
unset PADDLE_TRAINER_ENDPOINTS
export PADDLE_NNODES=4
export PADDLE_MASTER="10.67.228.16:12538"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PADDLE_JOB_ID=ConvMAE

# 4 nodes finetune setting
ACCUM_ITER=1
PRETRAIN_CHKPT='output_dir/checkpoint-1599.pd'
IMAGENET_DIR=./dataset/ILSVRC2012/
python -m paddle.distributed.launch \
    --nnodes=$PADDLE_NNODES \
    --master=$PADDLE_MASTER \
    --devices=$CUDA_VISIBLE_DEVICES \
    main_finetune.py \
    --accum_iter $ACCUM_ITER \
    --batch_size 32 \
    --model convvit_base_patch16 \
    --finetune ${PRETRAIN_CHKPT} \
    --epochs 100 \
    --blr 5e-4 --layer_decay 0.65 \
    --weight_decay 0.05 --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
    --dist_eval --data_path ${IMAGENET_DIR}
```
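The finetuning recipe regularizes with mixup (`--mixup 0.8`) and cutmix (`--cutmix 1.0`), among others. A minimal sketch of mixup alone, assuming one-hot targets (the repo uses its own augmentation pipeline):

```python
import numpy as np
import paddle

def mixup(images, onehot_targets, alpha=0.8):
    # Blend random pairs of samples and labels with a Beta(alpha, alpha)
    # coefficient; alpha matches the --mixup flag above.
    lam = float(np.random.beta(alpha, alpha))
    perm = paddle.randperm(images.shape[0])
    mixed_images = lam * images + (1 - lam) * paddle.index_select(images, perm)
    mixed_targets = lam * onehot_targets + (1 - lam) * paddle.index_select(onehot_targets, perm)
    return mixed_images, mixed_targets
```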
To run linear probing on ConvViT-Base with multi-node distributed training, run the following on 4 nodes with 8 GPUs each:

```shell
unset PADDLE_TRAINER_ENDPOINTS
export PADDLE_NNODES=4
export PADDLE_MASTER="10.67.228.16:12538"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PADDLE_JOB_ID=ConvMAE

# 4 nodes linprobe setting
ACCUM_ITER=1
PRETRAIN_CHKPT='./output_dir/checkpoint-1599.pd'
IMAGENET_DIR=./dataset/ILSVRC2012/
python -m paddle.distributed.launch \
    --nnodes=$PADDLE_NNODES \
    --master=$PADDLE_MASTER \
    --devices=$CUDA_VISIBLE_DEVICES \
    main_linprobe.py \
    --accum_iter $ACCUM_ITER \
    --batch_size 128 \
    --model convvit_base_patch16 \
    --global_pool \
    --finetune ${PRETRAIN_CHKPT} \
    --epochs 90 \
    --blr 0.1 \
    --weight_decay 0.0 \
    --dist_eval --data_path ${IMAGENET_DIR}
```
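Note the feature choice differs between the two linear-probing recipes: MAE uses `--cls_token` while ConvMAE uses `--global_pool`. A minimal sketch of the distinction, assuming a token sequence with the class token first (illustrative, not the repo's API):

```python
def extract_feature(tokens, global_pool=True):
    # tokens: [N, 1 + L, D] paddle.Tensor, class token first.
    if global_pool:
        return tokens[:, 1:, :].mean(axis=1)  # average over patch tokens
    return tokens[:, 0, :]                    # class-token embedding
```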
| Model | Phase | Dataset | Epochs | GPUs | Img/sec | Top1 acc@1 (%) | Official | Checkpoint | Log |
|---|---|---|---|---|---|---|---|---|---|
| ConvMAE-Base | pretrain | ImageNet2012 | 1600 | A100*N3C24 | 4984 | - | - | download | download |
| ConvViT-Base | finetune | ImageNet2012 | 100 | A100*N4C32 | 3927 | 85.0 | 85.0 | download | download |
| ConvViT-Base | linprobe | ImageNet2012 | 90 | A100*N4C32 | 16732 | 71.4 | 70.9 | download | download |
Citation:

```bibtex
@article{MaskedAutoencoders2021,
  title   = {Masked Autoencoders Are Scalable Vision Learners},
  author  = {Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Doll{\'a}r and Ross Girshick},
  journal = {arXiv preprint arXiv:2111.06377},
  year    = {2021}
}

@article{gao2022convmae,
  title   = {ConvMAE: Masked Convolution Meets Masked Autoencoders},
  author  = {Gao, Peng and Ma, Teli and Li, Hongsheng and Dai, Jifeng and Qiao, Yu},
  journal = {arXiv preprint arXiv:2205.03892},
  year    = {2022}
}
```