Isotropic architectures have recently gained attention in computer vision for their ability to preserve spatial information throughout the network. In this work, we experiment with training ConvMixer, an isotropic convolutional neural network architecture, on the CIFAR-10 dataset. We propose a new configuration, ConvMixer-XL, consisting of 66 layers and just under 5 million parameters.
| Name | Activation | Depth | Inter-Block Skip | Augmentations | #Params (M) | Top-1 Acc |
|---|---|---|---|---|---|---|
| CM-Vanilla-NoAug | GELU | 8 | No | No | 0.59 | 0.8854 |
| CM-Vanilla | GELU | 8 | No | Yes | 0.59 | 0.9378 |
| CM-Vanilla-ReLU | ReLU | 8 | No | Yes | 0.59 | 0.9384 |
| CM-Vanilla-SiLU | SiLU | 8 | No | Yes | 0.59 | 0.9372 |
| CM-XL-NoSkip | GELU | 66 | No | Yes | 4.9 | 0.4868 |
| CM-XL-Skip | GELU | 66 | Yes | Yes | 4.9 | 0.9422 |
| CM-XL | SiLU | 66 | Yes | Yes | 4.9 | 0.9452 |
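The parameter counts in the table can be sanity-checked analytically. The sketch below is a hypothetical calculation assuming the hyperparameters of the ConvMixer CIFAR-10 baseline (hidden dimension 256, kernel size 5, patch size 2), which are not stated in the table itself:

```python
def convmixer_params(depth, hdim=256, kernel=5, patch=2, in_ch=3, classes=10):
    """Approximate ConvMixer parameter count.

    Assumes the ConvMixer CIFAR-10 baseline hyperparameters
    (hdim=256, kernel=5, patch=2); these are assumptions, not
    values taken from the table above.
    """
    # Patch-embedding conv (in_ch -> hdim, patch x patch) + bias + BatchNorm
    stem = in_ch * hdim * patch * patch + hdim + 2 * hdim
    # Per block: depthwise conv (+ bias + BN), then pointwise conv (+ bias + BN)
    depthwise = hdim * kernel * kernel + hdim + 2 * hdim
    pointwise = hdim * hdim + hdim + 2 * hdim
    blocks = depth * (depthwise + pointwise)
    # Linear classifier head
    head = hdim * classes + classes
    return stem + blocks + head

print(round(convmixer_params(8) / 1e6, 2))   # depth-8 "vanilla" models
print(round(convmixer_params(66) / 1e6, 1))  # depth-66 ConvMixer-XL models
```

Under these assumptions the formula recovers roughly 0.59 M parameters at depth 8 and 4.9 M at depth 66, matching the table.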
We have uploaded all our experiment logs and the generated model weights here.
To reproduce the best-performing configuration of ConvMixer-XL, run:

```shell
python3 train.py --lr-max=0.005 \
    --depth=66 \
    --model='CM-XL' \
    --activation='SiLU' \
    --name='final_CMXL_SiLU' \
    --save_dir='output/agg' \
    --batch-size=128
```
This project builds on ConvMixer CIFAR-10 and the original ConvMixer.
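The inter-block skip column in the table above refers to a residual connection added around each ConvMixer block, which proves critical at depth 66 (CM-XL-NoSkip collapses to 48.7% accuracy without it). A minimal sketch of the mechanism, using a hypothetical `apply_blocks` helper rather than the repository's actual code:

```python
def apply_blocks(blocks, x, inter_block_skip=True):
    """Run a stack of mixer blocks over input x.

    With inter_block_skip=True, each block's output is added back to
    its input (a residual connection between blocks); with False, the
    blocks are simply composed. This is an illustrative helper, not
    the project's implementation.
    """
    for block in blocks:
        y = block(x)
        x = x + y if inter_block_skip else y
    return x

# Toy demonstration with scalar "blocks" that halve their input:
blocks = [lambda v: v * 0.5] * 3
print(apply_blocks(blocks, 1.0))         # residual path amplifies the signal
print(apply_blocks(blocks, 1.0, False))  # plain composition shrinks it
```

The intuition matches the table: without the skip, signal (and gradient) must pass through all 66 blocks in sequence, which makes very deep stacks hard to train.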