Isotropic architectures have recently gained attention in computer vision for their ability to preserve spatial information throughout the network. In this work, we experiment with training ConvMixer, an isotropic convolutional neural network architecture, on the CIFAR-10 dataset. We propose a new configuration, ConvMixer-XL, consisting of 66 layers and just under 5 million parameters.
| Name | Activation | Depth | Inter-Block Skip | Augmentations | #Params (M) | Top-1 Acc |
|---|---|---|---|---|---|---|
| CM-Vanilla-NoAug | GELU | 8 | No | No | 0.59 | 0.8854 |
| CM-Vanilla | GELU | 8 | No | Yes | 0.59 | 0.9378 |
| CM-Vanilla-ReLU | ReLU | 8 | No | Yes | 0.59 | 0.9384 |
| CM-Vanilla-SiLU | SiLU | 8 | No | Yes | 0.59 | 0.9372 |
| CM-XL-NoSkip | GELU | 66 | No | Yes | 4.9 | 0.4868 |
| CM-XL-Skip | GELU | 66 | Yes | Yes | 4.9 | 0.9422 |
| CM-XL | SiLU | 66 | Yes | Yes | 4.9 | 0.9452 |
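The parameter counts in the table can be sanity-checked analytically. The sketch below is a hypothetical calculation assuming the hyperparameters of the ConvMixer CIFAR-10 baseline (hidden dimension 256, kernel size 5, patch size 2), which are not stated in the table itself:

```python
def convmixer_params(depth, hdim=256, kernel=5, patch=2, in_ch=3, classes=10):
    """Approximate ConvMixer parameter count.

    Assumes the ConvMixer CIFAR-10 baseline hyperparameters
    (hdim=256, kernel=5, patch=2); these are assumptions, not
    values taken from the table above.
    """
    # Patch-embedding conv (in_ch -> hdim, patch x patch) + bias + BatchNorm
    stem = in_ch * hdim * patch * patch + hdim + 2 * hdim
    # Per block: depthwise conv (+ bias + BN), then pointwise conv (+ bias + BN)
    depthwise = hdim * kernel * kernel + hdim + 2 * hdim
    pointwise = hdim * hdim + hdim + 2 * hdim
    blocks = depth * (depthwise + pointwise)
    # Linear classifier head
    head = hdim * classes + classes
    return stem + blocks + head

print(round(convmixer_params(8) / 1e6, 2))   # depth-8 "vanilla" models
print(round(convmixer_params(66) / 1e6, 1))  # depth-66 ConvMixer-XL models
```

Under these assumptions the formula recovers roughly 0.59 M parameters at depth 8 and 4.9 M at depth 66, matching the table.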
We have uploaded all our experiment logs and the generated model weights here.
To reproduce the best-performing configuration of ConvMixer-XL, run:

```shell
python3 train.py --lr-max=0.005 \
    --depth=66 \
    --model='CM-XL' \
    --activation='SiLU' \
    --name='final_CMXL_SiLU' \
    --save_dir='output/agg' \
    --batch-size=128
```
This project builds on ConvMixer CIFAR-10 and the original ConvMixer.
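The inter-block skip column in the table above refers to a residual connection added around each ConvMixer block, which proves critical at depth 66 (CM-XL-NoSkip collapses to 48.7% accuracy without it). A minimal sketch of the mechanism, using a hypothetical `apply_blocks` helper rather than the repository's actual code:

```python
def apply_blocks(blocks, x, inter_block_skip=True):
    """Run a stack of mixer blocks over input x.

    With inter_block_skip=True, each block's output is added back to
    its input (a residual connection between blocks); with False, the
    blocks are simply composed. This is an illustrative helper, not
    the project's implementation.
    """
    for block in blocks:
        y = block(x)
        x = x + y if inter_block_skip else y
    return x

# Toy demonstration with scalar "blocks" that halve their input:
blocks = [lambda v: v * 0.5] * 3
print(apply_blocks(blocks, 1.0))         # residual path amplifies the signal
print(apply_blocks(blocks, 1.0, False))  # plain composition shrinks it
```

The intuition matches the table: without the skip, signal (and gradient) must pass through all 66 blocks in sequence, which makes very deep stacks hard to train.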