- In this work, I employ three widely known semantic segmentation architectures and three different backbones. In total, I report results for 8 different models, each of them trained from scratch. The 8 model variants are listed in the results table below.
- Each model takes a 3-channel input; the output is a single-channel mask.
- All images are normalized so that pixel values lie between 0 and 1.
- Each model is initialized with pretrained weights (a pretrained backbone) and trained for a maximum of 30 epochs with an early stopping criterion.
- Early stopping is based on a validation metric, which can be either the weighted binary cross-entropy loss or the IoU loss.
- A patience of 10 epochs is used for early stopping (see the sketch after this list).
- Here, I experiment with two different loss functions: weighted binary cross-entropy and IoU (Jaccard) loss. Weighted BCE requires tuning a weight hyperparameter for the minority class (see the loss sketch after this list).
- Reference papers: the original papers for the architectures used here, i.e. FCN, DeepLabV3, and Lite R-ASPP / MobileNetV3.
- This work is fully implemented in PyTorch. The following reference implementations were used/adapted in this work:
- PyTorch vision (torchvision) - used for DeepLabV3, ResNets, MobileNets
- Early stopping - used to implement early stopping
- detectron2 - used for config nodes
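For reference, here is a minimal sketch of the early-stopping logic described above (patience of 10 epochs on a validation metric). The class and the usage comments are illustrative, not the exact implementation used in this repository:

```python
class EarlyStopping:
    """Stop training when the validation metric has not improved for `patience` epochs."""

    def __init__(self, patience: int = 10, mode: str = "min"):
        self.patience = patience
        self.mode = mode          # "min" for losses, "max" for IoU-style scores
        self.best = None
        self.bad_epochs = 0

    def step(self, metric: float) -> bool:
        """Return True when training should stop."""
        improved = (
            self.best is None
            or (self.mode == "min" and metric < self.best)
            or (self.mode == "max" and metric > self.best)
        )
        if improved:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Illustrative usage inside a training loop (maximum of 30 epochs, as described above):
# stopper = EarlyStopping(patience=10, mode="min")
# for epoch in range(30):
#     val_loss = validate(model, val_loader)   # hypothetical validation helper
#     if stopper.step(val_loss):
#         break
```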
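And a minimal sketch of the two losses mentioned above, a weighted BCE and a soft IoU (Jaccard) loss. This illustrates the idea only; it is not necessarily the exact formulation used in this repository:

```python
import torch
import torch.nn.functional as F


def weighted_bce_loss(logits, target, pos_weight=10.0):
    """Weighted binary cross-entropy; `pos_weight` up-weights the minority (foreground) class."""
    return F.binary_cross_entropy_with_logits(
        logits, target, pos_weight=torch.tensor(pos_weight, device=logits.device)
    )


def soft_iou_loss(logits, target, eps=1e-6):
    """Soft IoU (Jaccard) loss: 1 - intersection/union, computed on predicted probabilities."""
    probs = torch.sigmoid(logits)
    intersection = (probs * target).sum(dim=(1, 2, 3))
    union = (probs + target - probs * target).sum(dim=(1, 2, 3))
    return (1.0 - (intersection + eps) / (union + eps)).mean()


# Illustrative shapes: single-channel logits and binary masks of size 256x256.
logits = torch.randn(2, 1, 256, 256)
target = torch.randint(0, 2, (2, 1, 256, 256)).float()
print(weighted_bce_loss(logits, target).item(), soft_iou_loss(logits, target).item())
```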
I report the results for models trained using the IoU loss, i.e. maximizing the overlap between predicted and ground-truth foreground masks. The results are shown below. It is evident that Lite R-ASPP + MobileNetV3-Large performs very well with a very small number of parameters. Such models offer a good balance between performance and computational complexity, and are hence desirable for most applications.
model | IoU | # params |
---|---|---|
DeeplabV3 + MobilenetV3 large | 0.46458 | 11,023,968 |
DeeplabV3 + MobilenetV3 small | 0.4557 | 6,124,577 |
DeeplabV3 + ResNet101 | 0.5724 | 60,985,922 |
DeeplabV3 + ResNet50 | 0.5789 | 41,993,794 |
FCN + ResNet50 | 0.5663 | 35,306,818 |
FCN + ResNet101 | 0.5596 | 54,298,946 |
Lite R-ASPP + MobilenetV3 large | 0.5301 | 3,218,138 |
Lite R-ASPP + MobilenetV3 small | 0.4768 | 1,074,874 |
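The parameter counts above can be sanity-checked directly against torchvision. The sketch below builds two of the architectures with a single output channel and counts parameters; the exact numbers depend on `num_classes` and on how the heads are configured in this repo, so they may not match the table exactly:

```python
import torchvision

# Build a couple of the architectures used here with a single output channel
# (binary segmentation) and count their parameters.
for builder in (
    torchvision.models.segmentation.lraspp_mobilenet_v3_large,
    torchvision.models.segmentation.deeplabv3_resnet50,
):
    model = builder(num_classes=1)
    n_params = sum(p.numel() for p in model.parameters())
    print(builder.__name__, f"{n_params:,}")
```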
conda create -n mila-challenge python=3.8
conda activate mila-challenge
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
conda install -c conda-forge tensorboard matplotlib fvcore
conda install -c anaconda scikit-learn
pip install iopath gdown
git clone https://github.com/dhaivat1729/mila-AI-challenge.git
cd mila-AI-challenge
gdown --id 1sD-2kBiAw94rwTCFi2xVP6YciqLvUm0D
unzip mila-segmentation-logs-20210711T195658Z-001.zip && rm mila-segmentation-logs-20210711T195658Z-001.zip
mila-AI-challenge/
mila-segmentation-logs/
deeplabv3_mobilenet_v3_large_v2_no_L2_val_loss_metric/
deeplabv3_mobilenet_v3_large_v3_jaccard_training/
.
.
.
.
lraspp_mobilenet_v3_small_v3_jaccard_training/
python train_net.py -dataset_path '/path/to/segmentation_project/' -model_name <model name> -model_ver v1
`<model name>` can be one of `fcn_resnet101`, `fcn_resnet50`, `deeplabv3_resnet50`, `deeplabv3_resnet101`, `lraspp_mobilenet_v3_large`, `deeplabv3_mobilenet_v3_large`, `deeplabv3_mobilenet_v3_small`, or `lraspp_mobilenet_v3_small`. `model_ver` is used to version the model in case multiple models of the same architecture need to be trained.
segmentation_project/
train/
img/
<image 1>.jpg
<image 2>.jpg
.
.
<image n>.jpg
mask/
<image 1>.BMP
<image 2>.BMP
.
.
<image n>.BMP
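For illustration, here is a minimal sketch of how the `img`/`mask` pairing in this layout could be loaded. The class name and transforms are hypothetical and not the dataset implementation used by `train_net.py`:

```python
import os

from PIL import Image
from torch.utils.data import Dataset
from torchvision.transforms.functional import to_tensor


class PairedSegmentationDataset(Dataset):
    """Pairs train/img/<name>.jpg with train/mask/<name>.BMP (hypothetical loader)."""

    def __init__(self, root):
        self.img_dir = os.path.join(root, "img")
        self.mask_dir = os.path.join(root, "mask")
        self.names = sorted(os.path.splitext(f)[0] for f in os.listdir(self.img_dir))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        image = Image.open(os.path.join(self.img_dir, name + ".jpg")).convert("RGB")
        mask = Image.open(os.path.join(self.mask_dir, name + ".BMP")).convert("L")
        # to_tensor scales both to [0, 1], matching the normalization described above.
        return to_tensor(image), to_tensor(mask)


# Example: dataset = PairedSegmentationDataset("/path/to/segmentation_project/train")
```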
python train_net.py -dataset_path '/path/to/segmentation_project/' -model_name lraspp_mobilenet_v3_large -model_ver v1
There are options to set the batch size, loss function, loss weights, etc. in src/config/default.py; feel free to change them there or override them in the setup() function in utils.py (see the sketch below).
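A minimal sketch of what such an override could look like in `setup()`, assuming the config object is a detectron2/fvcore-style `CfgNode`. The key names below (`TRAIN.BATCH_SIZE`, `TRAIN.LOSS`, ...) are purely illustrative; check `src/config/default.py` for the actual keys:

```python
from fvcore.common.config import CfgNode as CN


def setup():
    # Hypothetical keys shown for illustration only; the real config lives in
    # src/config/default.py and may use different names.
    cfg = CN()
    cfg.TRAIN = CN()
    cfg.TRAIN.BATCH_SIZE = 8
    cfg.TRAIN.LOSS = "jaccard"        # or "weighted_bce"
    cfg.TRAIN.BCE_POS_WEIGHT = 10.0   # minority-class weight for weighted BCE

    # Overrides can also be applied from a flat list of key/value pairs.
    cfg.merge_from_list(["TRAIN.BATCH_SIZE", 16])
    cfg.freeze()
    return cfg
```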
python infer.py /path/to/test/data/ /path/to/output/data/
mila-AI-challenge/
infer.py
test_data/
img/
<image 1>.jpg
<image 2>.jpg
.
.
<image n>.jpg
python infer.py test_data/img test_data/bmp_results
mila-AI-challenge/
infer.py
test_data/
img/
<image 1>.jpg
<image 2>.jpg
.
.
<image n>.jpg
bmp_results/
<image 1>.BMP
<image 2>.BMP
.
.
<image n>.BMP
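Once infer.py has produced the BMP masks, something like the following can be used to eyeball a prediction against its input image. This is a small, hypothetical helper using PIL and matplotlib (both installed above), not part of the repository:

```python
import matplotlib.pyplot as plt
from PIL import Image


def show_prediction(image_path, mask_path):
    """Display an input JPG next to its predicted BMP mask."""
    image = Image.open(image_path).convert("RGB")
    mask = Image.open(mask_path).convert("L")
    fig, axes = plt.subplots(1, 2, figsize=(10, 5))
    axes[0].imshow(image)
    axes[0].set_title("input")
    axes[0].axis("off")
    axes[1].imshow(mask, cmap="gray")
    axes[1].set_title("predicted mask")
    axes[1].axis("off")
    plt.show()


# Example (file names are placeholders):
# show_prediction("test_data/img/<image 1>.jpg", "test_data/bmp_results/<image 1>.BMP")
```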