You can run ImageNet training on a local machine and track experiments/runs with the MLflow tracking system.
We use conda and MLflow to handle experiments/runs and all Python dependencies.
Please install these tools:
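For example, a minimal setup could look like this (assuming a Linux x86_64 machine; the Miniconda installer and a pip-based MLflow install are one option among several):
# download and run the Miniconda installer (pick the one matching your platform)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# install MLflow into the active environment
pip install mlflow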
We also need to install NVIDIA/apex and libraries for OpenCV. APEX is installed automatically on the first run.
Alternatively, everything can be installed manually with the following commands.
Important: please check the content of experiments/setup_opencv.sh before running it.
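For example, print the script to the terminal to review it first:
cat experiments/setup_opencv.sh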
sh experiments/setup_apex.sh
sh experiments/setup_opencv.sh
Since 10/2019, an account registration is required to download the dataset. To request access, use the following form: http://www.image-net.org/download.php
To configure the path to an already downloaded ImageNet dataset, set the DATASET_PATH environment variable:
export DATASET_PATH=/path/to/imagenet
# export DATASET_PATH=$PWD/input/imagenet
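As a quick sanity check, you can verify the path exists and contains the usual train/val splits (an assumption about the expected layout; verify it against the data loading code in configs):
# check the dataset root (assumption: it contains train/ and val/ subfolders)
ls "$DATASET_PATH"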
Set up the MLflow output path as local storage (remote storage is not supported):
export MLFLOW_TRACKING_URI=/path/to/output/mlruns
# e.g export MLFLOW_TRACKING_URI=$PWD/output/mlruns
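If the output folder does not exist yet, you can create it up front (MLflow's local file store usually creates missing folders itself, so this is only a precaution):
mkdir -p "$MLFLOW_TRACKING_URI"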
Create the "Trainings" experiment (this is needed only once):
mlflow experiments create -n Trainings
or check existing experiments:
mlflow experiments list
Please make sure to adapt the training data loader batch size to your GPU type; a way to locate the setting is sketched after the run command below. By default, the batch size is 64.
export MLFLOW_TRACKING_URI=/path/to/output/mlruns
# e.g export MLFLOW_TRACKING_URI=$PWD/output/mlruns
mlflow run experiments/mlflow --experiment-name=Trainings -P config_path=configs/train/baseline_r50.py -P num_gpus=1
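As noted above, you may need to change the batch size. A quick way to locate the setting, assuming the config file defines a parameter literally named batch_size (an assumption; the real name may differ):
# find where the batch size is defined in the training config
grep -n "batch_size" configs/train/baseline_r50.py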
For optimal device usage, please make sure to adapt the training data loader batch size to your infrastructure. By default, the batch size is 64 per process.
export MLFLOW_TRACKING_URI=/path/to/output/mlruns
# e.g export MLFLOW_TRACKING_URI=$PWD/output/mlruns
mlflow run experiments/mlflow --experiment-name=Trainings -P config_path=configs/train/baseline_r50.py -P num_gpus=2
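Before picking num_gpus, you can check which GPUs are visible on the machine:
# list available NVIDIA GPUs and their current memory usage
nvidia-smi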
To visualize experiments and runs, you can start the MLflow dashboard:
mlflow server --backend-store-uri /path/to/output/mlruns --default-artifact-root /path/to/output/mlruns -p 6026 -h 0.0.0.0
# e.g mlflow server --backend-store-uri $PWD/output/mlruns --default-artifact-root $PWD/output/mlruns -p 6026 -h 0.0.0.0
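Once the server is up, the dashboard is reachable in a browser on port 6026; on a Linux desktop session, for example:
# open the MLflow UI (assumption: the server runs on the local machine)
xdg-open http://localhost:6026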
To visualize experiments and runs, you can also start TensorBoard:
tensorboard --logdir /path/to/output/mlruns/1
# e.g tensorboard --logdir $PWD/output/mlruns/1
where /1 points to the "Trainings" experiment.
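The numeric folder names under the tracking URI are MLflow experiment IDs; you can list them to find the one matching your experiment:
# each subfolder of the local file store corresponds to one experiment ID
ls "$MLFLOW_TRACKING_URI"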
File tree description:
- code
- configs
- experiments/mlflow : MLflow-related files
  - conda.yaml: defines all Python dependencies necessary for our experiments
  - MLproject: defines the types of experiments we would like to perform via "entry points" (a hypothetical sketch follows the list):
    - main: starts the single-node multi-GPU training script
- notebooks
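Since MLproject drives what mlflow run executes, here is a hedged sketch of its general shape; the parameter names and command in the comments are illustrative assumptions, not the actual file content:
# MLproject files generally look like this (hypothetical sketch):
#
#   name: imagenet-training
#   conda_env: conda.yaml
#   entry_points:
#     main:
#       parameters:
#         config_path: string
#         num_gpus: {default: 1}
#       command: "python main.py {config_path} --num_gpus {num_gpus}"
#
# print the actual definition:
cat experiments/mlflow/MLproject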
When we execute
mlflow run experiments/mlflow --experiment-name=Trainings -P config_path=configs/train/baseline_r50.py -P num_gpus=2
MLflow executes the main entry point from MLproject and runs the command defined there with the provided parameters.