A light-weight script for maintaining a LOT of machine learning experiments.

Maintaining many machine learning experiments takes a lot of manual effort. This lightweight tool helps you concurrently run a LOT of experiments with simple commands and configurations, and aggregate custom metrics for each experiment with a single line of code.

Install

$ pip install mlrunner

Usage

Download and edit params.yaml, then simply

$ run

When all experiments finish, start a Jupyter notebook and analyze the results with examine.Examiner.

See examples for typical use cases. See comments in params.yaml for available configurations. Use run -h for available command-line args.

Example

Suppose we develop a new normalization layer "newnorm" and want to compare it with batchnorm. Both have a hyperparameter --moment. We also want to see how early stopping affects our model, controlled by a boolean flag --early-stop. Each run involves training, checkpoint averaging, and testing with the averaged checkpoint. Then params.yaml can be:

---
# Commands run for each experiment. Params to be filled are specified as `{param}` or `[param]`.
# `{_output}` is a reserved param for the automatically generated output directory.
template:
  train: >
    python train.py data-bin/{data} --save-dir {_output} --norm {norm} [moment] [early-stop]

  avg: >
    python checkpoint_avg.py --inputs {_output} --num 5 --output {_output}/avg.pt

  test: >
    python generate.py data-bin/{data} --beam 5 --path {_output}/avg.pt

# default values for all params
default:
  data: iwslt14
  norm: batch
  moment: 0.1
  early-stop: False

# GPU indices to be filled in `CUDA_VISIBLE_DEVICES={}`; each index corresponds to a worker.
resource: [ 0, 1, 2, 3 ]

---
# compare the effect of different normalization layers and moments
norm: [ new, batch ]
moment: [ 0.1, 0.05 ]

---
# examine the effect of early stopping
norm: [ batch ]
early-stop: [ True, False ]

Since norm=batch,moment=0.1 (from the first sweep) and norm=batch,early-stop=False (from the second) resolve to the same full param set once defaults are filled in, the latter is skipped. As we specify 4 workers, each with a single GPU, 4 tasks run concurrently:

$ run
Orphan params: set()
Tasks: 5, Commands: 15
START   gpu: 0, train: 1/ 4, output/Norm_new-Moment_0.1
START   gpu: 1, train: 2/ 4, output/Norm_new-Moment_0.05
START   gpu: 2, train: 3/ 4, output/Norm_batch-Moment_0.1
START   gpu: 3, train: 4/ 4, output/Norm_batch-Moment_0.05
START   gpu: 0, avg  : 1/ 4, output/Norm_new-Moment_0.1
FAIL    gpu: 0, avg  : 1/ 4, output/Norm_new-Moment_0.1
...
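
To see why there are 5 tasks rather than 6, the expansion can be pictured with a rough sketch (not mlrunner's actual code): each sweep section is a cartesian product over its listed values, merged on top of the defaults, and exact-duplicate param sets are dropped.

from itertools import product

default = {"data": "iwslt14", "norm": "batch", "moment": 0.1, "early-stop": False}
sweeps = [
    {"norm": ["new", "batch"], "moment": [0.1, 0.05]},   # first sweep: 2 x 2 = 4 combos
    {"norm": ["batch"], "early-stop": [True, False]},    # second sweep: 1 x 2 = 2 combos
]

seen, tasks = set(), []
for sweep in sweeps:
    keys = list(sweep)
    for values in product(*(sweep[k] for k in keys)):
        params = {**default, **dict(zip(keys, values))}  # sweep values override defaults
        key = tuple(sorted(params.items()))
        if key not in seen:        # norm=batch with default moment/early-stop appears in
            seen.add(key)          # both sweeps; it is kept only once
            tasks.append(params)

print(len(tasks))  # 5 unique tasks; 3 template commands each -> 15 commands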

The command-line logs are redirected into each experiment's output directory (referred to as {_output}), which is named after its params:

$ ls output/Norm_batch-Moment_0.1
checkpoint51.pt
checkpoint52.pt
averaged_model.pt
log.train.20220316.030151
log.avg.20220316.030151
log.test.20220316.030151
param
stat

We provide Examiner as a container that iteratively applies metric parsers to all experiments and aggregates the results. In this example we simply parse the test log for the test BLEU:

from mlrunner.examine import Examiner, latest_log


# define a metric parser for each directory (experiment)
def add_bleu(output_dir, experiment, caches):
    # Each parser follows the same signature
    # It can read/write to a global cache dict `caches`, 
    # and read/write each experiment: 
    # collections.namedtuple("Experiment", ["cache", "metric", "param"])
    latest_test_log = latest_log("test", output_dir)
    bleu = parse_bleu(latest_test_log)  # a user-defined log parser
    experiment.metric["bleu"] = bleu


examiner = Examiner()  # container for parsed results
# register parser for each directory (experiment)
examiner.add(add_bleu)
# run all parsers for directories matched by regex 
examiner.exam(output="output", regex=".*")
# print a TSV table with the (differing) params and metrics of each experiment,
# and return a pandas DataFrame
df = examiner.table(print_tsv=True)

which results in

norm	moment	early-stop	bleu
new	0.1	FALSE	11.0
new	0.05	FALSE	12.3
batch	0.1	FALSE	14.4
batch	0.05	FALSE	16.5
batch	0.1	TRUE	15.0

A pandas DataFrame object is returned for further analysis.
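
The parse_bleu helper above is left to the user. A minimal sketch, assuming the test log reports BLEU on a line such as "BLEU4 = 14.4, ..." (adjust the regex to whatever your generate.py prints):

import re

def parse_bleu(log_path):
    # Return the last BLEU score reported in the log, e.g. from "... BLEU4 = 14.4, ...".
    bleu = None
    with open(log_path) as f:
        for line in f:
            match = re.search(r"BLEU\S*\s*=\s*([\d.]+)", line)
            if match:
                bleu = float(match.group(1))
    return bleu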

Under the hood

A sweep of param combinations results in an ordered task pool, where each param combination is a task. Each worker is bound to a resource and concurrently pulls tasks from the pool in order, edits each command in template, and executes the edited commands sequentially. The edits (sketched below) are:

  1. Substituting the param placeholders ({param} and [param]) with the corresponding param values.
  2. Prepending the shell environment variable CUDA_VISIBLE_DEVICES={resource}.
  3. Appending the shell redirect > output_dir/log.{command}.{time} 2>&1.
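
A simplified sketch of this command editing, not mlrunner's actual implementation (the [param] expansion rules are documented in params.yaml and are only gestured at here):

import time

def render(template_cmd, params, gpu, output_dir, name):
    # 1. substitute {param} placeholders; {_output} maps to the experiment directory
    #    ([param] placeholders are expanded by similar string rules, omitted here)
    cmd = template_cmd
    for key, value in {**params, "_output": output_dir}.items():
        cmd = cmd.replace("{%s}" % key, str(value))
    # 2. prepend the resource as CUDA_VISIBLE_DEVICES
    cmd = "CUDA_VISIBLE_DEVICES=%s %s" % (gpu, cmd)
    # 3. append a shell redirect so stdout/stderr land in the experiment directory
    stamp = time.strftime("%Y%m%d.%H%M%S")
    return "%s > %s/log.%s.%s 2>&1" % (cmd, output_dir, name, stamp)

# e.g. render(template["test"], params, gpu=0,
#             output_dir="output/Norm_batch-Moment_0.1", name="test")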
