Simple and minimalistic utility to manage many experiment runs and custom analysis of results
My job is to do research in Deep Learning, and I have dozens of different
experiments. Testing one hypothesis usually requires several runs over a
parameter grid. Plotting and visualizing results is often ad hoc, and
updating the code that produces the output is extra overhead. Instead, I decided
to collect all results in a Jupyter notebook and create plots of the form
interest ~ parameters. As I said, producing such a plot is a separate task
almost every time. Tools such as ModelDB provide simple
visualizations that can easily be aggregated for model
comparison. Testing a hypothesis is not about model comparison and thus
requires special treatment.
Visualizing results became a pain: you had to remember the mapping
parameters -> results, and separating results into different folders
made an even bigger mess. I had a really bad experience with visualizations. I
realized that all I needed was to iterate over a folder of results and apply
the same function to each of them.
pip install -U git+https://github.com/ferrine/exman.git#egg=exman
# or
pip install exman
A simple drop-in replacement for the standard argparse.ArgumentParser:
# file: main.py
import exman
# you should always use `exman.simpleroot(__file__)` unless you want another dir
parser = exman.ExParser(root=exman.simpleroot(__file__)) # `root = ./exman` relative to the main file
parser.add_argument(...)
You then add arguments exactly as you did before, without any changes.
Since version 0.0.3 you can use the following context manager: if the main()
function fails, the run is moved to exman/fails.
import exman
# you should always use `exman.simpleroot(__file__)` unless you want another dir
parser = exman.ExParser(root=exman.simpleroot(__file__)) # `root = ./exman` relative to the main file
parser.add_argument(...)
...
if __name__ == '__main__':
    args = parser.parse_args()
    with args.safe_experiment:
        # do your stuff
        main(args)
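To make the behaviour of the context manager concrete, here is a minimal sketch in plain Python of a "move failed runs to fails" guard. It is not exman's implementation: `safe_experiment` is a hypothetical function, and it assumes each run lives in its own directory that can simply be moved into the fails folder.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def safe_experiment(run_dir: Path, fails_dir: Path):
    """Hypothetical stand-in for exman's `args.safe_experiment`.

    Yields the run directory; if the body raises, the run directory is
    moved into `fails_dir` and the exception is re-raised unchanged.
    """
    try:
        yield run_dir
    except Exception:
        fails_dir.mkdir(parents=True, exist_ok=True)
        # keep the original run name inside fails/
        shutil.move(str(run_dir), str(fails_dir / run_dir.name))
        raise
```

Re-raising keeps the traceback visible in the terminal, while the failed run no longer clutters the runs folder.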
To avoid non-reproducible results, you can make sure all changes are committed. Exman takes care of this and logs the
commit hash and the diff, if any. To use these features, tell the parser where the repository is.
import os
import exman
# alternative ways to point the parser at the repository (use one)
parser = exman.ExParser(root=exman.simpleroot(__file__), git=True)
# less fragile solution, but works only locally
parser = exman.ExParser(root=exman.simpleroot(__file__), git="/abs/path/to/repo")
# an OK solution if you are sure of the relative path
parser = exman.ExParser(
    root=exman.simpleroot(__file__),
    git=os.path.join(os.path.dirname(__file__), "relative", "path", "goes", "here"),
    git_assert_clean=True,  # run an assertion check before each run; False by default
)
When running your experiment from the command line, you can skip the assertion if you want to:
python train.py --git-dirty --other-args
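The git features boil down to a few standard git commands. The sketch below is an assumption about how such checks can be done, not exman's code: `git_output`, `collect_git_info`, `is_clean` and `assert_clean` are hypothetical helpers, the `git_dirty` argument mirrors the `--git-dirty` option, and untracked files are treated as making the tree dirty.

```python
import subprocess


def git_output(repo: str, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout as text."""
    return subprocess.run(
        ["git", "-C", repo, *args],
        check=True, capture_output=True, text=True,
    ).stdout


def collect_git_info(repo: str) -> dict:
    """Commit hash plus the diff against HEAD, worth saving next to params.yaml."""
    return {
        "commit": git_output(repo, "rev-parse", "HEAD").strip(),
        "diff": git_output(repo, "diff", "HEAD"),
    }


def is_clean(porcelain: str) -> bool:
    """A working tree is clean when `git status --porcelain` prints nothing."""
    return porcelain.strip() == ""


def assert_clean(porcelain: str, git_dirty: bool = False) -> None:
    """Refuse to start unless the tree is clean or the user opted out (--git-dirty)."""
    if not git_dirty and not is_clean(porcelain):
        raise AssertionError("uncommitted changes found; commit them or pass --git-dirty")
```

In a real run the porcelain text would come from `git_output(repo, "status", "--porcelain")`; keeping the check as a pure function makes it easy to test without a repository.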
To avoid issues when reproducing experiments, consider using exman.optional(type) for optional arguments.
import exman
# you should always use `exman.simpleroot(__file__)` unless you want another dir
parser = exman.ExParser(root=exman.simpleroot(__file__)) # `root = ./exman` relative to the main file
parser.add_argument('--myarg', type=exman.optional(int))
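One plausible reason exman.optional(type) matters: a parameter whose value is None gets saved with the run, and when it is fed back through the parser the text "None" must not reach int(). The following sketch is hypothetical and assumes that "None" and "null" are the spellings to map back to None; exman's own converter may behave differently.

```python
import argparse


def optional(type_):
    """Hypothetical sketch of an optional(type) converter for argparse.

    "None" or "null" means "no value"; everything else is converted with `type_`.
    """
    def convert(value: str):
        if value in ("None", "null"):
            return None
        return type_(value)

    # argparse uses __name__ in its "invalid <type> value" error messages
    convert.__name__ = f"optional({type_.__name__})"
    return convert


parser = argparse.ArgumentParser()
parser.add_argument("--myarg", type=optional(int))
```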
With plain argparse you can't easily validate several arguments together, but in Exman it is easy, and you can provide an informative error message:
import exman
# you should always use `exman.simpleroot(__file__)` unless you want another dir
parser = exman.ExParser(root=exman.simpleroot(__file__)) # `root = ./exman` relative to the main file
parser.add_argument(...)
# here `p` stands for initial namespace parsed from arguments
parser.register_validator(lambda p: p.arg1 != p.arg2 or p.arg3 == p.arg4,
                          # next line will be autoformatted for you using .format
                          'You have provided wrong set of arguments: {arg1}, {arg2}, {arg3}, {arg4}')
Advanced validators can raise exman.ArgumentError with a more specific message than the one registered along with the validator function.
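The validation step can be sketched on top of plain argparse. `ValidatingParser` and the local `ArgumentError` below are hypothetical stand-ins for exman.ExParser and exman.ArgumentError: each validator receives the parsed namespace, a falsy result reports the message formatted with the argument values, and a raised ArgumentError replaces that message with its own.

```python
import argparse


class ArgumentError(Exception):
    """Hypothetical local stand-in; exman ships its own exman.ArgumentError."""


class ValidatingParser(argparse.ArgumentParser):
    """Sketch of cross-argument validation on top of plain argparse."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._validators = []

    def register_validator(self, predicate, message):
        self._validators.append((predicate, message))

    def parse_args(self, args=None, namespace=None):
        params = super().parse_args(args, namespace)
        for predicate, message in self._validators:
            try:
                valid = predicate(params)
            except ArgumentError as err:
                # an advanced validator supplies its own, more specific message
                self.error(str(err))  # prints usage and exits with status 2
            else:
                if not valid:
                    # fill {arg} placeholders from the parsed namespace
                    self.error(message.format(**vars(params)))
        return params
```

Using `self.error` keeps the familiar argparse behaviour: the usage line and message go to stderr and the program exits before any experiment directory is touched.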
Pandas is a great tool for working with tabular data. Experiment records are tabular data too and can be loaded in Python, so all you need is to run a batch of experiments and open a Jupyter notebook.
import exman
index = exman.Index(exman.simpleroot('/path/to/main.py'))
experiments = index.info()
The table has a time column (datetime64[ns]) with the time of the experiment
and a root column (pathlib.Path) with the path to its results. It also
contains all other parameters of the experiment, so you can later filter or
sort the results by them and easily access each results folder
and its contents.
for i, ex in experiments.iterrows():
    # do some actions
    # use ex.param for parameters
    # ex.root / 'plot.png' for file paths
    ...
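Because the table is an ordinary pandas DataFrame, filtering, sorting and per-folder analysis are plain pandas. `index.info()` needs real runs, so this sketch builds a hypothetical stand-in table with the same kind of columns (`time`, `root`, one column per parameter) and writes a made-up `metrics.json` into each folder; the parameter names `lr` and `seed` and the `read_accuracy` helper are illustrative only.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# hypothetical stand-in for `index.info()`: time, root, and parameter columns
base = Path(tempfile.mkdtemp())
rows = []
for i, (lr, seed) in enumerate([(0.1, 0), (0.01, 0), (0.1, 1)], start=1):
    root = base / f"{i:06d}"
    root.mkdir()
    # pretend each run wrote a metrics file into its results folder
    (root / "metrics.json").write_text(json.dumps({"acc": 0.5 + 0.1 * i}))
    rows.append({"time": pd.Timestamp(f"2024-01-0{i}"), "root": root, "lr": lr, "seed": seed})
experiments = pd.DataFrame(rows)

# filter and order by parameters like any other DataFrame
selected = experiments[experiments.lr == 0.1].sort_values("time", ascending=False)


def read_accuracy(root: Path) -> float:
    """Apply the same analysis function to every results folder."""
    return json.loads((root / "metrics.json").read_text())["acc"]


selected = selected.assign(acc=selected.root.map(read_accuracy))
print(selected[["time", "lr", "seed", "acc"]])
```

`Series.map` does the same job as the `iterrows` loop when each folder yields a single value; the loop remains handy when a folder produces plots or several outputs.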
You can store local configuration files in your experiment folder; you also need to pass their filenames to ExParser:
import exman
# you should always use `exman.simpleroot(__file__)` unless you want another dir
parser = exman.ExParser(
root=exman.simpleroot(__file__),
default_config_files=['local.cfg']
)
The local configuration stores globally defined default values; they override the defaults set in the main file.
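The resulting precedence (main-file defaults, then the local config, then the command line) can be illustrated with plain argparse. This is a sketch under assumptions: `read_simple_cfg` is a hypothetical parser for the `key: value` lines used for local.cfg in this README, and exman's actual config handling may support more syntax.

```python
import argparse


def read_simple_cfg(text: str) -> dict:
    """Parse `key: value` lines, the format used for local.cfg in this README."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        values[key.strip()] = value.strip()
    return values


parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.1)  # default set in main.py
parser.add_argument("--user", default="nobody")

# local.cfg values replace the main-file defaults; argparse still runs
# string defaults through `type`, so "0.01" becomes the float 0.01
parser.set_defaults(**read_simple_cfg("lr: 0.01\nuser: myuser\n"))

print(parser.parse_args([]))               # lr=0.01, user='myuser'
print(parser.parse_args(["--lr", "0.5"]))  # the command line wins: lr=0.5
```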
If you want an argument-specific, human-friendly directory structure, you can tie it to specific argument names:
import exman
# you should always use `exman.simpleroot(__file__)` unless you want another dir
parser = exman.ExParser(
root=exman.simpleroot(__file__),
automark=['arg1', 'constant']
)
parser.add_argument('--arg1')
Your marked folder will then look like this:
exman/marked/arg1/<arg1>/constant/<name-of-experiment>/...
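One way to read this layout is that each automark entry naming an argument contributes `<name>/<value>`, while any other entry (like `constant`) becomes a literal folder. `marked_path` below is a hypothetical helper that builds the path under that assumption; it is not part of exman's API.

```python
from pathlib import Path


def marked_path(root: Path, automark: list, params: dict, name: str) -> Path:
    """Hypothetical sketch: map an automark list to a folder under root/marked.

    Entries that name an argument contribute `<name>/<value>`; any other
    entry is used as a literal folder name.
    """
    path = root / "marked"
    for key in automark:
        if key in params:
            path = path / key / str(params[key])
        else:
            path = path / key
    return path / name
```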
This can be useful if you work in a team. Write in main.py:
import exman
# you should always use `exman.simpleroot(__file__)` unless you want another dir
parser = exman.ExParser(
root=exman.simpleroot(__file__),
automark=['user'],
# store `user: myuser` content in local.cfg
default_config_files=['local.cfg']
)
parser.add_argument('--user')
After that, your team's runs can be stored in a single exman directory, assuming all access rights are set up correctly.
exman/marked/user/<username>/constant/<name-of-experiment>/...
On the command line, runs look the same as before:
python main.py --param1 foo --param2 bar
Things change when you actually run the program. It dumps all parsed
parameters, combined with defaults, into a YAML file at
root/runs/<name-of-experiment>/params.yaml. The name-of-experiment part
is generic and created on the fly. For a quick look or search there
are symlinks in the index folder, e.g.
root/index/<name-of-experiment>.yaml. Since a lot of experiments are
created and debugging is sometimes needed, you might not want debug
experiments to end up in the runs folder. In that case, just add the
--tmp flag, and new files will be written to the
root/tmp/<name-of-experiment> folder. This is convenient: you do not
lose important information about the experiment and its results, and you can
restore the symlinks in index by hand if needed.
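The run layout described above can be reproduced with the standard library. `create_run` is a hypothetical sketch, not exman's code: it assumes the `xxxxxx` prefix in the names is a zero-padded run counter, writes a naive flat YAML file (a real implementation should use a YAML library), and skips the index symlink for --tmp runs.

```python
import datetime
import os
from pathlib import Path


def create_run(root: Path, params: dict, tmp: bool = False) -> Path:
    """Hypothetical sketch of the runs/ (or tmp/) plus index/ layout."""
    folder = root / ("tmp" if tmp else "runs")
    folder.mkdir(parents=True, exist_ok=True)
    # assumption: the xxxxxx part is a zero-padded counter of existing runs
    number = len(list(folder.iterdir())) + 1
    stamp = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
    name = f"{number:06d}-{stamp}"
    run_dir = folder / name
    run_dir.mkdir()
    # naive YAML writer, good enough for flat key: value parameters
    lines = [f"{key}: {value}" for key, value in sorted(params.items())]
    (run_dir / "params.yaml").write_text("\n".join(lines) + "\n")
    if not tmp:
        index = root / "index"
        index.mkdir(exist_ok=True)
        # relative symlink, so the whole root can be moved or copied
        os.symlink(os.path.join("..", "runs", name, "params.yaml"), index / f"{name}.yaml")
    return run_dir
```

Keeping the real data in runs/ and only symlinks in index/ is what makes it possible to rebuild the index by hand.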
root
|-- runs
|   `-- xxxxxx-YYYY-mm-dd-HH-MM-SS
|       |-- params.yaml
|       `-- ...
|-- fails
|-- index
|   `-- xxxxxx-YYYY-mm-dd-HH-MM-SS.yaml (symlink)
|-- marked
|   `-- <mark>
|       `-- xxxxxx-YYYY-mm-dd-HH-MM-SS (symlink)
|           |-- params.yaml
|           `-- ...
`-- tmp
    `-- xxxxxx-YYYY-mm-dd-HH-MM-SS
        |-- params.yaml
        `-- ...
If you want to reproduce an experiment, you can provide its source configuration file in YAML format. For example:
python main.py --config root/index/<name-of-experiment-to-reproduce>.yaml
All values will be restored from the previous run. You can also
override values loaded with --config by passing them on the command line:
python main.py --config root/index/<name-of-experiment-to-reproduce>.yaml --override-param=new_value
If you do not want some argument to be restored from a saved config (it
may be a dynamically set variable), use volatile=True in add_argument:
import os
parser.add_argument('--my_dynamic_id', default=os.environ.get('AUTOSETTED_ID'), volatile=True)
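Restoring a previous run while honouring command-line overrides and volatile arguments can be sketched with argparse's `set_defaults`, since explicitly passed flags always beat defaults. Plain argparse has no volatile=True, so this hypothetical sketch tracks volatile names in a set; `restore_defaults` is an illustrative helper and `saved` stands in for the contents of a params.yaml.

```python
import argparse
import os


def restore_defaults(parser, saved: dict, volatile: set) -> None:
    """Use a previous run's params as defaults, except the volatile ones.

    Command-line flags still win, because defaults only apply to
    arguments that are not passed explicitly.
    """
    parser.set_defaults(**{k: v for k, v in saved.items() if k not in volatile})


parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.1)
parser.add_argument("--epochs", type=int, default=10)
parser.add_argument("--my_dynamic_id", default=os.environ.get("AUTOSETTED_ID"))
volatile = {"my_dynamic_id"}  # plain argparse stand-in for volatile=True

# values as they might have been loaded from root/index/<name>.yaml
saved = {"lr": 0.01, "epochs": 50, "my_dynamic_id": "job-17"}
restore_defaults(parser, saved, volatile)

print(parser.parse_args(["--epochs", "5"]))  # lr restored, epochs overridden
```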
If you like some experiments, you can mark them for easier access later:
cd root_of_exman_dir
exman mark <key> <#ex1> [<#ex2> <#ex3> ...]
and later, in Jupyter:
index = exman.Index(exman.simpleroot('/path/to/main.py'))
experiments = index.info('<key>')
# assuming you work in a team and follow the `user` automark advice above
user_experiments = index.info('user/username')
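Marking can be pictured as a folder of symlinks under root/marked/<key>, matching the tree shown earlier. `mark` and `marked_runs` are hypothetical helpers that assume `#ex` is the zero-padded counter at the start of each run name; the real `exman mark` command may resolve experiments differently.

```python
import os
from pathlib import Path


def mark(root: Path, key: str, numbers: list) -> None:
    """Hypothetical sketch of `exman mark <key> <#ex> ...` as symlinks."""
    target_dir = root / "marked" / key
    target_dir.mkdir(parents=True, exist_ok=True)
    for number in numbers:
        prefix = f"{number:06d}-"
        # unpacking fails loudly unless exactly one run has this number
        (run,) = [p for p in (root / "runs").iterdir() if p.name.startswith(prefix)]
        # relative link back into runs/, robust to moving the whole root
        os.symlink(os.path.relpath(run, target_dir), target_dir / run.name)


def marked_runs(root: Path, key: str) -> list:
    """Names of runs under a mark; the key may be nested, e.g. 'user/username'."""
    return sorted(p.name for p in (root / "marked" / key).iterdir())
```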
cd root_of_exman_dir
# delete only index
exman delete <#ex1> [<#ex2> <#ex3> ...]
# delete all files
exman delete --all <#ex1> [<#ex2> <#ex3> ...]