Cerebro is a data system for optimized deep learning model selection. It uses a novel parallel execution strategy called Model Hopper Parallelism (MOP) to execute end-to-end deep learning model selection workloads in a more resource-efficient manner. Detailed technical information about Cerebro can be found in our Technical Report.
The best way to install Cerebro is via pip:
pip install -U cerebro-dl
Alternatively, you can clone the repository and run the provided Makefile:
git clone https://github.com/ADALabUCSD/cerebro-system.git && cd cerebro-system && make
You MUST be running Python >= 3.6 with TensorFlow >= 2.2 and Apache Spark >= 2.4.
Detailed documentation about the system can be found here.
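After installing, a typical workload defines a search space over Keras models and hands it to Cerebro for MOP-based execution on Spark. The following is a minimal sketch modeled on the project's quick-start API; the module and parameter names used here (SparkBackend, LocalStore, SparkEstimator, GridSearch, hp_choice, and the GridSearch arguments) are assumptions that may differ across versions, so verify them against the documentation before running.

import tensorflow as tf
from pyspark.sql import SparkSession

# Cerebro imports; exact module paths follow the documented quick start.
from cerebro.backend import SparkBackend
from cerebro.storage import LocalStore
from cerebro.keras import SparkEstimator
from cerebro.tune import GridSearch, hp_choice

spark = SparkSession.builder.appName("cerebro-example").getOrCreate()

# The backend runs MOP-based parallel training; the store holds intermediate
# data, checkpoints, and logs (use HDFSStore for remote storage).
backend = SparkBackend(spark_context=spark.sparkContext, num_workers=2)
store = LocalStore(prefix_path="/tmp/cerebro_store")

def estimator_gen_fn(params):
    # Build one Keras model per hyperparameter configuration.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    optimizer = tf.keras.optimizers.Adam(learning_rate=params["lr"])
    return SparkEstimator(
        model=model,
        optimizer=optimizer,
        loss="binary_crossentropy",
        metrics=["acc"],
        batch_size=params["batch_size"],
    )

# Hyperparameter grid that Cerebro schedules across workers.
search_space = {
    "lr": hp_choice([1e-2, 1e-3]),
    "batch_size": hp_choice([32, 64]),
}

grid_search = GridSearch(
    backend,
    store,
    estimator_gen_fn,
    search_space,
    num_epochs=5,
    evaluation_metric="loss",
    feature_columns=["features"],
    label_columns=["label"],
)

# `train_df` would be a Spark DataFrame with "features" and "label" columns:
# best_model = grid_search.fit(train_df)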
This project was/is supported in part by a Hellman Fellowship, the NIDDK of the NIH under award number R01DK114945, and an NSF CAREER Award.
We used the following projects when building Cerebro.
- Horovod: Cerebro's Apache Spark implementation uses code from Horovod's implementation for Apache Spark.
- Petastorm: We use Petastorm to read Apache Parquet data from remote storage (e.g., HDFS).
If you use this software for research, please cite the following paper:
@inproceedings{nakandala2019cerebro,
  title={Cerebro: Efficient and Reproducible Model Selection on Deep Learning Systems},
  author={Nakandala, Supun and Zhang, Yuhao and Kumar, Arun},
  booktitle={Proceedings of the 3rd International Workshop on Data Management for End-to-End Machine Learning},
  pages={1--4},
  year={2019}
}