This repository contains:
- BCacheSim, a Python simulator with a focus on flash caching for bulk storage systems.
- EpisodicAnalysis, a Python module implementing the episodes model for flash caching and the training of ML admission and ML prefetching models based on episodes.
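EpisodicAnalysis builds on the episodes model. The sketch below is a rough illustration only. It assumes that an episode is a maximal run of accesses to one block in which every gap between consecutive accesses is at most an assumed eviction age, so the block would stay resident throughout. See the Baleen paper for the precise definition. The functions `split_into_episodes` and `hits_if_admitted` are hypothetical names and do not come from episodic_analysis.

```python
from typing import List, Sequence


def split_into_episodes(timestamps: Sequence[float], eviction_age: float) -> List[List[float]]:
    """Group one block's access times into episodes.

    Illustrative assumption: a new episode starts whenever the gap since the
    previous access exceeds eviction_age, i.e. the block would have been evicted.
    """
    episodes: List[List[float]] = []
    for t in sorted(timestamps):
        if episodes and t - episodes[-1][-1] <= eviction_age:
            episodes[-1].append(t)  # block still resident: same episode
        else:
            episodes.append([t])  # first access, or gap too long: new episode
    return episodes


def hits_if_admitted(episodes: List[List[float]]) -> int:
    """Hits gained if the block is admitted at the start of every episode.

    Under the assumption above, every access after the first in an episode hits.
    """
    return sum(len(ep) - 1 for ep in episodes)


if __name__ == "__main__":
    accesses = [0, 5, 12, 100, 103, 400]
    eps = split_into_episodes(accesses, eviction_age=30)
    print(eps)                    # [[0, 5, 12], [100, 103], [400]]
    print(hits_if_admitted(eps))  # 3
```

Under this assumption, the eviction age is the key parameter. A larger cache keeps blocks resident longer, merges more accesses into each episode, and raises the number of hits available to an admission policy.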
Supported admission policies:
- Baleen
- RejectX
- CoinFlip
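CoinFlip is the simplest of these baselines. The sketch below assumes that CoinFlip admits each miss independently with a fixed probability `p`, regardless of the key. The class `CoinFlipAdmission` and its `should_admit` method are hypothetical names, not the interface in admission_policies.py.

```python
import random
from typing import Optional


class CoinFlipAdmission:
    """Admit each cache-miss candidate independently with probability p.

    Hypothetical sketch; not the implementation in admission_policies.py.
    """

    def __init__(self, p: float, seed: Optional[int] = None) -> None:
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must be in [0, 1]")
        self.p = p
        self.rng = random.Random(seed)  # private RNG so runs are reproducible

    def should_admit(self, key: str) -> bool:
        # The key is ignored: the decision is a pure coin flip.
        return self.rng.random() < self.p


if __name__ == "__main__":
    policy = CoinFlipAdmission(p=0.3, seed=42)
    admitted = sum(policy.should_admit(f"block-{i}") for i in range(10_000))
    print(f"admitted {admitted} of 10000 candidates")
```

Lowering `p` reduces flash writes proportionally, but it admits popular and unpopular blocks alike. ML admission policies such as Baleen aim to make that choice selectively.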
Supported eviction policies:
- LRU
- FIFO
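The two eviction policies differ only in what a hit does to the eviction order. The sketch below is a simplified model that counts items rather than bytes. It is not BCacheSim's flash-aware simulator in sim_cache.py. The function `hit_ratio` and its `admit` parameter are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Iterable


def hit_ratio(
    trace: Iterable[Hashable],
    capacity: int,
    policy: str = "LRU",
    admit: Callable[[Hashable], bool] = lambda key: True,
) -> float:
    """Replay a trace of keys through a cache holding `capacity` items.

    LRU moves a key to the back of the queue on every hit; FIFO leaves the
    order untouched, so keys are evicted strictly in insertion order.
    `admit` decides whether a missed key is inserted (default: always).
    """
    if policy not in ("LRU", "FIFO"):
        raise ValueError(f"unknown policy: {policy}")
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache: "OrderedDict[Hashable, None]" = OrderedDict()
    hits = total = 0
    for key in trace:
        total += 1
        if key in cache:
            hits += 1
            if policy == "LRU":
                cache.move_to_end(key)  # refresh recency on a hit
        elif admit(key):
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the entry at the front
            cache[key] = None
    return hits / total if total else 0.0


if __name__ == "__main__":
    trace = ["A", "B", "A", "C", "A"]
    print(hit_ratio(trace, capacity=2, policy="LRU"))   # 0.4
    print(hit_ratio(trace, capacity=2, policy="FIFO"))  # 0.2
```

Consider the trace A, B, A, C, A with capacity 2. LRU keeps A resident and scores 0.4. FIFO evicts A when C arrives and scores 0.2.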
Code layout:
- cachesim/
  - simulate_ap.py: command-line wrapper for the simulator
  - sim_cache.py, admission_policies.py, prefetchers.py: key simulator code
- testbed/: utilities to benchmark machines for Service Time and launch CacheBench runs
- stats: C++ utilities that ingest the entire trace and produce stats (to be released)
- episodic_analysis/: the EpisodicAnalysis module (episodes model and ML model training)
- scripts/: scripts for processing traces
You can install the dependencies with either Conda or pip. Either is sufficient to run the simulator.
For further research and development, you may want a cluster manager to run many experiments in parallel. You can write an adapter for your preferred one by modifying episodic_analysis/local_cluster.py. I use brooce, with my experiment filesystem mounted over NFS; you can clone my fork if you wish. None of this is necessary to run basic experiments.
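A cluster adapter essentially maps a list of experiment commands onto workers. The sketch below is a hypothetical stand-in for such an adapter and does not reproduce the interface of episodic_analysis/local_cluster.py. It runs command lines in parallel on one machine using a thread pool. Each thread only waits on a subprocess, so threads are sufficient. The function `run_local` is a hypothetical name. In practice, each command would be a simulator invocation rather than the toy commands shown.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def run_local(commands: Sequence[Sequence[str]], max_workers: int = 4) -> List[int]:
    """Run each command as a subprocess, at most `max_workers` at a time.

    Returns the exit codes in the same order as `commands`.
    """

    def run_one(cmd: Sequence[str]) -> int:
        # check=False: a failing experiment is reported, not raised.
        return subprocess.run(list(cmd), check=False).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Toy jobs standing in for simulator runs; the second one fails on purpose.
    jobs = [
        [sys.executable, "-c", "print('job 1 done')"],
        [sys.executable, "-c", "import sys; sys.exit(3)"],
    ]
    print(run_local(jobs, max_workers=2))  # [0, 3]
```

An adapter for brooce or another cluster manager would make a similar change: replace the thread pool with calls to that system's job-submission mechanism.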
With Conda:
conda env create -f install/env_cachelib-py-3.11.yaml
conda env create -f install/env_cachelib-pypy-3.8.yaml
Alternatively, with micromamba:
micromamba create -c conda-forge -n cachelib-py-3.11 python=3.11 numpy pandas psutil scipy matplotlib seaborn tqdm lightgbm scikit-learn redis-py jsonargparse retry jupyterlab ipywidgets jupyter_nbextensions_configurator commentjson
micromamba create -c conda-forge -n cachelib-pypy-3.8 python=3.8 pypy numpy pandas psutil scipy matplotlib seaborn tqdm lightgbm scikit-learn redis-py jsonargparse retry jupyterlab ipywidgets jupyter_nbextensions_configurator commentjson
Note: scikit-learn currently works only with PyPy 3.8, not PyPy 3.9, and LightGBM requires scikit-learn. On an unsupported PyPy version, the failure typically shows up as:
Error: No module named 'sklearn.__check_build._check_build'
With pip:
pip install -r install/requirements.txt
which is equivalent to:
# For simulator
pip install lightgbm numpy pandas scikit-learn
pip install spookyhash jsonargparse compress_json compress_pickle retry commentjson
# Optional
pip install psutil ipywidgets
# Optional: pympler (for pympler.tracker)
# Scripts
pip install tqdm
# For episodic_analysis
pip install scipy
pip install redis
# For cache-analysis
pip install matplotlib seaborn
# Advanced policies
pip install pqdict
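A small check that tries to import each package can confirm which of the groups above work in the current environment. For example, it would catch the scikit-learn build error described earlier under PyPy. The grouping mirrors the comments in the pip list. The import names are assumptions: they match the pip names except scikit-learn, which imports as `sklearn`. `REQUIREMENTS` and `missing_modules` are hypothetical names.

```python
import importlib
from typing import Dict, List

# Assumed import names for the pip packages above, grouped as in the comments.
REQUIREMENTS: Dict[str, List[str]] = {
    "simulator": ["lightgbm", "numpy", "pandas", "sklearn", "spookyhash",
                  "jsonargparse", "compress_json", "compress_pickle",
                  "retry", "commentjson"],
    "optional": ["psutil", "ipywidgets"],
    "scripts": ["tqdm"],
    "episodic_analysis": ["scipy", "redis"],
    "cache-analysis": ["matplotlib", "seaborn"],
    "advanced policies": ["pqdict"],
}


def missing_modules(modules: List[str]) -> Dict[str, str]:
    """Try to import each module; return {name: error message} for failures."""
    missing: Dict[str, str] = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or any other failure while importing
            missing[name] = str(exc)
    return missing


if __name__ == "__main__":
    for group, modules in REQUIREMENTS.items():
        problems = missing_modules(modules)
        print(f"{group}: {'ok' if not problems else 'MISSING'}")
        for name, error in problems.items():
            print(f"  {name}: {error}")
```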
Traces are available at https://ftp.pdl.cmu.edu/pub/datasets/Baleen24/. We ask that academic works using any of this code or these traces cite Baleen [1] and, if appropriate, CacheLib [2] and Tectonic [3].
If you are trying to reproduce the results in the Baleen paper, please see the Baleen-FAST24 repository at https://github.com/wonglkd/Baleen-FAST24.
For further questions, please contact Daniel Lin-Kit Wong.
Footnotes

[1] Baleen: ML Admission & Prefetching for Flash Caches.
Daniel Lin-Kit Wong, Hao Wu, Carson Molder, Sathya Gunasekar, Jimmy Lu, Snehal Khandkar, Abhinav Sharma, Daniel S. Berger, Nathan Beckmann, and Gregory R. Ganger.
USENIX FAST 2024.

[2] The CacheLib Caching Engine: Design and Experiences at Scale.
Benjamin Berg, Daniel S. Berger, Sara McAllister, Isaac Grosof, Sathya Gunasekar, Jimmy Lu, Michael Uhlar, Jim Carrig, Nathan Beckmann, Mor Harchol-Balter, and Gregory R. Ganger.
USENIX OSDI 2020.

[3] Facebook's Tectonic Filesystem: Efficiency from Exascale.
Satadru Pan, Theano Stavrinos, Yunqiao Zhang, Atul Sikaria, Pavel Zakharov, Abhinav Sharma, Mike Shuey, Richard Wareing, Monika Gangapuram, Guanglei Cao, Christian Preseau, Pratap Singh, Kestutis Patiejunas, JR Tipton, Ethan Katz-Bassett, and Wyatt Lloyd.
USENIX FAST 2021.