UChi-JCL/CacheGen

CacheGen: Fast Context Loading for Language Model Applications via KV Cache Streaming

For the latest update and integration, please check out the LMCache project!

This is the code repository for CacheGen: Fast Context Loading for Language Model Applications via KV Cache Streaming (SIGCOMM'24). The code is organized as follows:

  • LMCache: Modules for KV cache encoding and decoding with CacheGen's customized codec.
  • test_data: Example test cases for CacheGen.
  • src: Helper functions used by CacheGen (e.g., converting a tensor to a tuple and a tuple back to a tensor).
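
As an illustration of the tensor/tuple conversion mentioned for src, here is a minimal sketch. It is not the repository's actual helper code: the function names kv_tuple_to_tensor and kv_tensor_to_tuple are hypothetical, and the layout is an assumption (a tuple with one (key, value) pair per layer, each tensor shaped [batch, heads, tokens, head_dim], as in Hugging Face-style past_key_values).

```python
import torch


def kv_tuple_to_tensor(kv):
    """Stack a per-layer tuple of (key, value) pairs into a single tensor.

    Assumed input: tuple of length num_layers, each entry (key, value),
    both shaped [batch, heads, tokens, head_dim].
    Output shape: [num_layers, 2, batch, heads, tokens, head_dim].
    """
    return torch.stack([torch.stack((k, v), dim=0) for k, v in kv], dim=0)


def kv_tensor_to_tuple(t):
    """Inverse of kv_tuple_to_tensor: split the tensor back into per-layer pairs."""
    return tuple((layer[0], layer[1]) for layer in t.unbind(dim=0))


# Usage: a toy 2-layer cache with random contents (hypothetical sizes).
kv = tuple((torch.randn(1, 2, 3, 4), torch.randn(1, 2, 3, 4)) for _ in range(2))
packed = kv_tuple_to_tensor(kv)
restored = kv_tensor_to_tuple(packed)
```

Packing the cache into one contiguous tensor makes it easier to hand to an encoder as a single buffer, while the tuple form is what many model implementations expect when the cache is loaded back.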

Installation

To install the Python packages required to run CacheGen, using conda:

conda env create -f env.yaml
conda activate cachegen
pip install -e LMCache
cd LMCache/third_party/torchac_cuda
python setup.py install

Examples

See sigcomm_ae.md for instructions on running the CacheGen examples.

Contact

Yuhan Liu ([email protected])