This repository provides implementations and experiments for the following papers, as well as simplified presentations of earlier work such as S4.
Hyena Hierarchy: Towards Larger Convolutional Language Models
Michael Poli*, Stefano Massaroli*, Eric Nguyen*, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré
Paper
Simple Hardware-Efficient Long Convolutions for Sequence Modeling
Daniel Y. Fu*, Elliot L. Epstein*, Eric Nguyen, Armin W. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré
Paper
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Daniel Y. Fu*, Tri Dao*, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré
International Conference on Learning Representations, 2023. Notable top-25% (spotlight).
Paper
Roadmap:
- Include H3, LLM training, and synthetics in this repository
- Move in fast convolution code
- Add Hyena implementation and experiments
- pip package
See CHANGELOG.md
This repository requires Python 3.8+ and PyTorch 1.10+. Other packages are listed in requirements.txt and can be installed with pip install -r requirements.txt.
The easiest way to get started is to run the standalone_cifar.py script, which trains a simple long convolution model on CIFAR-10:
python -m standalone_cifar
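The core primitive behind these models is a long convolution, whose kernel is as long as the input and which is evaluated with the FFT in O(L log L) time rather than O(L^2). As a rough illustration only, here is a minimal PyTorch sketch of that operation; fft_long_conv is a hypothetical name for this example, not the repository's optimized implementation:

```python
import torch

def fft_long_conv(u, k):
    # Causal convolution of input u (batch, channels, L) with a
    # per-channel kernel k (channels, L), computed via the FFT.
    # Zero-padding to 2L turns the circular convolution into a linear one.
    L = u.shape[-1]
    fft_size = 2 * L
    u_f = torch.fft.rfft(u, n=fft_size)          # (batch, channels, fft_size//2 + 1)
    k_f = torch.fft.rfft(k, n=fft_size)          # (channels, fft_size//2 + 1)
    y = torch.fft.irfft(u_f * k_f, n=fft_size)   # (batch, channels, fft_size)
    return y[..., :L]                            # keep only the causal outputs

# Example: 2 sequences, 4 channels, length 1024, kernel as long as the input
u = torch.randn(2, 4, 1024)
k = torch.randn(4, 1024)
y = fft_long_conv(u, k)
print(y.shape)  # torch.Size([2, 4, 1024])
```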
See the experiments page for more:
- LRA experiments from the Long Convs paper
- H3 experiments (language model, synthetics)
- H3 + Long Conv experiments
- Hyena language and vision experiments
If you use this codebase, or otherwise find our work valuable, you can cite us as follows:
@article{poli2023hyena,
  title={Hyena Hierarchy: Towards Larger Convolutional Language Models},
  author={Poli, Michael and Massaroli, Stefano and Nguyen, Eric and Fu, Daniel Y. and Dao, Tri and Baccus, Stephen and Bengio, Yoshua and Ermon, Stefano and R{\'e}, Christopher},
  journal={arXiv preprint arXiv:2302.10866},
  year={2023}
}

@article{fu2023simple,
  title={Simple Hardware-Efficient Long Convolutions for Sequence Modeling},
  author={Fu, Daniel Y. and Epstein, Elliot L. and Nguyen, Eric and Thomas, Armin W. and Zhang, Michael and Dao, Tri and Rudra, Atri and R{\'e}, Christopher},
  journal={arXiv preprint arXiv:2302.06646},
  year={2023}
}

@inproceedings{fu2023hungry,
  title={Hungry {H}ungry {H}ippos: Towards Language Modeling with State Space Models},
  author={Fu, Daniel Y. and Dao, Tri and Saab, Khaled K. and Thomas, Armin W. and Rudra, Atri and R{\'e}, Christopher},
  booktitle={International Conference on Learning Representations},
  year={2023}
}
This repo was forked from Albert Gu's state spaces repo and borrows its structure. It also contains code from the FlashAttention training scripts.