pip3 install transformer-tricks
Tricks and tools for speeding up LLMs:
- Flash normalization:
  - arXiv paper: https://arxiv.org/abs/2407.09577
  - See python folder for code to convert LLMs to FlashNorm
  - Notebook example for converting an LLM to FlashNorm:
  - Notebook for paper:
  - HuggingFace repo
- Approximate attention [work in progress]
- Removing weights for skipless transformers:
  - arXiv paper: https://arxiv.org/abs/2404.12362
  - Notebook:
- Precomputing the first layer:
  - arXiv paper: https://arxiv.org/abs/2402.13388
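
To give a feel for the flash normalization item above, here is a minimal sketch of the core idea: the RMSNorm weights can be folded into the linear layer that follows, and the division by the RMS can be deferred until after the matrix multiplication. This is an illustration in plain Python, not the code from this repo; the function names (`rms_norm`, `merge_norm_weights`, `merged`), the `eps` value, and the tiny random shapes are all assumptions made for the example.

```python
import math
import random


def rms_norm(x, g, eps=1e-6):
    """Standard RMSNorm: divide x by RMS(x), then scale elementwise by g."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [gi * v / rms for gi, v in zip(g, x)]


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def merge_norm_weights(W, g):
    """Fold g into the columns of W, so that W_star[i][j] = W[i][j] * g[j]."""
    return [[w * gj for w, gj in zip(row, g)] for row in W]


def reference(W, g, x, eps=1e-6):
    """Baseline: RMSNorm with weights g, followed by the linear layer W."""
    return matvec(W, rms_norm(x, g, eps))


def merged(W_star, x, eps=1e-6):
    """Merged form: multiply by W_star first, then divide by RMS(x)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [y / rms for y in matvec(W_star, x)]


# Hypothetical toy sizes and random values, only to check the equivalence.
random.seed(0)
d_in, d_out = 8, 4
x = [random.gauss(0, 1) for _ in range(d_in)]
g = [random.gauss(1, 0.1) for _ in range(d_in)]
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]

y_ref = reference(W, g, x)
y_merged = merged(merge_norm_weights(W, g), x)

# Both paths should agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(y_ref, y_merged))
print(max_diff < 1e-9)
```

The merge happens once, offline, so at inference time the separate normalization weights no longer need to be stored or applied. See the paper linked above for the full method and its variants.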
Please give us a ⭐ if you like this repo, thanks!