Releases: casper-hansen/AutoAWQ
v0.1.5
What's Changed
- Only apply attention mask if seqlen is greater than 1 by @casper-hansen in #96
- add gpt_neox support by @twaka in #113
- [
core
] Support fp32 / bf16 inference by @younesbelkada in #121 - Fix potential overflow by @casper-hansen in #102
- Fixing starcoder based models with 15B by @SebastianBodza in #118
- Support Aquila models. by @ftgreat in #123
- Add benchmark of Aquila2 34B AWQ in README.md. by @ftgreat in #126
New Contributors
- @twaka made their first contribution in #113
- @younesbelkada made their first contribution in #121
- @SebastianBodza made their first contribution in #118
- @ftgreat made their first contribution in #123
Full Changelog: v0.1.4...v0.1.5
v0.1.4
What's Changed
- Refactor cache and embedding modules by @casper-hansen in #95
- Fix TypeError: 'NoneType' object is not subscriptable
Full Changelog: v0.1.3...v0.1.4
v0.1.3
What's Changed
- Turing inference support (Colab+Kaggle working) by @casper-hansen in #92
- Fix memory bug (save 2GB VRAM)
Full Changelog: v0.1.2...v0.1.3
v0.1.2
What's Changed
- Fix unexpected keyword by @casper-hansen in #88
- Fix Falcon n_kv_heads parameter by @casper-hansen in #89
- Mistral fused modules by @casper-hansen in #90
Full Changelog: v0.1.1...v0.1.2
v0.1.1
What's Changed
- Add GPT BigCode support (StarCoder) by @casper-hansen in #61
- Use typing classes over base types by @VikParuchuri in #69
- Fix KV cache shapes error by @casper-hansen in #75
- Mistral support by @casper-hansen in #79
- Add low_cpu_mem_usage=True in example by @casper-hansen in #80
- Offloading to cpu and disk by @s4rduk4r in #77
- Faster build, fix "no space left". by @casper-hansen in #84
New Contributors
- @VikParuchuri made their first contribution in #69
- @s4rduk4r made their first contribution in #77
Full Changelog: v0.1.0...v0.1.1
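The v0.1.1 items above (Mistral support in #79, fused modules in #90) combine at load time. A minimal sketch, assuming the `AutoAWQForCausalLM.from_quantized` entry point; the model path is a placeholder, not taken from these notes:

```python
# Hedged sketch: loading an AWQ-quantized model with fused layers,
# reflecting the v0.1.1 items (#79 Mistral support, #90 fused modules).

load_kwargs = {
    "fuse_layers": True,  # enable fused attention/MLP modules (#90)
}

def load_quantized(quant_path: str):
    # Local import: autoawq is an optional, CUDA-only dependency.
    from awq import AutoAWQForCausalLM
    return AutoAWQForCausalLM.from_quantized(quant_path, **load_kwargs)
```

With fused layers disabled, the model falls back to the unfused modules, which is slower but works on a wider range of hardware.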
v0.1.0
What's Changed
- Support Falcon 180B by @casper-hansen in #35
- [NEW] GEMV kernel implementation by @casper-hansen in #40
- Allow user to use custom calibration data for quantization by @boehm-e in #27
- Safetensors and model sharding by @casper-hansen in #47
- 2x faster context processing with GEMV by @casper-hansen in #58
- Support kv_heads by @casper-hansen in #60
- Refactor quantization code by @casper-hansen in #62
- support windows by @qwopqwop200 in #53
- Improve model loading by @casper-hansen in #66
New Contributors
- @boehm-e made their first contribution in #27
Full Changelog: v0.0.2...v0.1.0
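Two v0.1.0 items above, custom calibration data (#27) and the GEMV kernel (#40), surface as quantization options. A minimal sketch under stated assumptions: the config keys follow AutoAWQ's usual `quant_config` shape, and the calibration sentences and paths are illustrative placeholders:

```python
# Hedged sketch: quantizing with custom calibration data (#27)
# and selecting a kernel version (GEMV added in #40).

quant_config = {
    "zero_point": True,   # asymmetric (zero-point) quantization
    "q_group_size": 128,  # group weights per 128 channels
    "w_bit": 4,           # 4-bit weights
    "version": "GEMM",    # "GEMM" or "GEMV" kernel
}

# Custom calibration data: a plain list of raw text samples.
calib_data = [
    "Example sentence one for calibration.",
    "Example sentence two for calibration.",
]

def quantize(model_path: str, quant_path: str) -> None:
    # Local imports: autoawq and transformers are heavyweight,
    # GPU-oriented dependencies.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_data)
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)
```

GEMM favors batched/context processing while GEMV favors single-token decoding, which is why #58 reports faster context processing after the kernel work.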
v0.0.2
What's Changed
- Refactor fused modules by @casper-hansen in #18
- fuse_layers bug fix by @qwopqwop200 in #21
- support speedtest to benchmark FP16 model by @wanzhenchn in #25
- Implement batch size for speed test by @casper-hansen in #26
- [BUG] Fix illegal memory access + Quantized Multi-GPU support by @casper-hansen in #28
- YaRN support for LLaMa models by @casper-hansen in #23
New Contributors
- @wanzhenchn made their first contribution in #25
Full Changelog: v0.0.1...v0.0.2
v0.0.1
What's Changed
- Add GPTJ Support by @jamesdborin in #1
- windows support by @qwopqwop200 in #16
- Release PyPi package + Create GitHub workflow by @casper-hansen in #9
New Contributors
- @jamesdborin made their first contribution in #1
- @qwopqwop200 made their first contribution in #16
- @casper-hansen made their first contribution in #9
Full Changelog: https://github.com/casper-hansen/AutoAWQ/commits/v0.0.1