v0.1.8
What's Changed
- Fix MPT by @casper-hansen in #206
- Add config to Base model by @casper-hansen in #207
- Add Qwen model by @Sanster in #182
- Robust quantization for Catcher by @casper-hansen in #209
- New scaling to improve perplexity by @casper-hansen in #216
- Benchmark hf generate by @casper-hansen in #237
- Fix position ids by @casper-hansen in #215
- Pass `model_init_kwargs` to `check_and_get_model_type` function by @rycont in #232
- Fixed an issue where the Qwen model had excessive error after quantization by @jundolc in #243
- Load on CPU to avoid OOM by @casper-hansen in #236
- Update README.md by @casper-hansen in #245
- [
core
] Make AutoAWQ fused modules compatible with HF transformers by @younesbelkada in #244 - [
core
] Fix quantization issues with transformers==4.36.0 by @younesbelkada in #249 - FEAT: Add possibility of skipping modules when quantizing by @younesbelkada in #248
- Fix quantization issue with transformers >= 4.36.0 by @younesbelkada in #264
- Mixtral: Mixture of Experts quantization by @casper-hansen in #251
- Fused rope theta by @casper-hansen in #270
- FEAT: add llava to autoawq by @younesbelkada in #250
- Add Baichuan2 Support by @AoyuQC in #247
- Set default rope_theta on LlamaLikeBlock by @casper-hansen in #271
- Update news and models supported by @casper-hansen in #272
- Add vLLM async example by @casper-hansen in #273 (see the second sketch after this list)
- Bump to v0.1.8 by @casper-hansen in #274
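
Module skipping (#248) pairs naturally with Mixtral support (#251). Below is a minimal sketch of quantizing while leaving some modules in full precision, assuming the AutoAWQ API of this release; the model id is a placeholder, and `modules_to_not_convert` is the quant config key introduced in #248.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mixtral-8x7B-v0.1"  # placeholder: any supported checkpoint

quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
    # Skip the MoE routing gates (#248): quantizing them degrades
    # expert selection, so they stay in full precision.
    "modules_to_not_convert": ["gate"],
}

# Weights are first loaded on CPU to avoid GPU OOM during quantization (#236).
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized("mixtral-awq")
tokenizer.save_pretrained("mixtral-awq")
```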
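
The full vLLM async example added in #273 lives in the repo; as a rough guide, async generation with an AWQ checkpoint looked like the sketch below against the vLLM API of that period (`AsyncLLMEngine`, `AsyncEngineArgs`). The model id and sampling parameters are placeholders, and vLLM's interfaces have since changed, so treat this as illustrative only.

```python
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine


async def main():
    # quantization="awq" tells vLLM to load AWQ-quantized weights.
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model="path/to/awq-model", quantization="awq")
    )
    sampling_params = SamplingParams(temperature=0.8, max_tokens=256)

    # generate() yields a stream of partial RequestOutputs;
    # the last item holds the full completion.
    final = None
    async for output in engine.generate("What is AWQ?", sampling_params, request_id="0"):
        final = output
    print(final.outputs[0].text)


asyncio.run(main())
```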
New Contributors
- @Sanster made their first contribution in #182
- @rycont made their first contribution in #232
- @jundolc made their first contribution in #243
- @AoyuQC made their first contribution in #247
Full Changelog: v0.1.7...v0.1.8