v0.1.8
What's Changed
- Fix MPT by @casper-hansen in #206
- Add config to Base model by @casper-hansen in #207
- Add Qwen model by @Sanster in #182
- Robust quantization for Catcher by @casper-hansen in #209
- New scaling to improve perplexity by @casper-hansen in #216
- Benchmark hf generate by @casper-hansen in #237
- Fix position ids by @casper-hansen in #215
- Pass `model_init_kwargs` to `check_and_get_model_type` function by @rycont in #232
- Fixed an issue where the Qwen model had excessive error after quantization by @jundolc in #243
- Load on CPU to avoid OOM by @casper-hansen in #236
- Update README.md by @casper-hansen in #245
- [
core
] Make AutoAWQ fused modules compatible with HF transformers by @younesbelkada in #244 - [
core
] Fix quantization issues with transformers==4.36.0 by @younesbelkada in #249 - FEAT: Add possibility of skipping modules when quantizing by @younesbelkada in #248
- Fix quantization issue with transformers >= 4.36.0 by @younesbelkada in #264
- Mixtral: Mixture of Experts quantization by @casper-hansen in #251
- Fused rope theta by @casper-hansen in #270
- FEAT: add llava to autoawq by @younesbelkada in #250
- Add Baichuan2 Support by @AoyuQC in #247
- Set default rope_theta on LlamaLikeBlock by @casper-hansen in #271
- Update news and models supported by @casper-hansen in #272
- Add vLLM async example by @casper-hansen in #273 (see the second sketch after this list)
- Bump to v0.1.8 by @casper-hansen in #274
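
Module skipping (#248) pairs naturally with Mixtral support (#251). Below is a minimal sketch of quantizing while leaving some modules in full precision, assuming the AutoAWQ API of this release; the model id is a placeholder, and `modules_to_not_convert` is the quant config key introduced in #248.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mixtral-8x7B-v0.1"  # placeholder: any supported checkpoint

quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
    # Skip the MoE routing gates (#248): quantizing them degrades
    # expert selection, so they stay in full precision.
    "modules_to_not_convert": ["gate"],
}

# Weights are first loaded on CPU to avoid GPU OOM during quantization (#236).
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized("mixtral-awq")
tokenizer.save_pretrained("mixtral-awq")
```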
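
The full vLLM async example added in #273 lives in the repo; as a rough guide, async generation with an AWQ checkpoint looked like the sketch below against the vLLM API of that period (`AsyncLLMEngine`, `AsyncEngineArgs`). The model id and sampling parameters are placeholders, and vLLM's interfaces have since changed, so treat this as illustrative only.

```python
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine


async def main():
    # quantization="awq" tells vLLM to load AWQ-quantized weights.
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model="path/to/awq-model", quantization="awq")
    )
    sampling_params = SamplingParams(temperature=0.8, max_tokens=256)

    # generate() yields a stream of partial RequestOutputs;
    # the last item holds the full completion.
    final = None
    async for output in engine.generate("What is AWQ?", sampling_params, request_id="0"):
        final = output
    print(final.outputs[0].text)


asyncio.run(main())
```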
New Contributors
- @Sanster made their first contribution in #182
- @rycont made their first contribution in #232
- @jundolc made their first contribution in #243
- @AoyuQC made their first contribution in #247
Full Changelog: v0.1.7...v0.1.8