v0.2.5
What's Changed
- Fix fused models for transformers >= 4.39 by @TechxGenus in #418
- FIX: Add safeguards for static cache + llama on transformers latest by @younesbelkada in #401
- Pin: lm_eval==0.4.1 by @casper-hansen in #426
- Implement `apply_clip` argument to `quantize()` by @casper-hansen in #427
- Workaround: illegal memory access by @casper-hansen in #421
- Add download_kwargs for load model (#302) by @Roshiago in #399
- Add StarCoder2 support by @shaonianyr in #406
- Add StableLM support by @Isotr0py in #410
- Fix starcoder2 fused norm by @TechxGenus in #442
- Update generate example to llama 3 by @casper-hansen in #448
- [BUG] Fix github action documentation build by @suparious in #449
- Fix path by @casper-hansen in #451
- FIX: 'awq_ext' is not defined error by @younesbelkada in #465
- FIX: Fix multiple generations for new HF cache format by @younesbelkada in #444
- Support max_memory to specify memory usage for each GPU by @laoda513 in #460
- Bump to 0.2.5 by @casper-hansen in #468
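Several of the changes above add new user-facing options (`apply_clip` in #427, `download_kwargs` in #399, `max_memory` in #460). A minimal sketch of how these options might be passed, shown as plain keyword dicts so the snippet stays self-contained; the exact argument names and call sites are assumptions, so check the AutoAWQ docs for your version:

```python
# Hedged sketch of the options added in this release (shapes assumed, not
# taken from the AutoAWQ source):

# #427: quantize() gained an apply_clip flag to toggle weight clipping
# during quantization.
quantize_kwargs = {"apply_clip": True}

# #399: model loading accepts download_kwargs, forwarded to the
# Hugging Face Hub download call (e.g. to pin a revision).
download_kwargs = {"revision": "main"}

# #460: max_memory caps per-device usage; keys are GPU indices or "cpu",
# values are size strings, following the Accelerate convention.
max_memory = {0: "16GiB", 1: "16GiB", "cpu": "64GiB"}

# Usage would look roughly like this (not executed here, since it needs
# a GPU and the awq package):
# model = AutoAWQForCausalLM.from_pretrained(
#     model_path, download_kwargs=download_kwargs, max_memory=max_memory
# )
# model.quantize(tokenizer, quant_config=quant_config, **quantize_kwargs)
```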
New Contributors
- @Roshiago made their first contribution in #399
- @shaonianyr made their first contribution in #406
- @Isotr0py made their first contribution in #410
- @suparious made their first contribution in #449
- @laoda513 made their first contribution in #460
Full Changelog: v0.2.4...v0.2.5