v0.2.5
What's Changed
- Fix fused models for transformers >= 4.39 by @TechxGenus in #418
- FIX: Add safeguards for static cache + llama on transformers latest by @younesbelkada in #401
- Pin: lm_eval==0.4.1 by @casper-hansen in #426
- Implement `apply_clip` argument to `quantize()` by @casper-hansen in #427
- Workaround: illegal memory access by @casper-hansen in #421
- Add download_kwargs for load model (#302) by @Roshiago in #399
- Add StarCoder2 support by @shaonianyr in #406
- Add StableLM support by @Isotr0py in #410
- Fix starcoder2 fused norm by @TechxGenus in #442
- Update generate example to llama 3 by @casper-hansen in #448
- [BUG] Fix github action documentation build by @suparious in #449
- Fix path by @casper-hansen in #451
- FIX: 'awq_ext' is not defined error by @younesbelkada in #465
- FIX: Fix multiple generations for new HF cache format by @younesbelkada in #444
- Support max_memory to specify memory usage for each GPU by @laoda513 in #460
- Bump to 0.2.5 by @casper-hansen in #468
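Several of the changes above add new user-facing options (`apply_clip` in #427, `download_kwargs` in #399, `max_memory` in #460). A minimal sketch of how these options might be passed, shown as plain keyword dicts so the snippet stays self-contained; the exact argument names and call sites are assumptions, so check the AutoAWQ docs for your version:

```python
# Hedged sketch of the options added in this release (shapes assumed, not
# taken from the AutoAWQ source):

# #427: quantize() gained an apply_clip flag to toggle weight clipping
# during quantization.
quantize_kwargs = {"apply_clip": True}

# #399: model loading accepts download_kwargs, forwarded to the
# Hugging Face Hub download call (e.g. to pin a revision).
download_kwargs = {"revision": "main"}

# #460: max_memory caps per-device usage; keys are GPU indices or "cpu",
# values are size strings, following the Accelerate convention.
max_memory = {0: "16GiB", 1: "16GiB", "cpu": "64GiB"}

# Usage would look roughly like this (not executed here, since it needs
# a GPU and the awq package):
# model = AutoAWQForCausalLM.from_pretrained(
#     model_path, download_kwargs=download_kwargs, max_memory=max_memory
# )
# model.quantize(tokenizer, quant_config=quant_config, **quantize_kwargs)
```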
New Contributors
- @Roshiago made their first contribution in #399
- @shaonianyr made their first contribution in #406
- @Isotr0py made their first contribution in #410
- @suparious made their first contribution in #449
- @laoda513 made their first contribution in #460
Full Changelog: v0.2.4...v0.2.5