Releases · casper-hansen/AutoAWQ
v0.2.6
What's Changed
- Cohere Support by @TechxGenus in #457
- Add phi3 support by @pprp in #481 (see the quantization sketch after this list)
- Support Weight-Only quantization on CPU device with QBits backend by @PenghuiCheng in #437
- Fix typo by @wanyaworld in #486
- Add updates + sponsorship by @casper-hansen in #495
- Update README.md by @casper-hansen in #497
- Update doc by @imba-tjd in #499
- add support for Openbmb/MiniCPM by @LDLINGLINGLING in #504
- Update RunPod support by @casper-hansen in #514
- add deepseek v2 support by @TechxGenus in #508
- Fix NaN problem in Qwen2-72B quantization by @baoyf4244 in #519
- Qwen NaN fix by @baoyf4244 in #522
- fix deepseek v2 input feat by @TechxGenus in #524
- Batched quantization by @casper-hansen in #516
- Fix step size when computing clipping by @casper-hansen in #531
- Pin torch version to 2.3.1 by @devin-ai-integration in #542
- Revert "Pin torch version to 2.3.1 (#542)" by @casper-hansen in #547
- CLI example + Runpod launch script by @casper-hansen in #548
- Print warning if AutoAWQ cannot load extensions by @casper-hansen in #515
- Remove progress bars by @casper-hansen in #550
- Add test for chunked methods by @casper-hansen in #551
- Llama with inputs_embeds only (LLaVA-v1.5 bug fixed) and LLaVA-v1.6 support by @WanBenLe in #471
- Better CLI + RunPod Script by @casper-hansen in #552
- Release 026 by @casper-hansen in #546
- pin torch==2.3.1 by @casper-hansen in #554
- Remove ROCm build and only build for PyPI by @casper-hansen in #555
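
All of the newly supported architectures above (Phi-3, Cohere, MiniCPM, DeepSeek V2) go through the same quantization entry point as existing models. A minimal sketch of the standard flow, where the model path and output directory are illustrative placeholders:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Illustrative paths: any architecture supported in this release
# (e.g. Phi-3, Cohere, MiniCPM, DeepSeek V2) follows the same flow.
model_path = "microsoft/Phi-3-mini-4k-instruct"
quant_path = "phi-3-mini-awq"

# Standard 4-bit AWQ settings used throughout the AutoAWQ examples.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the unquantized model, then run AWQ calibration.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.quantize(tokenizer, quant_config=quant_config)

# Persist the quantized weights and tokenizer for later inference.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```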
New Contributors
- @pprp made their first contribution in #481
- @PenghuiCheng made their first contribution in #437
- @wanyaworld made their first contribution in #486
- @imba-tjd made their first contribution in #499
- @LDLINGLINGLING made their first contribution in #504
- @baoyf4244 made their first contribution in #519
- @devin-ai-integration made their first contribution in #542
- @WanBenLe made their first contribution in #471
Full Changelog: v0.2.5...v0.2.6
v0.2.5
What's Changed
- Fix fused models for tf >= 4.39 by @TechxGenus in #418
- FIX: Add safeguards for static cache + llama on transformers latest by @younesbelkada in #401
- Pin: lm_eval==0.4.1 by @casper-hansen in #426
- Implement `apply_clip` argument to `quantize()` by @casper-hansen in #427 (see the sketch after this list)
- Workaround: illegal memory access by @casper-hansen in #421
- Add download_kwargs for load model (#302) by @Roshiago in #399
- add starcoder2 support by @shaonianyr in #406
- Add StableLM support by @Isotr0py in #410
- Fix starcoder2 fused norm by @TechxGenus in #442
- Update generate example to llama 3 by @casper-hansen in #448
- [BUG] Fix github action documentation build by @suparious in #449
- Fix path by @casper-hansen in #451
- FIX: 'awq_ext' is not defined error by @younesbelkada in #465
- FIX: Fix multiple generations for new HF cache format by @younesbelkada in #444
- support max_memory to specify mem usage for each GPU by @laoda513 in #460
- Bump to 0.2.5 by @casper-hansen in #468
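
Three of the changes above (#427, #399, #460) surface as optional arguments on the loading and quantization entry points. A hedged sketch of how they compose, assuming the AutoAWQ API at this release; the argument names come from the PR titles, while the repo ids and values are illustrative placeholders:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Loading a quantized model:
# - max_memory (#460) caps memory usage per device,
# - download_kwargs (#399) is forwarded to the Hugging Face download machinery.
model = AutoAWQForCausalLM.from_quantized(
    "TheBloke/Mistral-7B-Instruct-v0.1-AWQ",  # illustrative quantized repo
    max_memory={0: "12GiB", "cpu": "30GiB"},
    download_kwargs={"revision": "main"},
)

# Quantizing with the clipping search disabled:
# apply_clip=False (#427) skips the weight-clipping step during calibration.
model_path = "mistralai/Mistral-7B-Instruct-v0.1"  # illustrative base model
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.quantize(tokenizer, quant_config=quant_config, apply_clip=False)
```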
New Contributors
- @Roshiago made their first contribution in #399
- @shaonianyr made their first contribution in #406
- @Isotr0py made their first contribution in #410
- @suparious made their first contribution in #449
- @laoda513 made their first contribution in #460
Full Changelog: v0.2.4...v0.2.5
v0.2.4
What's Changed
- Add Gemma Support by @TechxGenus in #393
- Pin transformers>=4.35.0,<=4.38.2 by @casper-hansen in #408
- Bump to v0.2.4 by @casper-hansen in #409
New Contributors
- @TechxGenus made their first contribution in #393
Full Changelog: v0.2.3...v0.2.4
v0.2.3
What's Changed
- New optimized kernels by @casper-hansen in #365
- Fix double bias by @casper-hansen in #383
- x_max -> x_mean and w_max -> w_mean name changes and some comments by @OscarSavolainenDR in #378
New Contributors
- @OscarSavolainenDR made their first contribution in #378
Full Changelog: v0.2.2...v0.2.3
v0.2.2
What's Changed
- Support Fused Mixtral on multi-GPU by @casper-hansen in #352
- Add multi-GPU benchmark of Mixtral by @casper-hansen in #353
- Remove MoE Triton kernels by @casper-hansen in #355
- Bump to 0.2.2 by @casper-hansen in #356
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- Avoid downloading ROCm by @casper-hansen in #347
- ENH / FIX: Few enhancements and fix for mixed-precision training by @younesbelkada in #348
- Fix triton dependency by @casper-hansen in #350
- Bump to 0.2.1 by @casper-hansen in #351
Full Changelog: v0.2.0...v0.2.1
v0.2.0
What's Changed
- AWQ: Move the AWQ kernels to a separate repository by @casper-hansen in #279
- Add CPU-loaded multi-GPU quantization by @xNul in #289
- GGUF compatible quantization (2, 3, 4 bit / any bit) by @casper-hansen in #285
- Exllama kernels support by @IlyasMoutawwakil in #313
- Cleanup requirements by @casper-hansen in #295
- Torch only inference + any-device quantization by @casper-hansen in #319
- Up to 60% faster context processing by @casper-hansen in #316
- Evaluation: Add more evals by @casper-hansen in #283
- Fixes a breaking change in autoawq by @younesbelkada in #325
- AMD ROCM Support by @IlyasMoutawwakil in #315
- Marlin symmetric quantization and inference by @IlyasMoutawwakil in #320 (see the sketch after this list)
- Add qwen2 by @JustinLin610 in #321
- Fix n_samples by @casper-hansen in #326
- PEFT compatible GEMM by @casper-hansen in #324
- [PEFT] Fix PEFT batch size > 1 by @younesbelkada in #338
- v0.2.0 by @casper-hansen in #330
- Fix ROCm build by @casper-hansen in #342
- Fix dependency by @casper-hansen in #343
- Fix importlib by @casper-hansen in #344
- Fix workflow by @casper-hansen in #345
- Fix typo in setup.py by @casper-hansen in #346
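
Kernel choice in AutoAWQ is expressed through the `version` field of `quant_config` (GEMM vs. GEMV); the sketch below assumes the Marlin kernels from #320 plug into the same field and, per the PR title ("Marlin symmetric quantization"), require symmetric quantization (`zero_point=False`). Treat the exact option string as an assumption rather than a confirmed value:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-v0.1"  # illustrative base model

# Symmetric 4-bit config targeting the Marlin kernels (#320).
# zero_point=False and version="Marlin" are assumptions inferred from
# the PR title, not a confirmed signature.
quant_config = {"zero_point": False, "q_group_size": 128, "w_bit": 4, "version": "Marlin"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized("mistral-7b-awq-marlin")
```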
New Contributors
- @xNul made their first contribution in #289
- @IlyasMoutawwakil made their first contribution in #313
- @JustinLin610 made their first contribution in #321
Full Changelog: v0.1.8...v0.2.0
v0.1.8
What's Changed
- Fix MPT by @casper-hansen in #206
- Add config to Base model by @casper-hansen in #207
- Add Qwen model by @Sanster in #182
- Robust quantization for Catcher by @casper-hansen in #209
- New scaling to improve perplexity by @casper-hansen in #216
- Benchmark hf generate by @casper-hansen in #237
- Fix position ids by @casper-hansen in #215
- Pass `model_init_kwargs` to `check_and_get_model_type` function by @rycont in #232
- Fixed an issue where the Qwen model had too much error after quantization by @jundolc in #243
- Load on CPU to avoid OOM by @casper-hansen in #236
- Update README.md by @casper-hansen in #245
- [core] Make AutoAWQ fused modules compatible with HF transformers by @younesbelkada in #244 (see the sketch after this list)
- [core] Fix quantization issues with transformers==4.36.0 by @younesbelkada in #249
- FEAT: Add possibility of skipping modules when quantizing by @younesbelkada in #248
- Fix quantization issue with transformers >= 4.36.0 by @younesbelkada in #264
- Mixtral: Mixture of Experts quantization by @casper-hansen in #251
- Fused rope theta by @casper-hansen in #270
- FEAT: add llava to autoawq by @younesbelkada in #250
- Add Baichuan2 Support by @AoyuQC in #247
- Set default rope_theta on LlamaLikeBlock by @casper-hansen in #271
- Update news and models supported by @casper-hansen in #272
- Add vLLM async example by @casper-hansen in #273
- Bump to v0.1.8 by @casper-hansen in #274
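
The fused-module work in #244 is exposed at load time. A minimal sketch of loading with fused layers enabled and generating through the standard transformers-style interface, assuming the AutoAWQ API of this release (the repo id is illustrative):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "TheBloke/zephyr-7B-beta-AWQ"  # illustrative quantized repo

# fuse_layers=True enables the fused attention/MLP modules that #244
# made compatible with the HF transformers generation utilities.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path)

# Generate with the usual transformers-style API (GPU assumed).
tokens = tokenizer("What is AWQ quantization?", return_tensors="pt").input_ids.cuda()
out = model.generate(tokens, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```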
New Contributors
- @Sanster made their first contribution in #182
- @rycont made their first contribution in #232
- @jundolc made their first contribution in #243
- @AoyuQC made their first contribution in #247
Full Changelog: v0.1.7...v0.1.8
v0.1.7
What's Changed
- Build older cuda wheels by @casper-hansen in #158
- Exclude download of CUDA wheels by @casper-hansen in #159
- New benchmarks in README by @casper-hansen in #160
- Fix typo in benchmark command by @casper-hansen in #161
- Yi support by @casper-hansen in #167
- Make sure to delete dummy model by @casper-hansen in #180
- Fix CUDA error: invalid argument by @casper-hansen in #179
- New logic for passing past_key_value by @younesbelkada in #177
- Reset cache on new generation by @casper-hansen in #178
- Adaptive batch sizing by @casper-hansen in #181
- Pass arguments to AutoConfig by @s4rduk4r in #97
- Fix cache util logic by @casper-hansen in #186
- Fix multi-GPU loading and inference by @casper-hansen in #190
- [core] Replace `QuantLlamaMLP` with `QuantFusedMLP` by @younesbelkada in #188
- [core] Add `is_hf_transformers` flag by @younesbelkada in #195
- Fixed multi-GPU quantization by @casper-hansen in #196
Full Changelog: v0.1.6...v0.1.7
v0.1.6
What's Changed
- Pseudo dequantize function by @casper-hansen in #127
- CUDA 11.8.0 and 12.1.1 build by @casper-hansen in #128
- AwqConfig class by @casper-hansen in #132
- Fix init quant by @casper-hansen in #136
- Update readme by @casper-hansen in #137
- Benchmark info by @casper-hansen in #138
- Bump to v0.1.6 by @casper-hansen in #139
- CUDA 12 release by @casper-hansen in #140
- Revert to previous version by @casper-hansen in #141
- Fix performance regression by @casper-hansen in #148
- [core / attention] Fix fused attention generation with newest transformers version by @younesbelkada in #146
- Fix condition when rolling cache by @casper-hansen in #150
- Default to safetensors for quantized models by @casper-hansen in #151
- Create fused LlamaLikeModel by @casper-hansen in #152
Full Changelog: v0.1.5...v0.1.6