Batched quantization #516

casper-hansen · 2024-06-21T14:03:02Z

TODO:

Add documentation for using a custom dataset, e.g. filtering Cosmopedia for long samples only.
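As a rough illustration of what "filtering for long samples only" could look like, here is a minimal sketch on in-memory records. The helper name `keep_long_samples`, the field names `text` and `text_token_length`, and the 2048-token threshold are all assumptions made for this example, not something this PR defines.

```python
from typing import Dict, List


def keep_long_samples(records: List[Dict], min_tokens: int = 2048) -> List[str]:
    """Return the text of every record with at least `min_tokens` tokens.

    Each record is assumed to carry a `text` field and a precomputed
    `text_token_length` field; both names are assumptions for this sketch.
    """
    return [r["text"] for r in records if r["text_token_length"] >= min_tokens]


# Tiny in-memory stand-in for a real dataset.
records = [
    {"text": "short sample", "text_token_length": 12},
    {"text": "long sample A", "text_token_length": 4096},
    {"text": "long sample B", "text_token_length": 2048},
]
calib_data = keep_long_samples(records, min_tokens=2048)
```

The resulting list of strings is the kind of input that could then be handed to the quantizer as custom calibration data.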

Features:

Split samples into batches of n_parallel_calib_samples
Allow any sample length (long context)
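The batching idea can be sketched in plain Python as follows. Only the parameter name `n_parallel_calib_samples` comes from this PR; the helper `split_into_batches` is hypothetical and is not the code in this branch.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(
    samples: Sequence[T], n_parallel_calib_samples: int
) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `n_parallel_calib_samples` items."""
    if n_parallel_calib_samples < 1:
        raise ValueError("n_parallel_calib_samples must be >= 1")
    for start in range(0, len(samples), n_parallel_calib_samples):
        # The last batch may be smaller than the requested size.
        yield list(samples[start:start + n_parallel_calib_samples])


# Ten dummy samples processed four at a time.
batches = list(split_into_batches(list(range(10)), n_parallel_calib_samples=4))
```

Peak memory then depends on the batch size rather than on the total number of calibration samples.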

Fixes:

If you raise n_samples above 128, OOM is extremely likely because all samples are currently run through the model at the same time. Closes #517 (Expose calibration dataset arguments) and #493 (Fixes bug with block_size and exposes n_samples and block_size to the user).
Scales being NaN or inf in some cases. Resolves #498 (qwen2-72B can not be quantized by autoawq), #500 (when quantize qwen2 by autoawq, it not works successful.) and #509 (Support Qwen2 72 Awq quantization？).
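The commit titles in this PR ("Chunk per-channel mean", "Fix overflow: per-channel mean in FP32") suggest that one source of inf values is summing FP16 activations past the FP16 maximum of 65504. Below is a minimal sketch of a chunked per-channel mean accumulated in FP32. The function name `per_channel_abs_mean` and the `chunk_rows` parameter are assumptions for illustration; this is not the exact code in this branch.

```python
import torch


def per_channel_abs_mean(inp: torch.Tensor, chunk_rows: int = 1024) -> torch.Tensor:
    """Mean of |inp| per channel (last dim), accumulated in FP32 chunk by chunk."""
    flat = inp.reshape(-1, inp.shape[-1])
    total = torch.zeros(flat.shape[-1], dtype=torch.float32, device=flat.device)
    for start in range(0, flat.shape[0], chunk_rows):
        chunk = flat[start:start + chunk_rows]
        # Upcast before summing so the running sum is not limited to the FP16 range.
        total += chunk.to(torch.float32).abs().sum(dim=0)
    return total / flat.shape[0]


# 4096 rows of 100.0 sum to 409600 per channel, beyond the FP16 maximum of 65504.
x = torch.full((4096, 8), 100.0, dtype=torch.float16)
mean = per_channel_abs_mean(x, chunk_rows=512)
```

This sketch returns the mean in FP32; whether to cast it back to the input dtype afterwards is a separate design choice.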

awq/utils/calib_data.py

WallE-Chang

OOM often happens here. Maybe you need to move partial_output to the CPU with module_output.append(partial_output.detach().cpu()).
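A minimal sketch of that suggestion, assuming the module is called on slices of the input and each partial result is moved to the CPU before being collected. The function name `forward_in_batches` is hypothetical and the real method in quantizer.py may differ; note that `detach` is a method and has to be called.

```python
import torch
from torch import nn


@torch.no_grad()
def forward_in_batches(
    module: nn.Module, x: torch.Tensor, n_parallel_calib_samples: int
) -> torch.Tensor:
    """Run `module` on slices of `x` and collect the outputs on the CPU."""
    module_output = []
    for x_partial in torch.split(x, n_parallel_calib_samples, dim=0):
        partial_output = module(x_partial)
        # Some modules return a tuple; keep only the first element.
        if isinstance(partial_output, tuple):
            partial_output = partial_output[0]
        # Moving each partial result off the accelerator bounds device memory use.
        module_output.append(partial_output.detach().cpu())
    return torch.cat(module_output, dim=0)


layer = nn.Linear(16, 4)
x = torch.randn(10, 16)
batched = forward_in_batches(layer, x, n_parallel_calib_samples=3)
```

The trade-off is extra device-to-host copies in exchange for holding only one batch of outputs on the accelerator at a time.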

awq/quantize/quantizer.py

RanchiZhao · 2024-06-28T07:29:43Z

Marking this. Looking forward to the new feature!

casper-hansen added 8 commits April 6, 2024 17:35

Batched quantization

fd3a9d4

Merge branch 'main' into batched_quantization

6ff2b64

Enable long context calibration data

e53840d

Fix reference bug

6875a25

Fix not selecting right output

1e89762

Add max_calib_samples and max_calib_seq_len

08d75cc

Potentially handle overflow

bc37fda

Improve naming and docs

edefcab

casper-hansen mentioned this pull request Jun 21, 2024

qwen2-72B can not be quantized by autoawq #498

Closed

casper-hansen added 3 commits June 21, 2024 20:47

Chunked loss computation

a0b3ac7

Long-context doc on Cosmopedia

0006e98

Chunk per-channel mean

af92421

attafosu reviewed Jun 23, 2024

awq/utils/calib_data.py (resolved)

baoyf4244 mentioned this pull request Jun 24, 2024

nan problem of Qwen2-72B quantization #519

Merged

Merge branch 'main' into batched_quantization

0565748

WallE-Chang reviewed Jun 25, 2024

awq/quantize/quantizer.py (resolved)

TechxGenus mentioned this pull request Jun 27, 2024

add deepseek v2 support #508

Merged

casper-hansen added 9 commits June 30, 2024 10:32

Optimize UX with progress bars

4b68287

Update long-context example

6dfdfbb

Expose max_chunk_memory parameter

ca00573

apply code formatting

77401c0

Fix overflow: per-channel mean in FP32

73a96f0

Merge branch 'main' into batched_quantization

987bcfc

Coding example

1f66471

Optimize quantization speed. Fix OOM due to inp -> inp_flat.

8111285

Update coding example

7ad5ac9

casper-hansen merged commit c025b15 into main Jul 2, 2024

This was referenced Jul 2, 2024

Fixes bug with block_size and exposes n_samples and block_size to the user #493

Closed

when quantize qwen2 by autoawq, it not works successful. #500

Closed

casper-hansen mentioned this pull request Jul 2, 2024

Support Qwen2 72 Awq quantization？ #509

Closed

casper-hansen deleted the batched_quantization branch July 26, 2024 18:24
