[Quantization] Quanto quantizer #29023
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Clean work already! Looking forward to merging the PR!
cc @dacorvo
Quick update: all tests are passing with the exception of the safetensors tests. I've also implemented quantization on the fly, just like we do for bnb quantization. However, if the user wants to use CPU/disk offload, they will need to install the main branch of accelerate for now because of this PR.
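For context, a minimal sketch of how loading a quanto-quantized model with offload might look once this PR lands. The model id and config values below are illustrative assumptions; the `QuantoConfig` API follows the transformers quantization docs.

```python
from transformers import AutoModelForCausalLM, QuantoConfig

# Weights-only int8 quantization, applied on the fly while the checkpoint is loaded.
quantization_config = QuantoConfig(weights="int8")

# device_map="auto" lets accelerate place layers on GPU/CPU/disk; per the comment above,
# CPU/disk offload currently requires the main branch of accelerate.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # placeholder model id
    quantization_config=quantization_config,
    device_map="auto",
)
```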
LGTM, mostly concerned with the quanto-specific function additions, which should live in quanto.py rather than in the modeling code (as much as possible, of course).
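A rough illustration of the kind of separation being asked for: quanto-specific module replacement kept in its own quanto.py helper instead of modeling_utils.py. The helper name, signature, and structure below are hypothetical, not the PR's actual implementation.

```python
import torch.nn as nn


def replace_with_quanto_layers(model: nn.Module, modules_to_not_convert=None, converted=None):
    """Recursively collect nn.Linear submodules eligible for quanto quantization.

    Construction of the actual quanto replacement layer is elided here, since the
    exact quanto API is not shown in this thread.
    """
    modules_to_not_convert = modules_to_not_convert or []
    converted = [] if converted is None else converted
    for name, module in model.named_children():
        if isinstance(module, nn.Linear) and name not in modules_to_not_convert:
            converted.append(name)  # a real helper would swap in the quanto layer here
        else:
            replace_with_quanto_layers(module, modules_to_not_convert, converted)
    return model, converted
```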
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating! 🔥 let's make quanto go brrr
if hf_quantizer is not None:
    missing_keys = hf_quantizer.update_missing_keys(model, missing_keys, prefix)
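For readers following along, a rough sketch of what the quantizer's side of this hook could look like: pruning keys that only exist because modules were swapped for quantized ones. The class name and suffix list are assumptions for illustration, not the PR's exact code.

```python
class QuantoHfQuantizerSketch:
    # Hypothetical suffixes for tensors created at quantization time (scales, packed data).
    QUANTIZER_ADDED_SUFFIXES = ("weight._data", "weight._scale", "input_scale", "output_scale")

    def update_missing_keys(self, model, missing_keys, prefix):
        # Tensors created when layers are quantized never exist in the original
        # checkpoint, so they should not be reported as missing.
        return [key for key in missing_keys if not key.endswith(self.QUANTIZER_ADDED_SUFFIXES)]
```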
Alright no worries 🤗
Co-authored-by: Arthur <[email protected]>
…formers into quanto_integration
Thanks again @SunMarc - great work! 🚀 let's 🚢 it
Note to the core maintainers, and especially @ArthurZucker: I reverted a lot of the changes I did for serialization and decided to postpone the serialization feature. I will do a follow-up PR for serialization, since the work to make it compatible with safetensors is not ready yet. Without serialization, this PR should still be good to merge!
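For context, a hedged sketch of what "blocking serialization" in the quantizer could look like; the property name mirrors the `is_serializable` mentioned in the commit list, but the exact mechanism in this PR may differ.

```python
import logging

logger = logging.getLogger(__name__)


class QuantoQuantizerSketch:
    @property
    def is_serializable(self):
        # Serialization is postponed to a follow-up PR, so saving quantized
        # checkpoints is disabled for now.
        logger.warning("Serialization of quanto-quantized models is not supported yet.")
        return False
```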
* start integration
* fix
* add and debug tests
* update tests
* make pytorch serialization works
* compatible with device_map and offload
* fix tests
* make style
* add ref
* guard against safetensors
* add float8 and style
* fix is_serializable
* Fix shard_checkpoint compatibility with quanto
* more tests
* docs
* adjust memory
* better
* style
* pass tests
* Update src/transformers/modeling_utils.py
  Co-authored-by: Younes Belkada <[email protected]>
* add is_safe_serialization instead
* Update src/transformers/quantizers/quantizer_quanto.py
  Co-authored-by: Younes Belkada <[email protected]>
* add QbitsTensor tests
* fix tests
* simplify activation list
* Update docs/source/en/quantization.md
  Co-authored-by: David Corvoysier <[email protected]>
* better comment
* Update tests/quantization/quanto_integration/test_quanto.py
  Co-authored-by: David Corvoysier <[email protected]>
* Update tests/quantization/quanto_integration/test_quanto.py
  Co-authored-by: David Corvoysier <[email protected]>
* find and fix edge case
* Update docs/source/en/quantization.md
  Co-authored-by: Arthur <[email protected]>
* pass weights_only_kwarg instead
* fix shard_checkpoint loading
* simplify update_missing_keys
* Update tests/quantization/quanto_integration/test_quanto.py
  Co-authored-by: Arthur <[email protected]>
* recursion to get all tensors
* block serialization
* skip serialization tests
* fix
* change by cuda:0 for now
* fix regression
* update device_map
* fix doc
* add noteboon
* update torch_dtype
* update doc
* typo
* typo
* remove comm

---------

Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: David Corvoysier <[email protected]>
Co-authored-by: Arthur <[email protected]>
Co-authored-by: Younes Belkada <[email protected]>
What does this PR do?
This PR adds the quantization methods from the quanto library. We will support inference + model quantization when the user performs weights-only quantization, since weights-only quantization doesn't require a calibration dataset. A rough illustration of that workflow follows below.
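As a sketch of why no calibration data is needed for weights-only quantization: weights are quantized directly from their existing values, while activations stay in full precision. The calls below follow quanto's documented quantize/freeze workflow, but treat the exact names and the model id as assumptions.

```python
import torch
from quanto import freeze, qint8, quantize
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder model id

quantize(model, weights=qint8)  # mark Linear weights for int8 quantization
freeze(model)                   # materialize the quantized weights, no calibration pass needed

with torch.no_grad():
    out = model(torch.tensor([[1, 2, 3]]))
```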
TODO: