
[FEAT]: EETQ quantizer support #30262

Merged: 31 commits into huggingface:main on Apr 22, 2024

Conversation

@dtlzhuangz (Contributor) commented Apr 16, 2024

What does this PR do?

EETQ supports int8 per-channel weight-only quantization for NVIDIA GPUs. Its high-performance GEMM and GEMV kernels come from FasterTransformer and TensorRT-LLM. It requires no calibration dataset and does not require pre-quantizing your model. Moreover, the accuracy degradation is negligible owing to the per-channel quantization.
NetEase-FuXi/EETQ#13
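
For context, here is a minimal sketch of the on-the-fly quantization workflow this PR enables. The EetqConfig class and the facebook/opt-125m model id are taken from the PR's integration tests; the generation call at the end is only illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, EetqConfig

# int8 per-channel weight-only quantization: no calibration dataset and no
# pre-quantized checkpoint needed (requires the eetq package and an NVIDIA GPU)
quantization_config = EetqConfig()

model_id = "facebook/opt-125m"  # model id used in the PR's integration tests
tokenizer = AutoTokenizer.from_pretrained(model_id)

# weights are quantized on the fly while the model is loaded onto the GPU
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config,
)

inputs = tokenizer("EETQ is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```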

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Fixes NetEase-FuXi/EETQ#13

@dtlzhuangz (Contributor, Author) commented Apr 16, 2024

@younesbelkada
Please review the code and documentation to see if there is anything inappropriate.

@SunMarc (Member) left a comment

Awesome PR @dtlzhuangz! The EETQ library is written in such a way that the integration is very smooth. We can quantize on the fly, serialize the quantized model, and even reload it with minimal changes in transformers 🔥 I left a few minor comments. Make sure to fix the style with make style.

docs/source/en/main_classes/quantization.md (outdated, resolved)
Comment on lines 648 to 658
Make sure you have eetq installed from source: https://github.com/NetEase-FuXi/EETQ
```
git clone https://github.com/NetEase-FuXi/EETQ.git
cd EETQ/
git submodule update --init --recursive
pip install .
```
@SunMarc (Member) commented Apr 16, 2024

Is there a plan to release EETQ on PyPI?

docs/source/en/quantization.md (outdated, resolved)
docs/source/en/quantization.md (outdated, resolved)
src/transformers/integrations/__init__.py (outdated, resolved)
tests/quantization/eetq_integration/test_eetq.py (outdated, resolved)
src/transformers/quantizers/quantizer_eetq.py (outdated, resolved)
src/transformers/quantizers/auto.py (outdated, resolved)
src/transformers/quantizers/auto.py (outdated, resolved)
src/transformers/quantizers/auto.py (outdated, resolved)
@younesbelkada (Contributor) left a comment

Thanks so much for this great work! In addition to @SunMarc's comments, I have a few small additional ones:
1. Can you add pip install git+https://github.com/NetEase-FuXi/EETQ.git inside the quantization Dockerfile here: https://github.com/huggingface/transformers/blob/main/docker/transformers-quantization-latest-gpu/Dockerfile
2. Can you elaborate on the hardware restrictions in the documentation section? (i.e. whether it works only from CUDA compute capability 8.0 and above, or also 7.0, etc.)
3. Yes, let's use camel case for the newly introduced files (EETQ --> Eetq).
4. Can you make sure the styling checks pass (make fixup) to keep the CI happy?
Thanks again and looking forward to merging this!

@SunMarc (Member) left a comment

Hi @dtlzhuangz, thanks for the fast response. I've answered your questions. After fixing the issues pointed out by @younesbelkada, we will ask a core maintainer for a final review.

@dtlzhuangz (Contributor, Author) commented

> Hi @dtlzhuangz, thanks for the fast response. I've answered your questions. After fixing the issues pointed out by @younesbelkada, we will ask a core maintainer for a final review.

Sorry, could you help me fix the error of 'Import block is un-sorted or un-formatted'? I'm not quite familiar with the CI.

@SunMarc (Member) commented Apr 17, 2024

Yes, I took care of that. You just needed to run make style. I will try to run the tests on my setup to see if everything works!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@dtlzhuangz (Contributor, Author) commented

> Yes, I took care of that. You just needed to run make style. I will try to run the tests on my setup to see if everything works!

Thank you so much for your guidance and effort!

@younesbelkada (Contributor) left a comment

Very smooth integration! Thanks for delivering this to the community! LGTM with only two nits.

docs/source/en/quantization.md (outdated, resolved)
src/transformers/integrations/eetq.py (resolved)
r"""
Safety checker that arguments are correct
"""
accepted_weights = ["int8"]
Contributor

Out of curiosity: are there any plans to support 4-bit group-wise quantization as well?

Contributor Author

Sorry, not at the moment.

Contributor

ok no worries!

@SunMarc (Member) left a comment

I was able to build the Dockerfile and the tests are passing 🔥 Thanks again @dtlzhuangz for the clean PR. Do you have any plan to release the package on PyPI? Installing from source is not ideal, since it takes quite a lot of time to build the wheels, and users are subject to breaking changes since there is no release yet.

@dtlzhuangz (Contributor, Author) commented Apr 18, 2024

> I was able to build the Dockerfile and the tests are passing 🔥 Thanks again @dtlzhuangz for the clean PR. Do you have any plan to release the package on PyPI? Installing from source is not ideal, since it takes quite a lot of time to build the wheels, and users are subject to breaking changes since there is no release yet.

Sorry for replying to the question late. My colleague and I are setting out to do it, but the built files depend on the version of torch, and an error occurs if the versions mismatch. If there is no solution, we will have to install a specific version of torch when installing EETQ.

@dtlzhuangz (Contributor, Author) commented Apr 19, 2024

Hi @SunMarc @younesbelkada @amyeroberts, we have released the .whl on the release page and updated the documentation. Please review. Thanks!

@younesbelkada (Contributor) left a comment

Great work, thanks! I will let @amyeroberts make a final review and merge it if all is good. Thanks again for all your great work, @dtlzhuangz!

@amyeroberts (Collaborator) left a comment

Thanks for adding!

Just a few small comments to address

docs/source/en/quantization.md (outdated, resolved)
docs/source/en/quantization.md (outdated, resolved)
Comment on lines +104 to +108
```
modules_to_not_convert = ["lm_head"] if modules_to_not_convert is None else modules_to_not_convert

if quantization_config.modules_to_not_convert is not None:
    modules_to_not_convert.extend(quantization_config.modules_to_not_convert)
```
Collaborator

We might want to use sets here; otherwise we can end up with duplicate modules being added.

Contributor Author

I have added modules_to_not_convert = list(set(modules_to_not_convert))
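
For readers following along, a minimal sketch of what that deduplication looks like in context; the helper name is hypothetical and the variable names follow the quoted diff.

```python
def _get_modules_to_not_convert(quantization_config, modules_to_not_convert=None):
    # default to keeping the LM head un-quantized (as in the quoted diff)
    modules_to_not_convert = ["lm_head"] if modules_to_not_convert is None else modules_to_not_convert

    # merge in any user-specified modules from the config
    if quantization_config.modules_to_not_convert is not None:
        modules_to_not_convert.extend(quantization_config.modules_to_not_convert)

    # drop duplicates that appear when the config repeats a default entry
    return list(set(modules_to_not_convert))
```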

Comment on lines 42 to 43
```
if current_key_name is None:
    current_key_name = []
```
Collaborator

This should go outside of the for-loop; we only need to check it for None-ness once.

Contributor Author

Indeed. It has been done.
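
A minimal sketch of the resulting structure, with illustrative names rather than the PR's exact function:

```python
import torch.nn as nn


def _walk_and_replace(model: nn.Module, modules_to_not_convert, current_key_name=None):
    # the None check now runs once per call instead of once per child module
    if current_key_name is None:
        current_key_name = []

    for name, module in model.named_children():
        current_key_name.append(name)
        if ".".join(current_key_name) not in modules_to_not_convert:
            pass  # replacement of nn.Linear with the EETQ linear layer would happen here
        current_key_name.pop()
```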

```
def test_raise_if_non_quantized(self):
    model_id = "facebook/opt-125m"
    quantization_config = EetqConfig()
    _ = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", quantization_config=quantization_config)
```
Collaborator

This doesn't test that any error is raised here.

Contributor Author

I have removed it.
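
As an aside, a test that does assert a failure could target the accepted_weights safety checker quoted above; a unittest-style sketch, where the weights keyword and the exact exception type are assumptions rather than taken from the PR:

```python
import unittest

from transformers import EetqConfig


class EetqConfigTest(unittest.TestCase):
    def test_rejects_unsupported_weights(self):
        # the safety checker only accepts "int8", so an unsupported value should raise;
        # the "weights" keyword and ValueError type are assumptions for illustration
        with self.assertRaises(ValueError):
            EetqConfig(weights="int4")


if __name__ == "__main__":
    unittest.main()
```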

Comment on lines +84 to +83
```
if torch_dtype is None:
    torch_dtype = torch.float16
```
Collaborator

+1, a logger.info message should be added here.
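
A minimal sketch of what that could look like, using the standard transformers logging utility; the function name and message wording are illustrative:

```python
import torch
from transformers.utils import logging

logger = logging.get_logger(__name__)


def _resolve_torch_dtype(torch_dtype):
    # default to float16 for EETQ and tell the user about it, as requested in the review
    if torch_dtype is None:
        torch_dtype = torch.float16
        logger.info(
            "No torch_dtype was specified; defaulting to torch.float16 for EETQ quantization."
        )
    return torch_dtype
```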

@dtlzhuangz (Contributor, Author) commented Apr 21, 2024

Hi @amyeroberts, I have addressed all the comments. I don't think the CI errors are caused by my changes; they occurred after I modified quantization.md. Please take a look.

@amyeroberts (Collaborator) commented

@dtlzhuangz Regarding the failing tests - a fix has been merged into main. Could you rebase?

@younesbelkada (Contributor) commented

Re-ran the testing suite and the tests seem to pass now! 🤞

@amyeroberts (Collaborator) left a comment

Thanks for adding this and iterating!

@amyeroberts merged commit b4c18a8 into huggingface:main on Apr 22, 2024
22 checks passed
@dtlzhuangz (Contributor, Author) commented

Thank you all for your help! @SunMarc @amyeroberts @younesbelkada

@younesbelkada (Contributor) commented

Great work, thanks to everyone involved in this!

itazap pushed a commit that referenced this pull request May 14, 2024
* [FEAT]: EETQ quantizer support

* Update quantization.md

* Update docs/source/en/main_classes/quantization.md

Co-authored-by: Marc Sun <[email protected]>

* Update docs/source/en/quantization.md

Co-authored-by: Marc Sun <[email protected]>

* Update docs/source/en/quantization.md

Co-authored-by: Marc Sun <[email protected]>

* Update src/transformers/integrations/__init__.py

Co-authored-by: Marc Sun <[email protected]>

* Update src/transformers/integrations/__init__.py

Co-authored-by: Marc Sun <[email protected]>

* Update src/transformers/integrations/eetq.py

Co-authored-by: Marc Sun <[email protected]>

* Update src/transformers/integrations/eetq.py

Co-authored-by: Marc Sun <[email protected]>

* Update src/transformers/integrations/eetq.py

Co-authored-by: Marc Sun <[email protected]>

* Update tests/quantization/eetq_integration/test_eetq.py

Co-authored-by: Marc Sun <[email protected]>

* Update src/transformers/quantizers/auto.py

Co-authored-by: Marc Sun <[email protected]>

* Update src/transformers/quantizers/auto.py

Co-authored-by: Marc Sun <[email protected]>

* Update src/transformers/quantizers/auto.py

Co-authored-by: Marc Sun <[email protected]>

* Update src/transformers/quantizers/quantizer_eetq.py

Co-authored-by: Marc Sun <[email protected]>

* Update tests/quantization/eetq_integration/test_eetq.py

Co-authored-by: Marc Sun <[email protected]>

* Update src/transformers/quantizers/quantizer_eetq.py

Co-authored-by: Marc Sun <[email protected]>

* Update tests/quantization/eetq_integration/test_eetq.py

Co-authored-by: Marc Sun <[email protected]>

* Update tests/quantization/eetq_integration/test_eetq.py

Co-authored-by: Marc Sun <[email protected]>

* [FEAT]: EETQ quantizer support

* [FEAT]: EETQ quantizer support

* remove whitespaces

* update quantization.md

* style

* Update docs/source/en/quantization.md

Co-authored-by: Younes Belkada <[email protected]>

* add copyright

* Update quantization.md

* Update docs/source/en/quantization.md

Co-authored-by: amyeroberts <[email protected]>

* Update docs/source/en/quantization.md

Co-authored-by: amyeroberts <[email protected]>

* Address the comments by amyeroberts

* style

---------

Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: amyeroberts <[email protected]>

Successfully merging this pull request may close these issues.

Integration with Hugging Face transformers library