
Refactor _quantized_linear for better extensibility #634

Merged — 1 commit merged into pytorch:main from refactor-dispatch on Aug 14, 2024

Conversation

@jerryzh168 (Contributor) commented Aug 8, 2024

Summary:
Some popular ops like linear will get many implementations based on the different characteristics of input and weight, e.g. int8 act + int8 weight, int8 act + int4 weight, etc. For AffineQuantizedTensor, right now all of these implementations are added to the main body of the linear dispatch implementation, which makes the code hard to read and extend.

We refactored the implementation of the _quantized_linear op to take a list of (dispatch_condition, impl) pairs and go through them one by one; this makes the body of _quantized_linear shorter and easier to maintain.
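A minimal sketch of the idea (the helper names below are illustrative, not necessarily the exact torchao API):

```
# illustrative sketch of the refactor: a flat table of
# (dispatch_condition, impl) pairs, tried in registration order
_QLINEAR_DISPATCH_TABLE = []

def register_quantized_linear_dispatch(dispatch_condition, impl):
    # each entry pairs a predicate over the inputs with an implementation
    _QLINEAR_DISPATCH_TABLE.append((dispatch_condition, impl))

def _quantized_linear_op(input_tensor, weight_tensor, bias):
    # walk the registered pairs in order; the first matching condition wins
    for dispatch_condition, impl in _QLINEAR_DISPATCH_TABLE:
        if dispatch_condition(input_tensor, weight_tensor, bias):
            return impl(input_tensor, weight_tensor, bias)
    raise NotImplementedError("no matching dispatch for quantized linear")
```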

Alternatively, we could also add more functionality to implements, e.g. a secondary dispatch condition: implements(func, dispatch_condition), but that is a much more complicated discussion that we can delay for later. A few questions we need to think about:

  • how do we allow people to override all implementations for a specific function? (used in autoquant)
  • how do we make sure the dispatch conditions people register are called in the right order? e.g. if we have static quant, weight-only quant, and activation quant implementations/conditions, static quant (two inputs quantized) should come before the others; this might mean we also have to introduce the concept of a secondary dispatch key here
  • what happens to the dispatch table during inheritance? currently I think the dispatch table is just shared between parent and child, but if needed we can key the dispatch table on the class as well and copy the table when a child class inherits it: cls._DISPATCH_TABLE[cls][func] = impl (see the sketch below)
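A hypothetical sketch of that copy-on-inherit, class-keyed table (names illustrative, not torchao code):

```
# hypothetical: dispatch table keyed on class, copied when a child class
# is created so that child registrations do not mutate the parent's table
class DispatchedTensor:
    _DISPATCH_TABLE = {}  # {cls: {func: impl}}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        parent = cls.__mro__[1]
        # start the child's table as a copy of the parent's entries
        DispatchedTensor._DISPATCH_TABLE[cls] = dict(
            DispatchedTensor._DISPATCH_TABLE.get(parent, {})
        )

    @classmethod
    def register(cls, func, impl):
        DispatchedTensor._DISPATCH_TABLE.setdefault(cls, {})[func] = impl
```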

Test Plan:
regression tests
python test/quantization/test_quant_api.py
python test/integration/test_integration.py

python tutorials/quantize_vit/run_vit_b_quant.py


pytorch-bot (bot) commented Aug 8, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/634

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b4ee5c0 with merge base 433cd14:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Aug 8, 2024
@jerryzh168 force-pushed the refactor-dispatch branch 9 times, most recently from 5614006 to 03ac03e, on August 8, 2024 23:37
@jerryzh168 (Contributor, Author):

cc @kimishpatel — we discussed this a bit before; hopefully this makes extension outside of the repo easier.

@jerryzh168 force-pushed the refactor-dispatch branch 2 times, most recently from 4fd8e86 to f0792c4, on August 9, 2024 05:37
@jerryzh168 marked this pull request as draft on August 9, 2024 06:10
@kimishpatel (Contributor):

I need to look at the PR in more detail, but why are we not proposing "extending" the AffineQuantizedTensor class itself as a better solution? That would allow various packing and other things to also be encoded by a class that wants different dispatch behavior.

Is there a reason why the AffineQuantizedTensor class should take on this responsibility?

Review thread on this diff snippet:

```
func = torch.nn.functional.linear
args = (input_tensor, weight_tensor, bias)
kwargs = {}
if hasattr(cls, "_DISPATCH_TABLE_WITH_CONDITION"):
```
@kimishpatel (Contributor):
This feels like AQT expanding its responsibility with some APIs whose need seems to have popped up just now. I do feel AffineQuantizedTensor should not take on the responsibility of dispatch at all, or if it does, it should just be dq-op-q; the rest should be handled by derived classes.

I guess here you are asking users to register their favorite dispatch, but I feel we are just implementing a dispatcher now. I suspect the subclass approach might be cleaner.
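A rough sketch of the subclass alternative being suggested (hypothetical names, and it assumes a _quantized_linear_op hook that a derived class can override):

```
# hypothetical: a derived class carries its own dispatch behavior instead
# of AffineQuantizedTensor growing a condition table
class MyPackedQuantizedTensor(AffineQuantizedTensor):
    @staticmethod
    def _quantized_linear_op(input_tensor, weight_tensor, bias):
        if _my_kernel_applies(input_tensor, weight_tensor):  # hypothetical predicate
            return _my_custom_kernel(input_tensor, weight_tensor, bias)
        # otherwise fall back to the parent's default (e.g. dq-op-q) path
        return AffineQuantizedTensor._quantized_linear_op(
            input_tensor, weight_tensor, bias
        )
```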

@jerryzh168 (Contributor, Author):

AQT has to do dispatch because the cpu/cuda (or other device) code paths should be optimized by default, I think.

@jerryzh168 (Contributor, Author) commented Aug 13, 2024:

But thinking about this dispatch table, it does feel very complicated. I'm planning to revert this part of the change and just keep this as a refactor of the _quantized_linear op function for now; we can discuss support for lower-bit kernels later.

@kimishpatel (Contributor):

> AQT has to do dispatch because the cpu/cuda (or other device) code paths should be optimized by default, I think.

That is exactly what expanding responsibility is, honestly. And this is why we end up trying to figure out how to accommodate different dispatch needs within AQT itself.

@jerryzh168 (Contributor, Author) commented Aug 13, 2024

> I need to look at the PR in more detail, but why are we not proposing "extending" the AffineQuantizedTensor class itself as a better solution? That would allow various packing and other things to also be encoded by a class that wants different dispatch behavior.
>
> Is there a reason why the AffineQuantizedTensor class should take on this responsibility?

So packing can be natively supported in the tensor as an attribute right now; it feels more complicated to subclass than to just add an attribute? It might help to see the impact on all use cases, though.

Also, why does AffineQuantizedTensor take on the responsibility of different dispatches? This is just following what torch.Tensor is doing.
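For context, the torch.Tensor mechanism being referenced: tensor subclasses intercept ops via __torch_function__ and route them to per-op implementations. A minimal sketch (illustrative, not the torchao code):

```
import torch

class MyTensorSubclass(torch.Tensor):
    _IMPLEMENTATIONS = {}  # {func: impl}

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func in cls._IMPLEMENTATIONS:
            # route to the implementation registered for this op
            return cls._IMPLEMENTATIONS[func](func, types, args, kwargs)
        # fall back to default torch behavior for everything else
        with torch._C.DisableTorchFunctionSubclass():
            return func(*args, **kwargs)
```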

@kimishpatel (Contributor):

> This is just following what torch.Tensor is doing.

OK, perhaps I need to understand this better.

@jerryzh168 changed the title from "Refactor implements to support secondary dispatch" to "Refactor _quantized_linear for better extensibility" on Aug 13, 2024
@jerryzh168 marked this pull request as ready for review on August 13, 2024 21:56
@jerryzh168 force-pushed the refactor-dispatch branch 2 times, most recently from b6787b2 to 4c4e884, on August 14, 2024 02:40
@HDCharles (Contributor):

Looks OK to me. I think with these changes (especially the last version) we may want to add a unit test to check that the expected kernels are being used in autoquant, rather than what would normally come from the dispatch. Maybe a TODO for when we try a more complicated dispatch setup again.

Summary:
Some popular ops like linear will get many implementations based on the different characteristics of input and weight, e.g. int8 act + int8 weight, int8 act + int4 weight, etc. For `AffineQuantizedTensor`, right now all of these implementations are added to the main body of the linear dispatch implementation, which makes the code hard to read and extend. In this PR we support a secondary dispatch condition check for the `implements` function:

```
def dispatch_condition(func, types, args, kwargs):
    ...

@implements(torch.nn.functional.linear, dispatch_condition)
def _(func, types, args, kwargs):
    # implementation for inputs that pass the dispatch_condition
    ...

@implements(torch.nn.functional.linear)
def _(func, types, args, kwargs):
    ...
```

Test Plan:
regression tests
python test/quantization/test_quant_api.py
python test/integration/test_integration.py

python tutorials/quantize_vit/run_vit_b_quant.py

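A hedged sketch of how an `implements` with an optional condition could work (note this secondary-dispatch part was reverted before merge, so the following illustrates the idea rather than the shipped torchao code):

```
# sketch only: conditional registrations are checked before the
# unconditional fallback, in registration order
_DISPATCH_TABLE = {}                 # func -> impl (unconditional)
_DISPATCH_TABLE_WITH_CONDITION = {}  # func -> [(condition, impl), ...]

def implements(func, dispatch_condition=None):
    def decorator(impl):
        if dispatch_condition is None:
            _DISPATCH_TABLE[func] = impl
        else:
            _DISPATCH_TABLE_WITH_CONDITION.setdefault(func, []).append(
                (dispatch_condition, impl)
            )
        return impl
    return decorator

def dispatch(func, types, args, kwargs):
    # try conditional implementations first; first passing condition wins
    for condition, impl in _DISPATCH_TABLE_WITH_CONDITION.get(func, []):
        if condition(func, types, args, kwargs):
            return impl(func, types, args, kwargs)
    if func in _DISPATCH_TABLE:
        return _DISPATCH_TABLE[func](func, types, args, kwargs)
    raise NotImplementedError(f"no implementation registered for {func}")
```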
@jerryzh168 (Contributor, Author):

> Looks OK to me. I think with these changes (especially the last version) we may want to add a unit test to check that the expected kernels are being used in autoquant, rather than what would normally come from the dispatch. Maybe a TODO for when we try a more complicated dispatch setup again.

Do we already have the check here?

```
self.assertTrue("mixed_mm" in code, f"got code: {code}")
```

@jerryzh168 merged commit 582b6d4 into pytorch:main on Aug 14, 2024
14 checks passed
@jerryzh168 deleted the refactor-dispatch branch on August 14, 2024 20:19