
Refactor _quantized_linear for better extensibility #634

Merged — 1 commit merged into pytorch:main from refactor-dispatch on Aug 14, 2024

Conversation

@jerryzh168 (Contributor) commented Aug 8, 2024

Summary:
Some popular ops like linear will get many implementations based on the different characteristics of input and weight, e.g. int8 act + int8 weight, int8 act + int4 weight, etc. For AffineQuantizedTensor, right now all of these implementations are added to the main body of the linear dispatch implementation, which makes the code hard to read and extend.

We refactored the implementation of the _quantized_linear op to take a list of (dispatch_condition, impl) pairs and go through them one by one; this makes the body of _quantized_linear shorter and easier to maintain.
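A minimal sketch of the idea (the helper names below are illustrative, not necessarily the exact torchao API):

```
# illustrative sketch of the refactor: a flat table of
# (dispatch_condition, impl) pairs, tried in registration order
_QLINEAR_DISPATCH_TABLE = []

def register_quantized_linear_dispatch(dispatch_condition, impl):
    # each entry pairs a predicate over the inputs with an implementation
    _QLINEAR_DISPATCH_TABLE.append((dispatch_condition, impl))

def _quantized_linear_op(input_tensor, weight_tensor, bias):
    # walk the registered pairs in order; the first matching condition wins
    for dispatch_condition, impl in _QLINEAR_DISPATCH_TABLE:
        if dispatch_condition(input_tensor, weight_tensor, bias):
            return impl(input_tensor, weight_tensor, bias)
    raise NotImplementedError("no matching dispatch for quantized linear")
```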

Alternatively, we could also add more functionality to implements, e.g. a secondary dispatch condition: implements(func, dispatch_condition), but that is a much more complicated discussion that we can delay for later. A few questions we need to think about:

  • how do we allow people to override all implementations for a specific function? (used in autoquant)
  • how do we make sure the dispatch conditions people register are called in the right order? e.g. if we have static quant, weight-only quant, and activation quant implementations/conditions, static quant (two inputs quantized) should come before the others; this might mean we also have to introduce the concept of a secondary dispatch key here
  • what happens to the dispatch table during inheritance? currently I think the dispatch table is just shared between parent and child, but if needed we can key the dispatch table on the class as well and copy the table when a child class inherits it: cls._DISPATCH_TABLE[cls][func] = impl (see the sketch below)
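A hypothetical sketch of that copy-on-inherit, class-keyed table (names illustrative, not torchao code):

```
# hypothetical: dispatch table keyed on class, copied when a child class
# is created so that child registrations do not mutate the parent's table
class DispatchedTensor:
    _DISPATCH_TABLE = {}  # {cls: {func: impl}}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        parent = cls.__mro__[1]
        # start the child's table as a copy of the parent's entries
        DispatchedTensor._DISPATCH_TABLE[cls] = dict(
            DispatchedTensor._DISPATCH_TABLE.get(parent, {})
        )

    @classmethod
    def register(cls, func, impl):
        DispatchedTensor._DISPATCH_TABLE.setdefault(cls, {})[func] = impl
```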

Test Plan:
regression tests
python test/quantization/test_quant_api.py
python test/integration/test_integration.py

python tutorials/quantize_vit/run_vit_b_quant.py


pytorch-bot (bot) commented Aug 8, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/634

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b4ee5c0 with merge base 433cd14:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Aug 8, 2024
@jerryzh168 force-pushed the refactor-dispatch branch 9 times, most recently from 5614006 to 03ac03e, on August 8, 2024 23:37
@jerryzh168 (Contributor, Author):

cc @kimishpatel — we discussed this a bit before; hopefully this makes extension outside of the repo easier.

@jerryzh168 force-pushed the refactor-dispatch branch 2 times, most recently from 4fd8e86 to f0792c4, on August 9, 2024 05:37
@jerryzh168 marked this pull request as draft on August 9, 2024 06:10
@kimishpatel (Contributor):

I need to look at the PR in more detail, but why are we not proposing "extending" the AffineQuantizedTensor class itself as a better solution? That would allow various packing and other things to also be encoded by a class that wants different dispatch behavior.

Is there a reason why the AffineQuantizedTensor class should take on this responsibility?

Review thread on this diff snippet:

```
func = torch.nn.functional.linear
args = (input_tensor, weight_tensor, bias)
kwargs = {}
if hasattr(cls, "_DISPATCH_TABLE_WITH_CONDITION"):
```
@kimishpatel (Contributor):
This feels like AQT expanding its responsibility with some APIs whose need seems to have popped up just now. I do feel AffineQuantizedTensor should not take on the responsibility of dispatch at all, or if it does, it should just be dq-op-q; the rest should be handled by derived classes.

I guess here you are asking users to register their favorite dispatch, but I feel we are just implementing a dispatcher now. I suspect the subclass approach might be cleaner.
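A rough sketch of the subclass alternative being suggested (hypothetical names, and it assumes a _quantized_linear_op hook that a derived class can override):

```
# hypothetical: a derived class carries its own dispatch behavior instead
# of AffineQuantizedTensor growing a condition table
class MyPackedQuantizedTensor(AffineQuantizedTensor):
    @staticmethod
    def _quantized_linear_op(input_tensor, weight_tensor, bias):
        if _my_kernel_applies(input_tensor, weight_tensor):  # hypothetical predicate
            return _my_custom_kernel(input_tensor, weight_tensor, bias)
        # otherwise fall back to the parent's default (e.g. dq-op-q) path
        return AffineQuantizedTensor._quantized_linear_op(
            input_tensor, weight_tensor, bias
        )
```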

@jerryzh168 (Contributor, Author):

AQT has to do dispatch because the cpu/cuda (or other device) code paths should be optimized by default, I think.

@jerryzh168 (Contributor, Author) commented Aug 13, 2024:

But thinking about this dispatch table, it does feel very complicated. I'm planning to revert this part of the change and just keep this as a refactor of the _quantized_linear op function for now; we can discuss support for lower-bit kernels later.

@kimishpatel (Contributor):

> AQT has to do dispatch because the cpu/cuda (or other device) code paths should be optimized by default, I think.

That is exactly what expanding responsibility is, honestly. And this is why we end up trying to figure out how to accommodate different dispatch needs within AQT itself.

@jerryzh168 (Contributor, Author) commented Aug 13, 2024

> I need to look at the PR in more detail, but why are we not proposing "extending" the AffineQuantizedTensor class itself as a better solution? That would allow various packing and other things to also be encoded by a class that wants different dispatch behavior.
>
> Is there a reason why the AffineQuantizedTensor class should take on this responsibility?

So packing can be natively supported in the tensor as an attribute right now; it feels more complicated to subclass than to just add an attribute? It might help to see the impact on all use cases, though.

Also, why does AffineQuantizedTensor take on the responsibility of different dispatches? This is just following what torch.Tensor is doing.
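For context, the torch.Tensor mechanism being referenced: tensor subclasses intercept ops via __torch_function__ and route them to per-op implementations. A minimal sketch (illustrative, not the torchao code):

```
import torch

class MyTensorSubclass(torch.Tensor):
    _IMPLEMENTATIONS = {}  # {func: impl}

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func in cls._IMPLEMENTATIONS:
            # route to the implementation registered for this op
            return cls._IMPLEMENTATIONS[func](func, types, args, kwargs)
        # fall back to default torch behavior for everything else
        with torch._C.DisableTorchFunctionSubclass():
            return func(*args, **kwargs)
```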

@kimishpatel (Contributor):

> This is just following what torch.Tensor is doing.

OK, perhaps I need to understand this better.

@jerryzh168 changed the title from "Refactor implements to support secondary dispatch" to "Refactor _quantized_linear for better extensibility" on Aug 13, 2024
@jerryzh168 marked this pull request as ready for review on August 13, 2024 21:56
@jerryzh168 force-pushed the refactor-dispatch branch 2 times, most recently from b6787b2 to 4c4e884, on August 14, 2024 02:40
@HDCharles (Contributor):

Looks OK to me. I think with these changes (especially the last version) we may want to add a unit test to check that the expected kernels are being used in autoquant, rather than what would normally come from the dispatch. Maybe a TODO for when we try a more complicated dispatch setup again.

Summary:
Some popular ops like linear will get many implementations based on the different characteristics of input and weight, e.g. int8 act + int8 weight, int8 act + int4 weight, etc. For `AffineQuantizedTensor`, right now all of these implementations are added to the main body of the linear dispatch implementation, which makes the code hard to read and extend. In this PR we support a secondary dispatch condition check for the `implements` function:

```
def dispatch_condition(func, types, args, kwargs):
    ...

@implements(torch.nn.functional.linear, dispatch_condition)
def _(func, types, args, kwargs):
    # implementation for inputs that pass the dispatch_condition
    ...

@implements(torch.nn.functional.linear)
def _(func, types, args, kwargs):
    ...
```

Test Plan:
regression tests
python test/quantization/test_quant_api.py
python test/integration/test_integration.py

python tutorials/quantize_vit/run_vit_b_quant.py

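A hedged sketch of how an `implements` with an optional condition could work (note this secondary-dispatch part was reverted before merge, so the following illustrates the idea rather than the shipped torchao code):

```
# sketch only: conditional registrations are checked before the
# unconditional fallback, in registration order
_DISPATCH_TABLE = {}                 # func -> impl (unconditional)
_DISPATCH_TABLE_WITH_CONDITION = {}  # func -> [(condition, impl), ...]

def implements(func, dispatch_condition=None):
    def decorator(impl):
        if dispatch_condition is None:
            _DISPATCH_TABLE[func] = impl
        else:
            _DISPATCH_TABLE_WITH_CONDITION.setdefault(func, []).append(
                (dispatch_condition, impl)
            )
        return impl
    return decorator

def dispatch(func, types, args, kwargs):
    # try conditional implementations first; first passing condition wins
    for condition, impl in _DISPATCH_TABLE_WITH_CONDITION.get(func, []):
        if condition(func, types, args, kwargs):
            return impl(func, types, args, kwargs)
    if func in _DISPATCH_TABLE:
        return _DISPATCH_TABLE[func](func, types, args, kwargs)
    raise NotImplementedError(f"no implementation registered for {func}")
```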
@jerryzh168 (Contributor, Author):

> Looks OK to me. I think with these changes (especially the last version) we may want to add a unit test to check that the expected kernels are being used in autoquant, rather than what would normally come from the dispatch. Maybe a TODO for when we try a more complicated dispatch setup again.

Do we already have the check here?

```
self.assertTrue("mixed_mm" in code, f"got code: {code}")
```

@jerryzh168 merged commit 582b6d4 into pytorch:main on Aug 14, 2024
14 checks passed
@jerryzh168 deleted the refactor-dispatch branch on August 14, 2024 20:19