Refactor `_quantized_linear` for better extensibility #634
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/634. Note: links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit b4ee5c0 with merge base 433cd14. This comment was automatically generated by Dr. CI and updates every 15 minutes.
(force-pushed from 5614006 to 03ac03e)
cc @kimishpatel, we discussed this a bit before; hopefully this makes extension outside of the repo easier.
(force-pushed from 4fd8e86 to f0792c4)
(force-pushed from f0792c4 to 31f7a28)
I need to look at the PR in more detail, but why are we not proposing extending the AffineQuantizedTensor class itself as a better solution? That allows various packing schemes and other things to also be encoded by a class that wants different dispatch behavior. Is there a reason why the AffineQuantizedTensor class should take on this responsibility?
```python
func = torch.nn.functional.linear
args = (input_tensor, weight_tensor, bias)
kwargs = {}
if hasattr(cls, "_DISPATCH_TABLE_WITH_CONDITION"):
```
This feels like AQT expanding its responsibility with some APIs whose need seems to have popped up just now. I do feel AffineQuantizedTensor should not take on the responsibility of dispatch at all, or if it does, it should just be dq-op-q; the rest should be handled by derived classes.

I guess here you are asking users to register their favorite dispatch, but I feel we are just implementing a dispatcher now. I suspect the subclass approach might be cleaner.
AQT has to do dispatch because the cpu/cuda (or other device) code paths should be optimized by default, I think.
But thinking about this dispatch table, it does feel very complicated. I'm planning to revert this part of the change and just keep this as a refactor of the `_quantized_op` function for now; we can discuss support for lower-bit kernels later.
> AQT has to do dispatch because the cpu/cuda or other device code paths should be optimized by default, I think.

That is exactly what expanding responsibility is, honestly. And this is why we end up trying to figure out how to extend AQT itself to accommodate different dispatch needs.
So packing can be natively supported in the tensor as an attribute right now; it feels more complicated to do a subclass than just adding an attribute? It might help to see the impact on all use cases though. Also, on "why does AffineQuantizedTensor take on the responsibility of different dispatches?": this is just following what
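To make the two extension points being debated above concrete, here is a torch-free sketch; `AQTSketch`, `MyPackedTensor`, and the layout names are invented stand-ins, not the real torchao classes or API.

```python
class AQTSketch:
    """Stand-in for AffineQuantizedTensor: packing encoded as an attribute."""

    def __init__(self, data, layout="plain"):
        self.data = data
        self.layout = layout  # (a) attribute-based: one class, many layouts

    def linear(self, x):
        # the base class dispatches on the attribute
        if self.layout == "tensor_core_tiled":
            return f"tiled kernel on {x}"
        return f"plain kernel on {x}"


class MyPackedTensor(AQTSketch):
    """(b) subclass-based: the derived class owns its dispatch behavior."""

    def linear(self, x):
        return f"custom packed kernel on {x}"
```

In (a) every new packing format adds a branch inside the base class; in (b) the base class stays fixed and each format brings its own subclass, which is the direction the review comments above are advocating.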
Ok, I need to understand this better perhaps.
(force-pushed from 31f7a28 to 41cebd6)
(force-pushed from 41cebd6 to a0f41c8)
(force-pushed from b6787b2 to 4c4e884)
Looks OK to me. I think with these changes (especially the last version) we may want to add a unit test to check that the expected kernels are being used in autoquant, rather than what would normally be there from the dispatch. Maybe a TODO for if we try a more complicated dispatch setup again.
Summary: Some popular ops like linear will get many implementations based on the different characteristics of input and weight, e.g. int8 activation + int8 weight, int8 activation + int4 weight, etc. For `AffineQuantizedTensor`, right now all of these implementations are added to the main body of the linear dispatch implementation, which makes the code hard to read and extend. In this PR we support a secondary dispatch-condition check for the `implements` function:

```
def dispatch_condition(func, types, args, kwargs):
    ...

@implements(torch.nn.functional.linear, dispatch_condition)
def _(func, types, args, kwargs):
    # implementation for inputs that pass the dispatch_condition
    ...

@implements(torch.nn.functional.linear)
def _(func, types, args, kwargs):
    ...
```

Test Plan:
regression tests
python test/quantization/test_quant_api.py
python test/integration/test_integration.py
python tutorials/quantize_vit/run_vit_b_quant.py

Reviewers:
Subscribers:
Tasks:
Tags:
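The registration pattern in the commit message above can be sketched without torch; `implements`, `dispatch`, the table layout, and the `dict`-based stand-in tensors below are all simplified illustrations, not the actual torchao implementation.

```python
# func -> ordered list of (condition_or_None, impl)
_DISPATCH_TABLE = {}


def implements(func, dispatch_condition=None):
    def decorator(impl):
        # entries are tried in registration order; a None condition
        # acts as an unconditional fallback and should come last
        _DISPATCH_TABLE.setdefault(func, []).append((dispatch_condition, impl))
        return impl
    return decorator


def dispatch(func, types, args, kwargs):
    for condition, impl in _DISPATCH_TABLE.get(func, []):
        if condition is None or condition(func, types, args, kwargs):
            return impl(func, types, args, kwargs)
    raise NotImplementedError(f"no implementation registered for {func}")


def is_int8_int8(func, types, args, kwargs):
    input_tensor, weight_tensor = args[0], args[1]
    return input_tensor["dtype"] == "int8" and weight_tensor["dtype"] == "int8"


@implements("linear", is_int8_int8)
def _(func, types, args, kwargs):
    return "int8-act/int8-weight path"


@implements("linear")
def _(func, types, args, kwargs):
    return "fallback dequant-op-quant path"
```

With this shape, `dispatch("linear", (), ({"dtype": "int8"}, {"dtype": "int8"}, None), {})` takes the conditional path, while an int8/int4 pair falls through to the unconditional entry.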
(force-pushed from 4c4e884 to b4ee5c0)
Do we already have the check here: ao/test/integration/test_integration.py, line 934 in acfb0da?
Summary:

Some popular ops like linear will get many implementations based on the different characteristics of input and weight, e.g. int8 activation + int8 weight, int8 activation + int4 weight, etc. For `AffineQuantizedTensor`, right now all of these implementations are added to the main body of the linear dispatch implementation, which makes the code hard to read and extend.

We refactored the implementation of the `_quantized_linear` op to take a list of (dispatch_condition, impl) pairs and go through them one by one; this makes the body of `_quantized_linear` shorter and easier to maintain.

Alternatively, we could also add more functionality to `implements`, e.g. a secondary dispatch condition: `implements(func, dispatch_condition)`, but that is a much more complicated discussion that we can delay for later. A few questions we need to think about concern how implementations would be registered, e.g. `cls._DISPATCH_TABLE[cls][func] = impl`.
Test Plan:
regression tests
python test/quantization/test_quant_api.py
python test/integration/test_integration.py
python tutorials/quantize_vit/run_vit_b_quant.py
Reviewers:
Subscribers:
Tasks:
Tags: