Refactor _quantized_linear for better extensibility #634

Merged (1 commit) on Aug 14, 2024

Commits on Aug 14, 2024

  1. Refactor implements to support secondary dispatch

    Summary:
    Popular ops like linear accumulate many implementations based on the
    characteristics of the input and weight, e.g. int8 activation + int8 weight, int8 activation + int4 weight,
    etc. For `AffineQuantizedTensor`, right now all of these implementations live in the main body of the linear dispatch, which makes the code hard to read and extend. This PR adds support for
    a secondary dispatch condition check in the `implements` function:
    
    ```
    def dispatch_condition(func, types, args, kwargs):
        ...
    
    @implements(torch.nn.functional.linear, dispatch_condition)
    def _(func, types, args, kwargs):
        # implementation for inputs that pass the dispatch_condition
        ...
    
    @implements(torch.nn.functional.linear)
    def _(func, types, args, kwargs):
        ...
    ```
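    The registry behind such a decorator can be sketched as follows. This is a minimal, self-contained illustration of the secondary-dispatch idea described above, not the actual torchao implementation; the table name `_DISPATCH_TABLE` and the `dispatch` helper are hypothetical, and a plain function stands in for `torch.nn.functional.linear`:

    ```python
    # Hypothetical registry: maps an op to a list of (condition, implementation)
    # pairs. Condition-guarded implementations are tried first; an entry with
    # condition=None serves as the unconditional fallback.
    _DISPATCH_TABLE = {}

    def implements(func, dispatch_condition=None):
        def decorator(impl):
            _DISPATCH_TABLE.setdefault(func, []).append((dispatch_condition, impl))
            return impl
        return decorator

    def dispatch(func, types, args, kwargs):
        entries = _DISPATCH_TABLE.get(func, [])
        # First pass: implementations guarded by a dispatch condition.
        for condition, impl in entries:
            if condition is not None and condition(func, types, args, kwargs):
                return impl(func, types, args, kwargs)
        # Second pass: the unconditional fallback implementation.
        for condition, impl in entries:
            if condition is None:
                return impl(func, types, args, kwargs)
        raise NotImplementedError(f"no implementation registered for {func}")

    # Example usage with a stand-in for torch.nn.functional.linear:
    def linear(*args, **kwargs):
        ...

    @implements(linear, lambda f, t, args, k: args[0] == "int8")
    def _(func, types, args, kwargs):
        return "int8 path"

    @implements(linear)
    def _(func, types, args, kwargs):
        return "default path"
    ```

    With this shape, each new input/weight combination registers its own guarded implementation instead of growing the main linear dispatch body.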
    
    Test Plan:
    regression tests
    python test/quantization/test_quant_api.py
    python test/integration/test_integration.py
    
    python tutorials/quantize_vit/run_vit_b_quant.py
    
    jerryzh168 committed Aug 14, 2024
    Commit: b4ee5c0