Refactor _quantized_linear for better extensibility #634

Merged (1 commit) on Aug 14, 2024

Commits on Aug 14, 2024

  1. Refactor implements to support secondary dispatch

    Summary:
    Popular ops like linear accumulate many implementations based on the
    characteristics of the input and weight, e.g. int8 activation + int8 weight, int8 activation + int4 weight,
    etc. For `AffineQuantizedTensor`, right now all of these implementations live in the main body of the linear dispatch, which makes the code hard to read and extend. This PR adds support for
    a secondary dispatch condition check in the `implements` function:
    
    ```
    def dispatch_condition(func, types, args, kwargs):
        ...
    
    @implements(torch.nn.functional.linear, dispatch_condition)
    def _(func, types, args, kwargs):
        # implementation for inputs that pass the dispatch_condition
        ...
    
    @implements(torch.nn.functional.linear)
    def _(func, types, args, kwargs):
        ...
    ```
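    The registry behind such a decorator can be sketched as follows. This is a minimal, self-contained illustration of the secondary-dispatch idea described above, not the actual torchao implementation; the table name `_DISPATCH_TABLE` and the `dispatch` helper are hypothetical, and a plain function stands in for `torch.nn.functional.linear`:

    ```python
    # Hypothetical registry: maps an op to a list of (condition, implementation)
    # pairs. Condition-guarded implementations are tried first; an entry with
    # condition=None serves as the unconditional fallback.
    _DISPATCH_TABLE = {}

    def implements(func, dispatch_condition=None):
        def decorator(impl):
            _DISPATCH_TABLE.setdefault(func, []).append((dispatch_condition, impl))
            return impl
        return decorator

    def dispatch(func, types, args, kwargs):
        entries = _DISPATCH_TABLE.get(func, [])
        # First pass: implementations guarded by a dispatch condition.
        for condition, impl in entries:
            if condition is not None and condition(func, types, args, kwargs):
                return impl(func, types, args, kwargs)
        # Second pass: the unconditional fallback implementation.
        for condition, impl in entries:
            if condition is None:
                return impl(func, types, args, kwargs)
        raise NotImplementedError(f"no implementation registered for {func}")

    # Example usage with a stand-in for torch.nn.functional.linear:
    def linear(*args, **kwargs):
        ...

    @implements(linear, lambda f, t, args, k: args[0] == "int8")
    def _(func, types, args, kwargs):
        return "int8 path"

    @implements(linear)
    def _(func, types, args, kwargs):
        return "default path"
    ```

    With this shape, each new input/weight combination registers its own guarded implementation instead of growing the main linear dispatch body.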
    
    Test Plan:
    regression tests
    python test/quantization/test_quant_api.py
    python test/integration/test_integration.py
    
    python tutorials/quantize_vit/run_vit_b_quant.py
    
    jerryzh168 committed Aug 14, 2024
    Commit: b4ee5c0