
autoquant using aqt #609

Merged (6 commits, Aug 8, 2024)

Conversation

@HDCharles HDCharles (Contributor) commented Aug 6, 2024

Summary:

This changes autoquant to use aqt instead of the old subclass subtensors. aqt was changed to first dispatch to a static _quantized_linear_op, which then dispatches to the normal function. This gives autoquant an extension point to modify the kernel functions for the various quantization modes without editing the main kernel function of all the classes. linear_activation_quantized_tensor got the same treatment.

There are some transposes in the aqt kernels that were not present in the subclass kernels; however, they do not seem to affect performance (see benchmark_results.txt for an autoquant perf run).
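For illustration, here is a minimal sketch of that dispatch pattern (the class names and the linear helper below are hypothetical, not torchao's actual implementation; only _quantized_linear_op comes from this PR): every linear call funnels through one static method, so an autoquant subclass can swap in a different kernel by overriding just that method instead of the main kernel function.

import torch
import torch.nn.functional as F

class QuantizedWeightBase:
    def __init__(self, weight: torch.Tensor):
        self.weight = weight  # stands in for the packed/quantized weight data

    @staticmethod
    def _quantized_linear_op(act, weight_tensor, bias):
        # default kernel path shared by every quantization mode
        return F.linear(act, weight_tensor.weight, bias)

    def linear(self, act, bias=None):
        # single dispatch point: every linear call routes through the static op
        return type(self)._quantized_linear_op(act, self, bias)

class AutoQuantWeight(QuantizedWeightBase):
    @staticmethod
    def _quantized_linear_op(act, weight_tensor, bias):
        # autoquant extension point: substitute a specialized kernel here
        # without touching the base dispatch or the other classes' kernels
        return F.linear(act, weight_tensor.weight, bias)

w = AutoQuantWeight(torch.randn(8, 16))
y = w.linear(torch.randn(2, 16))  # routes through AutoQuantWeight._quantized_linear_op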

Test Plan:

sh benchmarks.sh

python test_integration.py


pytorch-bot bot commented Aug 6, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/609

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f5ac4bf with merge base 87869f2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label (authors need to sign the CLA before a PR can be reviewed) on Aug 6, 2024
# return weight

# avoid circular dep
from torchao.dtypes import to_affine_quantized
Contributor

Can we reuse the code from quant_api.py? Also, I think we need to refactor input_quant_func to be a normal function (not a lambda) in order for serialization to work; it might help to do that refactor at the same time.
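For context, a minimal sketch of why the lambda needs to become a normal function for serialization (hypothetical names, not the torchao code): pickle stores a function by its module-qualified name, so a top-level def round-trips but a lambda does not.

import pickle

input_quant_lambda = lambda t: t * 2  # a lambda; pickle cannot resolve it by name

def input_quant_func(t):  # a top-level def; pickle stores a reference by name
    return t * 2

try:
    pickle.dumps(input_quant_lambda)
except pickle.PicklingError as err:
    print("lambda does not serialize:", err)

print("top-level function serializes fine:", len(pickle.dumps(input_quant_func)), "bytes")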

@HDCharles HDCharles (Contributor Author) commented Aug 8, 2024

The issue is that we need to call from_float with super in order to get this to work correctly. The code in quant_api would generate an aqt, not an autoquant class inheriting from an aqt. (I tried this approach initially since that's how it worked with the old subclass.)

If we want to reuse the code, it may make sense to make a function that prepares all the variables needed to go into from_float:

def int8_weight_only_kwargs(weight):
    # ...do the code up to to_affine_quantized and put it all together
    return a_bunch_of_kwargs

then you could have the quant_api code be like

def int8_weight_only():
    def apply_int8wo_quant(weight):
        kwargs = int8_weight_only_kwargs(weight)   
        return to_affine_quantized(**kwargs)
    return _get_linear_subclass_inserter(apply_int8wo_quant)

then in autoquant we could do something similar:

def from_float(weight):
    kwargs = int8_weight_only_kwargs(weight)
    return super().from_float(**kwargs)
      

Contributor

Does moving the apply_int8wo_quant function to top level help? I'm doing it here: https://github.com/pytorch/ao/pull/630/files. It seems like you call super().from_float after the aqt is produced, right?

Contributor

I think we can move the apply-quant-to-weight-tensor function to top level as well.

@jerryzh168 jerryzh168 (Contributor) left a comment

Thanks, the changes look good to me; please fix CI before landing.

Comment on lines +278 to +281
# in_features = weight.shape[1]
# int8 dynamic quantization only has benefit when in_feature > 16
# if in_features <= 16:
# return weight
Contributor

What are these? Should they be enabled or removed?

Contributor Author

Added a TODO.
