
autoquant using aqt #609

Merged (6 commits, Aug 8, 2024)

Conversation

@HDCharles HDCharles (Contributor) commented Aug 6, 2024

Summary:

This changes autoquant to use aqt instead of the old subclass subtensors. aqt was changed to first dispatch to a static _quantized_linear_op, which then dispatches to the normal function. This gives autoquant an extension point to modify the kernel functions for the various quantization modes without editing the main kernel function of all the classes. linear_activation_quantized_tensor got the same treatment.

There are some transposes in the aqt kernels that were not present in the subclass kernels; however, they do not seem to affect performance (see benchmark_results.txt for an autoquant perf run).
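For illustration, here is a minimal sketch of that dispatch pattern (the class names and the linear helper below are hypothetical, not torchao's actual implementation; only _quantized_linear_op comes from this PR): every linear call funnels through one static method, so an autoquant subclass can swap in a different kernel by overriding just that method instead of the main kernel function.

import torch
import torch.nn.functional as F

class QuantizedWeightBase:
    def __init__(self, weight: torch.Tensor):
        self.weight = weight  # stands in for the packed/quantized weight data

    @staticmethod
    def _quantized_linear_op(act, weight_tensor, bias):
        # default kernel path shared by every quantization mode
        return F.linear(act, weight_tensor.weight, bias)

    def linear(self, act, bias=None):
        # single dispatch point: every linear call routes through the static op
        return type(self)._quantized_linear_op(act, self, bias)

class AutoQuantWeight(QuantizedWeightBase):
    @staticmethod
    def _quantized_linear_op(act, weight_tensor, bias):
        # autoquant extension point: substitute a specialized kernel here
        # without touching the base dispatch or the other classes' kernels
        return F.linear(act, weight_tensor.weight, bias)

w = AutoQuantWeight(torch.randn(8, 16))
y = w.linear(torch.randn(2, 16))  # routes through AutoQuantWeight._quantized_linear_op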

Test Plan:

sh benchmarks.sh

python test_integration.py


pytorch-bot bot commented Aug 6, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/609

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f5ac4bf with merge base 87869f2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label (authors need to sign the CLA before a PR can be reviewed) on Aug 6, 2024
# return weight

# avoid circular dep
from torchao.dtypes import to_affine_quantized
Contributor

Can we reuse the code from quant_api.py? Also, I think we need to refactor input_quant_func to be a normal function (not a lambda) in order for serialization to work; it might help to do that refactor at the same time.
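For context, a minimal sketch of why the lambda needs to become a normal function for serialization (hypothetical names, not the torchao code): pickle stores a function by its module-qualified name, so a top-level def round-trips but a lambda does not.

import pickle

input_quant_lambda = lambda t: t * 2  # a lambda; pickle cannot resolve it by name

def input_quant_func(t):  # a top-level def; pickle stores a reference by name
    return t * 2

try:
    pickle.dumps(input_quant_lambda)
except pickle.PicklingError as err:
    print("lambda does not serialize:", err)

print("top-level function serializes fine:", len(pickle.dumps(input_quant_func)), "bytes")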

@HDCharles HDCharles (Contributor Author) commented Aug 8, 2024

The issue is that we need to call from_float with super in order to get this to work correctly. The code in quant_api would generate an aqt, not an autoquant class inheriting from an aqt. (I tried this approach initially since that's how it worked with the old subclass.)

If we want to reuse the code, it may make sense to make a function that prepares all the variables needed to go into from_float:

def int8_weight_only_kwargs(weight):
    # ...do the code up to to_affine_quantized and put it all together
    return a_bunch_of_kwargs

then you could have the quant_api code be like

def int8_weight_only():
    def apply_int8wo_quant(weight):
        kwargs = int8_weight_only_kwargs(weight)   
        return to_affine_quantized(**kwargs)
    return _get_linear_subclass_inserter(apply_int8wo_quant)

then in autoquant we could do something similar:

def from_float(weight):
    kwargs = int8_weight_only_kwargs(weight)
    return super().from_float(**kwargs)
      

Contributor

Does moving the apply_int8wo_quant function to top level help? I'm doing it here: https://github.com/pytorch/ao/pull/630/files. It seems like you call super().from_float after the aqt is produced, right?

Contributor

I think we can move the apply-quant-to-weight-tensor function to top level as well.

@jerryzh168 jerryzh168 (Contributor) left a comment

Thanks, the changes look good to me; please fix CI before landing.

Comment on lines +278 to +281
# in_features = weight.shape[1]
# int8 dynamic quantization only has benefit when in_feature > 16
# if in_features <= 16:
# return weight
Contributor

What are these? Should they be enabled or removed?

Contributor Author

Added a TODO.
