Remove int_scaled_mm's dependency on triton for cpu #128
base: main
Conversation
Hi @cpuhrsch, could you please review and see if the changes are reasonable to you? Thanks.
Hi @cpuhrsch, could you please suggest how to deal with the issue (the CPU impl's availability depends on triton and `AUTOTUNER_ENABLE`)? Thanks!
Hey @Xia-Weiwen - thank you for the PR! Sorry for the delay in review. Also, please note the CI hasn't run green. Another way to resolve this could be to move … into ….
@cpuhrsch Thanks! I will give it a try. A question is what … is used for.
@Xia-Weiwen - it's used for a Triton autotuner that allows us to cycle over a very large number of configs for a given fixed input shape. See https://github.com/pytorch-labs/ao/tree/main/torchao/kernel#autotuner-and-custom-triton-kernels
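The config-cycling idea described above can be sketched in plain Python. This is a generic benchmark-and-pick loop, not Triton's actual autotuner API; the function names and timing parameters are hypothetical:

```python
import time

def autotune(kernel_variants, args, warmup=1, reps=3):
    """Pick the fastest variant for a fixed input shape.

    Generic sketch of the autotuning idea: run every candidate
    config/kernel on the same inputs, time it, keep the fastest.
    (Triton's real autotuner does this per input shape and caches
    the winner; this toy version just returns the winner's name.)
    """
    best_name, best_time = None, float("inf")
    for name, fn in kernel_variants.items():
        # Warm-up runs so one-time costs don't skew the measurement.
        for _ in range(warmup):
            fn(*args)
        t0 = time.perf_counter()
        for _ in range(reps):
            fn(*args)
        elapsed = (time.perf_counter() - t0) / reps
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name
```

The real autotuner also caches the winning config keyed by input shape, so the search cost is paid only once per shape.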
Thank you @cpuhrsch. Looks like the CPU impl does not need this.
In #121, the CPU version of `int_scaled_mm` was implemented with `@torch.library.impl` in torchao/kernel/intmm_triton.py, so triton had to be installed to call this op. However, the CPU implementation should not depend on triton. This PR moves the implementation to torchao/kernel/intmm.py and does not use `@torch.library.impl`; `AUTOTUNER_ENABLE` is not required anymore, either. (Not sure if this is reasonable.) The change is still covered by the unit tests in test/kernel/test_autotuner.py.