
Remove int_scaled_mm's dependency on triton for cpu #128

Open · wants to merge 6 commits into main

Conversation

Xia-Weiwen (Collaborator)

In #121, the CPU version of int_scaled_mm was registered with @torch.library.impl in torchao/kernel/intmm_triton.py, so triton must be installed to call this op. However, the CPU implementation should not depend on triton. This PR moves the implementation to torchao/kernel/intmm.py as a plain function, without @torch.library.impl.
AUTOTUNER_ENABLE is no longer required either. (Not sure if this is reasonable.)
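
For illustration, a minimal sketch of the plain-function CPU path this PR describes; the math mirrors the existing op, but the exact function name and placement in torchao/kernel/intmm.py are assumptions:

import torch

def int_scaled_matmul(a, b, scales1):
    # Plain eager CPU path: no triton import, no torch.library registration.
    c = torch._int_mm(a, b)
    return c.to(scales1.dtype) * scales1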

The change is still covered by the unit tests in test/kernel/test_autotuner.py.

@facebook-github-bot added the CLA Signed label on Apr 8, 2024.
@Xia-Weiwen (Collaborator, Author)

Hi @cpuhrsch, could you please review and see whether the changes look reasonable to you? Thanks.

@Xia-Weiwen (Collaborator, Author)

Hi @cpuhrsch, could you please suggest how to deal with the issue (the CPU impl's availability depends on triton and AUTOTUNER_ENABLE)? Thanks!

@cpuhrsch (Contributor)

Hey @Xia-Weiwen - thank you for the PR! Sorry for the delay in review. Also, please note that the CI hasn't gone green.

Another way to resolve this could be to move

@torch.library.impl(lib, "int_scaled_matmul", "CPU")
def int_scaled_matmul_cpu(a, b, scales1):
    c = torch._int_mm(a, b)
    return c.to(scales1.dtype) * scales1

into torchao/kernel/intmm.py, which shouldn't have a dependency on triton. Just be sure to also define lib = torch.library.Library("torchao", "FRAGMENT").
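
For concreteness, a hedged sketch of what that combined registration in torchao/kernel/intmm.py might look like, assuming the fragment name quoted above (the rest of the module is omitted):

import torch

# Declare the library fragment locally so the CPU registration does not
# import torchao/kernel/intmm_triton.py (and therefore triton).
lib = torch.library.Library("torchao", "FRAGMENT")

@torch.library.impl(lib, "int_scaled_matmul", "CPU")
def int_scaled_matmul_cpu(a, b, scales1):
    # int8 x int8 -> int32 matmul, then rescale to the scales' dtype.
    c = torch._int_mm(a, b)
    return c.to(scales1.dtype) * scales1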

@Xia-Weiwen (Collaborator, Author)

@cpuhrsch Thanks! I will give it a try. One question: what is AUTOTUNER_ENABLE, and should the CPU impl depend on it?

@cpuhrsch (Contributor)

@Xia-Weiwen - it's used for a Triton autotuner that allows us to cycle over a very large number of configs for a given fixed input shape. See https://github.com/pytorch-labs/ao/tree/main/torchao/kernel#autotuner-and-custom-triton-kernels
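
For illustration, a hedged sketch of how an AUTOTUNER_ENABLE-style gate is commonly read at import time; the exact variable name, default value, and import site used by torchao are assumptions here:

import os

# Hypothetical gate: only pull in the triton-backed autotuner when the
# environment variable is set, so a CPU-only install never imports triton.
if os.getenv("AUTOTUNER_ENABLE", "0") == "1":
    from torchao.kernel import intmm_triton  # registers the autotuned kernels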

@Xia-Weiwen (Collaborator, Author)

Thank you @cpuhrsch. It looks like the CPU impl does not need this.
