
Remove int_scaled_mm's dependency on triton for cpu #128

Open · wants to merge 6 commits into main

Conversation

Xia-Weiwen (Collaborator)

In #121, the CPU version of int_scaled_mm was registered with @torch.library.impl in torchao/kernel/intmm_triton.py, so triton must be installed to call this op. However, the CPU implementation should not depend on triton. This PR moves the implementation to torchao/kernel/intmm.py as a plain function, without @torch.library.impl.
AUTOTUNER_ENABLE is no longer required either. (Not sure if this is reasonable.)
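
For illustration, a minimal sketch of the plain-function CPU path this PR describes; the math mirrors the existing op, but the exact function name and placement in torchao/kernel/intmm.py are assumptions:

import torch

def int_scaled_matmul(a, b, scales1):
    # Plain eager CPU path: no triton import, no torch.library registration.
    c = torch._int_mm(a, b)
    return c.to(scales1.dtype) * scales1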

The change is still covered by the unit tests in test/kernel/test_autotuner.py.

@facebook-github-bot added the CLA Signed label on Apr 8, 2024.
@Xia-Weiwen (Collaborator, Author)

Hi @cpuhrsch, could you please review and see whether the changes look reasonable to you? Thanks.

@Xia-Weiwen (Collaborator, Author)

Hi @cpuhrsch, could you please suggest how to deal with the issue (the CPU impl's availability depends on triton and AUTOTUNER_ENABLE)? Thanks!

@cpuhrsch (Contributor)

Hey @Xia-Weiwen - thank you for the PR! Sorry for the delay in review. Also, please note that the CI hasn't gone green.

Another way to resolve this could be to move

@torch.library.impl(lib, "int_scaled_matmul", "CPU")
def int_scaled_matmul_cpu(a, b, scales1):
    c = torch._int_mm(a, b)
    return c.to(scales1.dtype) * scales1

into torchao/kernel/intmm.py, which shouldn't have a dependency on triton. Just be sure to also define lib = torch.library.Library("torchao", "FRAGMENT").
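
For concreteness, a hedged sketch of what that combined registration in torchao/kernel/intmm.py might look like, assuming the fragment name quoted above (the rest of the module is omitted):

import torch

# Declare the library fragment locally so the CPU registration does not
# import torchao/kernel/intmm_triton.py (and therefore triton).
lib = torch.library.Library("torchao", "FRAGMENT")

@torch.library.impl(lib, "int_scaled_matmul", "CPU")
def int_scaled_matmul_cpu(a, b, scales1):
    # int8 x int8 -> int32 matmul, then rescale to the scales' dtype.
    c = torch._int_mm(a, b)
    return c.to(scales1.dtype) * scales1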

@Xia-Weiwen (Collaborator, Author)

@cpuhrsch Thanks! I will give it a try. One question: what is AUTOTUNER_ENABLE, and should the CPU impl depend on it?

@cpuhrsch (Contributor)

@Xia-Weiwen - it's used for a Triton autotuner that allows us to cycle over a very large number of configs for a given fixed input shape. See https://github.com/pytorch-labs/ao/tree/main/torchao/kernel#autotuner-and-custom-triton-kernels
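
For illustration, a hedged sketch of how an AUTOTUNER_ENABLE-style gate is commonly read at import time; the exact variable name, default value, and import site used by torchao are assumptions here:

import os

# Hypothetical gate: only pull in the triton-backed autotuner when the
# environment variable is set, so a CPU-only install never imports triton.
if os.getenv("AUTOTUNER_ENABLE", "0") == "1":
    from torchao.kernel import intmm_triton  # registers the autotuned kernels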

@Xia-Weiwen (Collaborator, Author)

Thank you @cpuhrsch. It looks like the CPU impl does not need this.
