Adding this as a tracking issue to unblock #181 from landing.

per @wanchaol:

IMO we should also register the fwd/bwd RMSNorm kernels as a PyTorch custom op (see the sketch after this list), so that:

- it is compatible with PT2: I believe torch.compile currently graph-breaks on the FusedRMSNorm path
- other components (e.g. DTensor) can provide a sharding rule for the custom op, making it compatible with tensor parallelism
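For concreteness, here is a minimal sketch of what such a registration could look like with `torch.library.custom_op` (PyTorch 2.4+). The op name `titan::fused_rms_norm` is hypothetical, and plain eager math stands in for the actual fused fwd/bwd kernels, which the real registration would wrap instead:

```python
import torch

@torch.library.custom_op("titan::fused_rms_norm", mutates_args=())
def fused_rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float) -> torch.Tensor:
    # Stand-in for the fused forward kernel: y = x / rms(x) * weight.
    rstd = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)
    return x * rstd * weight

@fused_rms_norm.register_fake
def _(x, weight, eps):
    # Shape/dtype propagation so PT2 can trace the op with FakeTensors
    # instead of graph-breaking on it.
    return torch.empty_like(x)

def _setup_context(ctx, inputs, output):
    x, weight, eps = inputs
    ctx.save_for_backward(x, weight)
    ctx.eps = eps

def _backward(ctx, grad_out):
    # Stand-in for the fused backward kernel, derived from the forward math.
    x, weight = ctx.saved_tensors
    rstd = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + ctx.eps)
    x_hat = x * rstd
    # Gradient w.r.t. weight: reduce over all non-feature dims.
    grad_weight = (grad_out * x_hat).sum(dim=tuple(range(x.dim() - 1)))
    # Gradient w.r.t. x.
    g = grad_out * weight
    grad_x = (g - x_hat * (g * x_hat).mean(-1, keepdim=True)) * rstd
    return grad_x, grad_weight, None

fused_rms_norm.register_autograd(_backward, setup_context=_setup_context)
```

Once registered, the op is callable as `torch.ops.titan.fused_rms_norm`, torch.compile can treat it as an opaque call rather than graph-breaking, and DTensor can be taught a sharding rule for it (e.g. via the experimental `register_sharding` hook in newer PyTorch releases).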