
[RFC] Goal for triton.ops.flash_attention #2267

Open
EPronovost opened this issue Sep 8, 2023 · 3 comments

@EPronovost
Contributor

Hi! The flash attention implementation is really helpful as a reference. I noticed that the code currently makes some assumptions (e.g. about shapes and strides) and can silently produce incorrect results when those assumptions are violated.
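
To make the concern concrete, here is a minimal sketch of the kind of pre-flight validation that would turn silent wrong answers into loud errors. The helper name and the exact invariants (4D layout, matching sequence lengths, contiguous head dimension) are assumptions for illustration, not the kernel's actual contract:

```python
import torch

def check_flash_attention_inputs(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> None:
    # Hypothetical guard: fail loudly instead of silently returning garbage.
    # The invariants below illustrate the kind of undocumented assumptions
    # the kernel makes; they are not its documented contract.
    for name, t in (("q", q), ("k", k), ("v", v)):
        if t.dim() != 4:
            raise ValueError(f"{name}: expected (batch, heads, seq, head_dim), got {t.dim()}D")
        if t.stride(-1) != 1:
            raise ValueError(f"{name}: head_dim must be contiguous (stride 1)")
    if not (q.shape[:2] == k.shape[:2] == v.shape[:2]):
        raise ValueError("q, k, v must share batch and head dimensions")
    if q.shape[-1] != k.shape[-1]:
        raise ValueError("q and k must have the same head_dim")
    if k.shape != v.shape or q.shape[2] != k.shape[2]:
        # e.g. a different number of queries and keys is not covered
        raise ValueError("mismatched sequence lengths are not supported")
```

With a guard like this at the public entry point, unsupported cases would at least fail fast instead of producing incorrect outputs.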

I've seen some related issues and PRs (e.g. #2033, #2029, #2046, #2086) and am not clear on the intended goals of this code. Two possibilities I can imagine:

  1. [For developers] This code serves as a complex use case that helps to catch bugs. Covering more use cases (e.g. a different number of queries and keys) or making it "user-friendly" is not a priority.
  2. [For users] This code is meant for users as an alternative to other flash attention implementations. Improving the user experience (e.g. no silently incorrect results) and expanding coverage of use cases are positives.

How do the core developers think about triton.ops.flash_attention? Would you welcome contributions to improve the user experience of this code (e.g. following up on #2033)? On the one hand, I think having a flash attention implementation on par with other libraries could help get folks interested in Triton; on the other hand, I imagine the core devs aren't looking to maintain more code without a good reason. I'd be happy to help add features to this code if that aligns with what the core devs want.

@jon-chuang
Contributor

jon-chuang commented Sep 8, 2023

I myself have been hoping for a collection or library of more advanced Triton applications.

However, it seems that the core concern of this repo is compiler correctness and performance.

Application code with more production concerns would ideally live somewhere else, perhaps in a repo called triton-extra or poseidon. I would be keen to contribute there without getting in the way of compiler development.

For instance, experimentation with more advanced kernels like #2243 and #2259 could take place there.

@Jokeren
Contributor

Jokeren commented Sep 9, 2023

You probably want to have a discussion with @daemyung.

@daemyung
Contributor

daemyung commented Sep 9, 2023

@Jokeren Thanks for the mention.

My opinion is that Triton is a language, akin to CUDA and SYCL, rather than a library. Consequently, supporting various operations (e.g., flash attention) falls outside Triton's scope. Consider CUDA for comparison: CUDA itself doesn't offer implementations of specific operations; instead, libraries built on CUDA (like CUTLASS, cuBLAS, and CUB) provide them.

For these reasons, I have initiated Trident. Trident is a performance library designed for machine learning applications, with a focus on accelerating both training and inference. It comprises highly optimized kernels, functions, and modules tailored for machine learning and is built upon Triton.

Therefore, I believe the purpose of triton.ops.flash_attention is geared towards developers rather than users.

I want to share the reasoning behind the name 'Trident'. Triton is a Greek god, and his signature weapon is the trident. The library was named 'Trident' because, much like the god Triton exhibits his full potential when wielding his trident, our Triton reaches its peak performance when paired with Trident.

[Trident logo]
