
Flash-attention-v2 triton version (adding bias) #2029

Open
tiandiao123 opened this issue Aug 3, 2023 · 10 comments

tiandiao123 commented Aug 3, 2023

Hello friends,
I am wondering whether someone here can help check my modified version of fused_attention_bias (https://gist.github.com/tiandiao123/0b82ea31a5dc5865663c2966e369b05a#file-flash_attention_bias-py-L106). I am trying to start from Triton and the original tutorial example and modify the original flash attention algorithm. In my view, I should only need to add a bias_ptr, load the corresponding bias tile into SRAM, and add it to qk, but the result is not what I expected: after testing, the output does not exactly match my PyTorch implementation. Does anyone have any ideas?
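
For reference, here is a minimal sketch (not the gist's code) of the kind of modification described above, starting from the Triton tutorial's forward kernel: the bias tile for the current (query block, key block) pair is loaded alongside K and V and added to qk before the online-softmax update. It assumes contiguous (batch*heads, seqlen, head_dim) tensors, a (batch*heads, seqlen, seqlen) bias, seqlen divisible by the block sizes, and head_dim in {16, 32, 64, 128}; the names _fwd_bias_kernel and flash_attn_bias are illustrative only.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def _fwd_bias_kernel(
    Q, K, V, Bias, Out, sm_scale, seqlen,
    HEAD_DIM: tl.constexpr, BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
):
    # One program per (query block, batch*head).
    start_m = tl.program_id(0)
    off_bh = tl.program_id(1)

    offs_m = start_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    offs_d = tl.arange(0, HEAD_DIM)

    # Load the query tile once; it stays in SRAM for the whole inner loop.
    q_ptrs = Q + off_bh * seqlen * HEAD_DIM + offs_m[:, None] * HEAD_DIM + offs_d[None, :]
    q = tl.load(q_ptrs)

    # Online-softmax running max, running sum, and output accumulator.
    m_i = tl.zeros([BLOCK_M], dtype=tl.float32) - float("inf")
    l_i = tl.zeros([BLOCK_M], dtype=tl.float32)
    acc = tl.zeros([BLOCK_M, HEAD_DIM], dtype=tl.float32)

    for start_n in range(0, seqlen, BLOCK_N):
        offs_n_cur = start_n + offs_n
        k_ptrs = K + off_bh * seqlen * HEAD_DIM + offs_n_cur[:, None] * HEAD_DIM + offs_d[None, :]
        v_ptrs = V + off_bh * seqlen * HEAD_DIM + offs_n_cur[:, None] * HEAD_DIM + offs_d[None, :]
        k = tl.load(k_ptrs)
        v = tl.load(v_ptrs)

        qk = tl.dot(q, tl.trans(k)) * sm_scale
        # The only real change vs. the tutorial: load the matching bias tile
        # and add it to the scaled scores BEFORE the online-softmax update.
        b_ptrs = Bias + off_bh * seqlen * seqlen + offs_m[:, None] * seqlen + offs_n_cur[None, :]
        qk += tl.load(b_ptrs).to(tl.float32)

        m_new = tl.maximum(m_i, tl.max(qk, 1))
        alpha = tl.exp(m_i - m_new)
        p = tl.exp(qk - m_new[:, None])
        l_i = l_i * alpha + tl.sum(p, 1)
        acc = acc * alpha[:, None] + tl.dot(p.to(v.dtype), v)
        m_i = m_new

    acc = acc / l_i[:, None]
    o_ptrs = Out + off_bh * seqlen * HEAD_DIM + offs_m[:, None] * HEAD_DIM + offs_d[None, :]
    tl.store(o_ptrs, acc.to(Out.dtype.element_ty))


def flash_attn_bias(q, k, v, bias, sm_scale):
    # q, k, v: (batch*heads, seqlen, head_dim) contiguous; bias: (batch*heads, seqlen, seqlen).
    bh, seqlen, head_dim = q.shape
    out = torch.empty_like(q)
    grid = (triton.cdiv(seqlen, 64), bh)
    _fwd_bias_kernel[grid](q, k, v, bias, out, sm_scale, seqlen,
                           HEAD_DIM=head_dim, BLOCK_M=64, BLOCK_N=64)
    return out
```

Note that even a correct kernel will not match a plain PyTorch reference bit-for-bit: the blockwise computation changes the floating-point accumulation order, so compare with torch.allclose and a tolerance appropriate for the dtype (roughly atol=1e-2 for fp16) rather than expecting exact equality.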

tiandiao123 (Author) commented

probably @ptillet can take a look at it?

chaaland commented Aug 9, 2023

Have you looked at this implementation? I haven't checked its correctness myself, but the comments about bugs in some head_dim regimes indicate it's been tested.

tiandiao123 (Author) commented

> Have you looked at this implementation? I haven't checked its correctness myself, but the comments about bugs in some head_dim regimes indicate it's been tested.

let me check!

chaaland commented

Looks like this one from mosaic uses the same implementation

shiqingzhangCSU commented

Is there any progress? I also want to implement flash2+bias.

shiqingzhangCSU commented

@tiandiao123 Hi! Is there any update? I currently want to implement this version too.

chaaland commented

@shiqingzhangCSU why not just use the one from mosaic?

tiandiao123 (Author) commented

> @tiandiao123 Hi! Is there any update? I currently want to implement this version too.

I saw lightllm has some updates: https://github.com/ModelTC/lightllm/blob/main/lightllm/models/bloom/triton_kernel/context_flashattention_nopad.py#L10

juntang-zhuang commented

Marking this; it's an important feature.

alexzhang13 commented

I've written a version of this here: https://github.com/alexzhang13/flashattention2-custom-mask. To add arbitrary attention biases, you just need to remove the masking logic (torch.where).

If it's still needed, I can write out this explicit functionality as well.
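
For illustration, here is a minimal plain-PyTorch sketch of the point above: a boolean mask is just the special case of an additive bias that is 0 where attention is allowed and -inf where it is not, so a kernel that accepts an arbitrary bias no longer needs a separate torch.where/tl.where masking step. The name attention_with_bias_ref is illustrative only.

```python
import math
import torch


def attention_with_bias_ref(q, k, v, bias):
    # Naive reference: softmax(q @ k^T / sqrt(d) + bias) @ v.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    return torch.softmax(scores + bias, dim=-1) @ v


# A causal mask expressed as an additive bias: 0 where allowed, -inf where masked.
S = 128
causal = torch.tril(torch.ones(S, S, dtype=torch.bool))
causal_bias = torch.zeros(S, S).masked_fill(~causal, float("-inf"))
```

A reference like this is also handy for checking a fused kernel's output with torch.allclose and a loose tolerance.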
