using custom attention mask #584

Open
whatisslove11 opened this issue Oct 3, 2023 · 7 comments

@whatisslove11

Hello!
I am doing a translation task and would like to try using flash attention in my model.
In addition to the usual triangular (causal) mask, I also need to mask padding tokens so that the model does not attend to them; the sequences arrive in the model already padded to the same length.
As I understand it, there is no way to pass your own mask yet.
Could you tell me how I can use my mask, or make flash attention handle the padding tokens itself?

@whatisslove11
Author

@tridao

@tridao
Contributor

tridao commented Oct 10, 2023

You can look at how we do it in BERT: remove all padding tokens before the first layer.

I don't know if that works for translation.
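
For reference, a minimal sketch of that pattern, assuming a boolean padding mask: drop the padding tokens, build `cu_seqlens` from the mask, and call the varlen kernel on the packed tokens. `flash_attn_varlen_qkvpacked_func` is the library's varlen entry point; the helper name, shapes, and the per-call unpad/repad here are illustrative (the flash-attn BERT code unpads once before the first layer and repads after the last, rather than per attention call).

```python
# Hedged sketch: "remove padding before the first layer", then run FlashAttention
# on the packed (unpadded) tokens via the varlen interface. Everything except
# flash_attn_varlen_qkvpacked_func is plain PyTorch; names and shapes are illustrative.
import torch
from flash_attn import flash_attn_varlen_qkvpacked_func

def attention_without_padding(qkv, key_padding_mask):
    # qkv: (batch, seqlen, 3, nheads, headdim), fp16/bf16, on GPU
    # key_padding_mask: (batch, seqlen) bool, True for real tokens, False for padding
    batch, seqlen, three, nheads, headdim = qkv.shape
    seqlens = key_padding_mask.sum(dim=-1, dtype=torch.int32)          # (batch,)
    cu_seqlens = torch.nn.functional.pad(
        torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0)
    )                                                                  # (batch + 1,)
    max_seqlen = int(seqlens.max())

    # Unpad: keep only the real tokens, packed back to back.
    flat_mask = key_padding_mask.reshape(-1)
    qkv_packed = qkv.reshape(batch * seqlen, three, nheads, headdim)[flat_mask]

    out_packed = flash_attn_varlen_qkvpacked_func(
        qkv_packed, cu_seqlens, max_seqlen, dropout_p=0.0, causal=True
    )                                                                  # (total_tokens, nheads, headdim)

    # Repad: scatter the outputs back to the padded layout (zeros at padding positions).
    out = qkv.new_zeros(batch * seqlen, nheads, headdim)
    out[flat_mask] = out_packed
    return out.reshape(batch, seqlen, nheads, headdim)
```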

@flower-with-safe

What if I want to use a reset attention mask when pretraining a LLaMA model?
For example, my attention mask could be:

tensor([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
        [1., 1., 1., 0., 0., 0., 0., 0., 0., 0.],
        [1., 1., 1., 1., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 1., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 1., 1., 0., 0., 0.],
        [0., 0., 0., 0., 1., 1., 1., 1., 0., 0.],
        [0., 0., 0., 0., 1., 1., 1., 1., 1., 0.],
        [0., 0., 0., 0., 1., 1., 1., 1., 1., 1.]])

In such a case, how could I use flash attention?
@tridao
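
A mask like this is block-diagonal causal: attention is causal within each packed document and never crosses a document boundary. That is exactly what the varlen interface can express if every document is treated as its own sequence through `cu_seqlens`. A minimal sketch, assuming the documents are packed back to back and their lengths are known (the function and variable names are illustrative, not part of the library):

```python
# Hedged sketch: a "reset" (block-diagonal causal) mask expressed with the varlen
# API by treating every packed document as a separate sequence via cu_seqlens.
import torch
from flash_attn import flash_attn_varlen_qkvpacked_func

def reset_mask_attention(qkv, doc_lens):
    # qkv: (total_tokens, 3, nheads, headdim), fp16/bf16, on GPU, with the documents
    #      packed back to back along the token dimension.
    # doc_lens: document lengths; [4, 6] reproduces the 10x10 mask above.
    seqlens = torch.tensor(doc_lens, dtype=torch.int32, device=qkv.device)
    cu_seqlens = torch.nn.functional.pad(
        torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0)
    )
    # causal=True restricts attention to earlier tokens, and cu_seqlens prevents any
    # attention across document boundaries; together they give the block-diagonal mask.
    return flash_attn_varlen_qkvpacked_func(
        qkv, cu_seqlens, int(seqlens.max()), dropout_p=0.0, causal=True
    )
```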

@sentialx

@tridao Is it possible to use a PrefixLM attention mask?

@tridao
Contributor

tridao commented Oct 11, 2023

No, that's not supported.

@sentialx

It looks like a custom attention mask is the most requested feature. In fact, it seems to have already been solved by this PR: #57

@iiLaurens

iiLaurens commented Oct 20, 2023

I am also very interested in custom masks. I think the value of a PrefixLM mask is not appreciated enough. I would like to experiment with continuing pretraining with PrefixLM à la UL2R.

@tridao, are you considering support for custom attention masks? Or do you have specific objections to it?
