using custom attention mask #584
Comments
You can look at how we do it in BERT: Remove all padding tokens before the first layer. Idk if that works for translation.
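For context, a minimal sketch of that unpadding approach using the `unpad_input` / `pad_input` helpers from `flash_attn.bert_padding`: drop the padding tokens once before the first layer and scatter the results back afterward. Tensor shapes here are made up, and the exact number of values `unpad_input` returns has changed between releases, so treat this as a sketch rather than the repo's actual BERT code.

```python
import torch
from flash_attn.bert_padding import unpad_input, pad_input

# hidden: (batch, seqlen, dim); mask: (batch, seqlen), True for real tokens
hidden = torch.randn(2, 128, 768, dtype=torch.float16, device="cuda")
mask = torch.ones(2, 128, dtype=torch.bool, device="cuda")
mask[1, 100:] = False  # second sequence has 28 padding tokens

batch, seqlen, _ = hidden.shape
# Remove all padding tokens before the first layer; every layer then works on a
# packed (total_tokens, dim) tensor plus cu_seqlens marking the sequence boundaries.
hidden_unpad, indices, cu_seqlens, max_seqlen = unpad_input(hidden, mask)[:4]

# ... run the transformer layers on hidden_unpad with the varlen attention kernels ...

# Scatter back to the padded (batch, seqlen, dim) layout after the last layer.
hidden_padded = pad_input(hidden_unpad, indices, batch, seqlen)
```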
What if I want to use reset-attention-mask when pretraining a LLaMA model?
@tridao Is it possible to use a PrefixLM attention mask?
No, that's not supported.
It looks like a custom attention mask is the most requested feature. In fact, it seems to already be solved by this PR: #57
Hello!
I am doing a translation task and would like to try using flash attention in my model.
In addition to the usual triangular (causal) mask, I also need to mask out padding tokens so that the model does not attend to them; the sequences already arrive at the model padded to the same length.
As I understand it, there is no way to pass your own mask yet.
Could you tell me how I can use my own mask, or have flash attention handle the padding tokens itself?
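Since the kernels don't accept an arbitrary mask tensor, the usual route for this particular combination (triangular mask plus padding) is the variable-length interface: pack the non-padding tokens, describe each sequence with `cu_seqlens`, and let `causal=True` supply the triangular part. A rough sketch, assuming a flash-attn 2.x release that exports `flash_attn_varlen_func` and made-up shapes:

```python
import torch
import torch.nn.functional as F
from flash_attn import flash_attn_varlen_func

batch, seqlen, nheads, headdim = 2, 6, 4, 16
# q/k/v for self-attention, already padded to a common length.
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)
# True for real tokens, False for padding.
key_padding_mask = torch.tensor([[1, 1, 1, 1, 1, 1],
                                 [1, 1, 1, 1, 0, 0]], dtype=torch.bool, device="cuda")

# Pack away the padding tokens and record where each sequence starts and ends.
seqlens = key_padding_mask.sum(dim=1, dtype=torch.int32)
cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))
max_seqlen = int(seqlens.max())

q_p, k_p, v_p = (t[key_padding_mask] for t in (q, k, v))  # (total_tokens, nheads, headdim)

# causal=True applies the triangular mask within each sequence; padding never
# enters the attention because those tokens were removed from the packed tensors.
out = flash_attn_varlen_func(
    q_p, k_p, v_p,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=max_seqlen, max_seqlen_k=max_seqlen,
    causal=True,
)
```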