using custom attention mask #584
Comments
You can look at how we do it in BERT: Remove all padding tokens before the first layer. Idk if that works for translation.
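For context, a minimal sketch of that unpadding approach using the `unpad_input` / `pad_input` helpers from `flash_attn.bert_padding`: drop the padding tokens once before the first layer and scatter the results back afterward. Tensor shapes here are made up, and the exact number of values `unpad_input` returns has changed between releases, so treat this as a sketch rather than the repo's actual BERT code.

```python
import torch
from flash_attn.bert_padding import unpad_input, pad_input

# hidden: (batch, seqlen, dim); mask: (batch, seqlen), True for real tokens
hidden = torch.randn(2, 128, 768, dtype=torch.float16, device="cuda")
mask = torch.ones(2, 128, dtype=torch.bool, device="cuda")
mask[1, 100:] = False  # second sequence has 28 padding tokens

batch, seqlen, _ = hidden.shape
# Remove all padding tokens before the first layer; every layer then works on a
# packed (total_tokens, dim) tensor plus cu_seqlens marking the sequence boundaries.
hidden_unpad, indices, cu_seqlens, max_seqlen = unpad_input(hidden, mask)[:4]

# ... run the transformer layers on hidden_unpad with the varlen attention kernels ...

# Scatter back to the padded (batch, seqlen, dim) layout after the last layer.
hidden_padded = pad_input(hidden_unpad, indices, batch, seqlen)
```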
What if I want to use reset-attention-mask when pretraining a LLaMA model?
@tridao Is it possible to use a PrefixLM attention mask?
No, that's not supported.
It looks like a custom attention mask is the most requested feature. In fact, it seems to already be solved by this PR: #57
Hello!
I am doing a translation task and would like to try using flash attention in my model.
In addition to the usual triangular (causal) mask, I also need to mask out padding tokens so that the model does not attend to them; the sequences already arrive at the model padded to the same length.
As I understand it, there is no way to pass your own mask yet.
Could you tell me how I can use my own mask, or have flash attention handle the padding tokens itself?
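Since the kernels don't accept an arbitrary mask tensor, the usual route for this particular combination (triangular mask plus padding) is the variable-length interface: pack the non-padding tokens, describe each sequence with `cu_seqlens`, and let `causal=True` supply the triangular part. A rough sketch, assuming a flash-attn 2.x release that exports `flash_attn_varlen_func` and made-up shapes:

```python
import torch
import torch.nn.functional as F
from flash_attn import flash_attn_varlen_func

batch, seqlen, nheads, headdim = 2, 6, 4, 16
# q/k/v for self-attention, already padded to a common length.
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)
# True for real tokens, False for padding.
key_padding_mask = torch.tensor([[1, 1, 1, 1, 1, 1],
                                 [1, 1, 1, 1, 0, 0]], dtype=torch.bool, device="cuda")

# Pack away the padding tokens and record where each sequence starts and ends.
seqlens = key_padding_mask.sum(dim=1, dtype=torch.int32)
cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))
max_seqlen = int(seqlens.max())

q_p, k_p, v_p = (t[key_padding_mask] for t in (q, k, v))  # (total_tokens, nheads, headdim)

# causal=True applies the triangular mask within each sequence; padding never
# enters the attention because those tokens were removed from the packed tensors.
out = flash_attn_varlen_func(
    q_p, k_p, v_p,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=max_seqlen, max_seqlen_k=max_seqlen,
    causal=True,
)
```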