Fix Gemma2 4d attention mask #31674
Conversation
Gemma2 finetuning cannot work until merging huggingface/transformers#31674
Thanks, with this we do avoid the overflow!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Update modeling_gemma2.py Co-authored-by: Arthur <[email protected]>
The patch is out for this! Thanks again 🤗 https://pypi.org/project/transformers/4.42.2/
Thank you all for your hard work. I've been dealing with this issue recently as well.
What does this PR do?
Possibly a continuation of #31670.
Fixes #31673
The attention mask here should be a 4d mask, but the current implementation mistakenly treats it as a 2d mask. Additionally, the mask needs to be inverted to prevent label leakage.
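For reference, here is a minimal sketch of what an inverted 4d causal mask looks like. This is illustrative only, under assumed names (`make_4d_causal_mask` is hypothetical), not the actual `modeling_gemma2.py` code:

```python
# Illustrative sketch: expand a 2d padding mask (batch, seq_len) into an
# inverted 4d additive mask (batch, 1, seq_len, seq_len). NOT the actual
# modeling_gemma2.py implementation; names and details are assumptions.
import torch

def make_4d_causal_mask(padding_mask: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    batch, seq_len = padding_mask.shape

    # Causal constraint: query position i may only attend to key positions j <= i.
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

    # A key position is visible only if it is a real token AND not in the future.
    visible = causal[None, None, :, :] & padding_mask[:, None, None, :].bool()

    # "Inverted": visible positions contribute 0 to the attention scores,
    # masked positions contribute the most negative representable value,
    # so softmax drives their weights to ~0 and future labels cannot leak.
    mask_4d = torch.zeros(batch, 1, seq_len, seq_len, dtype=dtype)
    return mask_4d.masked_fill(~visible, torch.finfo(dtype).min)

# Example: batch of 1, seq_len of 3, last token is padding.
mask = make_4d_causal_mask(torch.tensor([[1, 1, 0]]), torch.float32)
print(mask.shape)  # torch.Size([1, 1, 3, 3])
```

If a 2d mask were used directly in place of this 4d form, every token could attend to every real token, including future positions, which is the label-leakage failure described above.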
Before this PR:
After this PR:
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@ArthurZucker