Fix: FA2 with packed training #32487

zucchini-nlp · 2024-08-07T05:30:22Z

What does this PR do?

Fixes the issue from #32241 (comment). FA2 was taking the packed path when we tried to continue generation with a filled cache and more than one new token. This can happen in assisted decoding, for example.

I think that instead of checking the position ids' length and last elements, we can check whether they are arranged in increasing order. We know for sure that packed sequences will not have all elements in increasing positions. We can also be sure that at inference we'll always have an increasing order if no attention mask is provided. If an attention mask is provided, we won't reach this check, so that should not be a problem.
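To make the reasoning above concrete, here is a small dependency-free sketch of the idea. The helper name `looks_packed` and the sample position ids are hypothetical and are not taken from the PR; the actual check in the PR operates on tensors.

```python
from typing import List


def looks_packed(position_ids: List[List[int]]) -> bool:
    """Return True if any row contains a position that decreases.

    A decrease means the positions restart, which is what happens at the
    boundary between two sequences packed into the same row.
    """
    for row in position_ids:
        for prev, curr in zip(row, row[1:]):
            if curr < prev:
                return True
    return False


# Two sequences (lengths 3 and 2) packed into one row: positions restart at 0.
packed = [[0, 1, 2, 0, 1]]
# Plain prefill of a single sequence.
prefill = [[0, 1, 2, 3, 4]]
# Continuing from a filled cache with 3 new tokens (e.g. assisted decoding).
cached_continuation = [[5, 6, 7]]

print(looks_packed(packed))               # True
print(looks_packed(prefill))              # False
print(looks_packed(cached_continuation))  # False
```

The last case is the one described in this PR: the positions do not start at 0 and there is more than one new token, but they are still increasing, so the input is not treated as packed.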

HuggingFaceDocBuilderDev · 2024-08-07T05:48:52Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

We probably need a patch for this, no? Can you add a non-slow test as well? 🤗

ArthurZucker · 2024-08-07T10:53:30Z

src/transformers/modeling_flash_attention_utils.py

+    # If position_ids is provided and check all examples do not contain only 1 sequence, If tensor in increasing
+    # then we probably have one sequence, otherwise it is packed. Additionally check we are in pre-fill/training stage.
+    # Use `flash_attn_varlen_func` to prevent cross-example attention and also allow padding free approach
+    elif position_ids is not None and not (torch.diff(position_ids) >= 0).all() and query_length != 1:


Can we make it explicit that we diff on the batch dim?

yes, will do so and add more tests!
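For illustration, here is a sketch of what spelling out the dimension could look like. `torch.diff` defaults to `dim=-1`, which for position ids of shape `(batch, seq_len)` is the sequence dimension, so passing it explicitly documents the intent without changing behavior. The helper name `is_packed` and the sample tensors are hypothetical and are not the code merged in this PR.

```python
import torch


def is_packed(position_ids: torch.Tensor, query_length: int) -> bool:
    """Hypothetical helper mirroring the condition in the diff above.

    position_ids is assumed to have shape (batch, seq_len).
    """
    # Differences between neighbouring positions inside each row.
    diffs = torch.diff(position_ids, dim=-1)
    return bool(query_length != 1 and not (diffs >= 0).all())


# Row 0 packs two sequences; row 1 is a single sequence.
packed_batch = torch.tensor([[0, 1, 2, 0, 1], [0, 1, 2, 3, 4]])
# Two rows with different starting offsets, each increasing along the sequence.
offset_batch = torch.tensor([[3, 4, 5], [0, 1, 2]])

print(is_packed(packed_batch, query_length=5))  # True
print(is_packed(offset_batch, query_length=3))  # False

# Diffing across rows instead would compare unrelated examples:
print(torch.diff(offset_batch, dim=0))  # tensor([[-3, -3, -3]])
```

The last line shows why the dimension matters: taken across rows, the differences are negative even though neither row is packed.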

ArthurZucker

Cool! Can you just run the slow test to make sure it's not skipped? 🤗

zucchini-nlp · 2024-08-09T05:42:01Z

Slow CI tests are failing for unrelated reasons: gated repo, OOM, or failures that occur even without this PR. I believe this can be merged, since many tests need to be fixed in general.

ArthurZucker · 2024-08-12T08:32:21Z

Yep, we need to patch with this. Let's merge! 🤗

zucchini-nlp · 2024-08-12T08:40:00Z

Yep, and #32527 for the patch release, please. Can you approve it too?

* fix check
* add tests
* [run-slow] llama, gemma2
* oops, whisper actually runs but needed some special treatment

fix check

52bab19

zucchini-nlp requested a review from ArthurZucker August 7, 2024 05:30

ArthurZucker reviewed Aug 7, 2024


zucchini-nlp added 2 commits August 8, 2024 11:14

Merge remote-tracking branch 'upstream/main' into flash-attn-fixes

d96e4b0

add tests

bfe11ad

zucchini-nlp requested a review from ArthurZucker August 8, 2024 10:03

ArthurZucker approved these changes Aug 8, 2024


[run-slow] llama, gemma2

4b49c85

zucchini-nlp added the run-slow label Aug 8, 2024

oops, whisper actually runs but needed some special treatment

bdcf2cb

zucchini-nlp merged commit 8f2b6d5 into huggingface:main Aug 12, 2024
22 checks passed

ArthurZucker pushed a commit that referenced this pull request Aug 16, 2024

Fix: FA2 with packed training (#32487)

5674d8b


ArthurZucker pushed a commit that referenced this pull request Aug 20, 2024

Fix: FA2 with packed training (#32487)

2716f0b


ArthurZucker pushed a commit that referenced this pull request Aug 20, 2024

Fix: FA2 with packed training (#32487)

51741d7


stevhliu pushed a commit to stevhliu/transformers that referenced this pull request Aug 21, 2024

Fix: FA2 with packed training (huggingface#32487)

69fc69d

