
Add Flash Attention 2 support to Bark #27364

Merged
merged 12 commits into huggingface:main on Nov 8, 2023

Conversation

@ylacombe (Contributor) commented Nov 8, 2023

What does this PR do?

Following a recent series of PRs and issues to improve Bark, this PR adds Flash Attention 2 (FA2) support to Bark. Bark's self-attention class supports both causal and non-causal attention, but otherwise the changes are minimal.

I've also taken the opportunity to switch to _prepare_4d_attention_mask instead of manually creating the 4d attention mask.

Benchmarks are currently running to measure the speed/memory gains!
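For context, a minimal usage sketch of the feature this PR adds (assumptions: a CUDA device, half-precision weights, the suno/bark-small checkpoint, and a transformers version exposing the attn_implementation argument; earlier versions used use_flash_attention_2=True instead):

import torch
from transformers import AutoProcessor, BarkModel

# Load Bark with the Flash Attention 2 kernels enabled (requires a GPU and fp16/bf16 weights).
processor = AutoProcessor.from_pretrained("suno/bark-small")
model = BarkModel.from_pretrained(
    "suno/bark-small",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")

# Generate speech from text; every sub-model's self-attention goes through the FA2 path.
inputs = processor("Hello, my dog is cute").to("cuda")
audio_array = model.generate(**inputs)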

cc @sanchit-gandhi and @amyeroberts

@HuggingFaceDocBuilderDev commented Nov 8, 2023

The documentation is not available anymore as the PR was closed or merged.

@sanchit-gandhi (Contributor) left a comment

Very clean - thanks @ylacombe for adding this! Keen to see what kind of performance gain we get from this

cls, config, torch_dtype: Optional[torch.dtype] = None, device_map: Optional[Union[str, Dict[str, int]]] = None
):
"""
If you don't know about Flash Attention, check out the official repository of flash attention:
Contributor: Could be worth explaining quickly why we override this method in the docstring!

tests/models/bark/test_modeling_bark.py (outdated; resolved)

dummy_attention_mask = inputs_dict.get("attention_mask", None)

if dummy_attention_mask is not None:
Contributor: What's the motivation behind overriding the attention mask here?

Contributor (Author): Making sure that at least one of the input ids is masked!
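For readers following along, a sketch of the kind of mask override used in the shared FA2 test (illustrative; the exact indexing in the PR may differ):

dummy_attention_mask = inputs_dict.get("attention_mask", None)
if dummy_attention_mask is not None:
    # Force a non-trivial padding pattern so that the eager and Flash Attention 2
    # outputs are compared with at least one masked input id.
    dummy_attention_mask = dummy_attention_mask.clone()
    dummy_attention_mask[:, 1:] = 1
    dummy_attention_mask[:, :1] = 0  # mask the first position (left padding)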


logits = (
outputs.hidden_states[-1]
if not model.config.is_encoder_decoder
Contributor: We know Bark is not encoder-decoder -> could we simplify the tests to reflect this?

Contributor (Author): nice catch!
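For reference, the simplified comparison could look like the sketch below (Bark is decoder-only, so the encoder-decoder branch is never taken; the tolerances are the ones already used in the test):

# Bark is not an encoder-decoder model, so the last hidden state is always
# hidden_states[-1] for both the eager and the Flash Attention 2 model.
logits = outputs.hidden_states[-1]
logits_fa = outputs_fa.hidden_states[-1]
assert torch.allclose(logits_fa, logits, atol=4e-2, rtol=4e-2)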

else outputs_fa.decoder_hidden_states[-1]
)

assert torch.allclose(logits_fa, logits, atol=4e-2, rtol=4e-2)
Contributor: Pretty high tolerance! We've compared the audio outputs qualitatively with / without flash attention and they match?

Contributor (Author): Thanks for the quick review! I've actually copied and adapted a test from the general suite, so I haven't changed anything here: the tolerance and the attention-mask overriding are the same as in the original test.

Collaborator: I had the same comment on tolerance for FA2 tests :D 0.04 was agreed as being acceptable

tests/models/bark/test_modeling_bark.py (outdated; resolved)
@ylacombe (Contributor, Author) commented Nov 8, 2023

Thanks for the quick review, I've addressed your comments 🤗

@amyeroberts (Collaborator) left a comment

Very nice - thanks for adding!

@ylacombe (Contributor, Author) commented Nov 8, 2023

Merging! Thanks for the quick reviews!

@ylacombe ylacombe merged commit a5bee89 into huggingface:main Nov 8, 2023
18 checks passed
@ArthurZucker (Collaborator) left a comment

Thanks for adding this! I usually also request adding a section in the readme and updating the list of models that support flash attention here and in the readme, like this change.

else:
present = None

attn_output = self._flash_attention_forward(query, key, value, attention_mask, query_len, dropout=self.dropout)
Collaborator: Here self.dropout is a module, not a float. The doc of _flash_attention_forward does not match and is not restrictive enough.

Collaborator: It might work but I'd rather we standardize!
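A sketch of the standardized call (an illustration, not necessarily the fix that eventually landed: it assumes self.dropout is the nn.Dropout module mentioned above, so its probability .p is passed as a float and dropout is disabled outside of training):

# Pass the dropout probability as a float rather than the nn.Dropout module,
# and turn it off at inference time.
dropout_p = self.dropout.p if self.training else 0.0
attn_output = self._flash_attention_forward(
    query, key, value, attention_mask, query_len, dropout=dropout_p
)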

@ylacombe ylacombe mentioned this pull request Nov 9, 2023
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023
* change handmade attention mask to _prepare_4d_attention_mask

* add flashattention2 support in Bark

* add flashattention2 tests on BarkSemanticModel

* make style

* fix flashattention and tests + make style

* fix memory leak and allow Bark to pass flash attention to sub-models

* make style

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <[email protected]>

* remove unnecessary code from tests + justify overriding

* Update tests/models/bark/test_modeling_bark.py

Co-authored-by: amyeroberts <[email protected]>

* make style

---------

Co-authored-by: Sanchit Gandhi <[email protected]>
Co-authored-by: amyeroberts <[email protected]>