
To apply FlashAttention #203

Open
dyanos opened this issue Jun 19, 2023 · 1 comment

dyanos commented Jun 19, 2023

To apply FlashAttention

dyanos self-assigned this Jun 19, 2023

dyanos commented Jun 19, 2023

To install:

pip install flash-attn

To apply:

import torch
from flash_attn.flash_attention import FlashMHA

# Replace this with your correct GPU device
device = "cuda:0"

# Create attention layer. This is similar to torch.nn.MultiheadAttention,
# and it includes the input and output linear layers
flash_mha = FlashMHA(
    embed_dim=128, # total channels (= num_heads * head_dim)
    num_heads=8, # number of heads
    device=device,
    dtype=torch.float16,
)

# Run forward pass with dummy data
x = torch.randn(
    (64, 256, 128), # (batch, seqlen, embed_dim)
    device=device,
    dtype=torch.float16
)

# forward returns (output, attn_weights), like torch.nn.MultiheadAttention,
# so take the first element; output has shape (batch, seqlen, embed_dim)
output = flash_mha(x)[0]

# The lower-level FlashAttention module (the attention computation only,
# without the input/output linear layers) can also be constructed directly:
from flash_attn.flash_attention import FlashAttention

# Create the nn.Module
flash_attention = FlashAttention()
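
For reference, a minimal sketch of calling the lower-level module, assuming the flash-attn v1 interface where FlashAttention.forward takes a packed qkv tensor of shape (batch, seqlen, 3, num_heads, head_dim) in fp16 on GPU and returns an (output, attention weights) tuple. The packed layout and the head sizes below are assumptions chosen to match the FlashMHA example above, not something stated in this issue.

# Hedged sketch: assumes a packed qkv tensor (batch, seqlen, 3, num_heads, head_dim);
# 8 heads of dim 16 match the embed_dim=128 used above.
qkv = torch.randn(
    (64, 256, 3, 8, 16),
    device=device,
    dtype=torch.float16,
)

# Assumed to return (output, attn_weights); output would be
# (batch, seqlen, num_heads, head_dim)
out, _ = flash_attention(qkv)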
