
Implementation of SDPA for Microsoft Phi-3 Mini #31863

Closed
2 of 4 tasks
Dev4011 opened this issue Jul 9, 2024 · 7 comments · Fixed by #32457
Labels: Feature request, SDPA

Comments

Dev4011 commented Jul 9, 2024

System Info

  • transformers version: 4.42.3
  • Platform: Linux-6.1.85+-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.23.4
  • Safetensors version: 0.4.3
  • Accelerate version: 0.32.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0+cu121 (True)
  • Tensorflow version (GPU?): 2.15.0 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.8.4 (gpu)
  • Jax version: 0.4.26
  • JaxLib version: 0.4.26
  • Using distributed or parallel set-up in script?: Yes
  • Using GPU in script?: Yes
  • GPU type: Tesla T4

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I am referring to the tutorial below to finetune Microsoft Phi-3 on my custom dataset: https://github.com/microsoft/Phi-3CookBook/blob/main/code/04.Finetuning/Phi-3-finetune-lora-python.ipynb

Since I am running this on Colab with a T4 GPU, Flash Attention is not supported (FlashAttention only supports Ampere GPUs or newer).

Thus, per the code below from the tutorial, attn_implementation is selected as 'sdpa' with torch.float16 as the compute dtype:

import torch

if torch.cuda.is_bf16_supported():
    compute_dtype = torch.bfloat16
    attn_implementation = 'flash_attention_2'
else:
    compute_dtype = torch.float16
    attn_implementation = 'sdpa'

Loading Model

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=compute_dtype, trust_remote_code=True, device_map='auto',
    attn_implementation=attn_implementation
)

Error

It gives me the error: ValueError: Phi3ForCausalLM does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet. Please request the support for this architecture: #28005.

Keeping attn_implementation='eager' instead leads to a CUDA out-of-memory error.

Expected behavior

SDPA should be supported as an attention implementation for the Microsoft Phi-3 model.

@amyeroberts added the Feature request and SDPA labels on Jul 9, 2024
@ArthurZucker added the Good Second Issue label on Aug 3, 2024
@ArthurZucker
Collaborator

One thing to note is that Phi3 uses trust_remote_code=True, meaning I'm not sure it's compatible with transformers.
AFAIK it is, so removing the trust_remote_code=True should work.
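
A minimal sketch of that suggestion, assuming the checkpoint id below is the Phi-3 Mini model from the cookbook tutorial (treat it as a placeholder):

import torch
from transformers import AutoModelForCausalLM

model_id = 'microsoft/Phi-3-mini-4k-instruct'  # placeholder checkpoint id, assumed from the tutorial

# Without trust_remote_code=True, from_pretrained loads the Phi3 implementation
# shipped inside transformers, which is where SDPA support has to be declared.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map='auto',
    attn_implementation='sdpa',
)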

@ArthurZucker removed the Good Second Issue label on Aug 3, 2024
@pocca2048
Contributor

I think _supports_sdpa can be changed to True?

I confirmed that the model can be loaded properly after changing it to True.
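
For context, a minimal sketch of the kind of change being discussed, assuming the flag lives as a class attribute on Phi3PreTrainedModel in src/transformers/models/phi3/modeling_phi3.py; the surrounding attributes are illustrative, not the exact file contents:

# Sketch of the proposed one-line change in modeling_phi3.py, not the actual diff.
# In the real file these imports already exist.
from transformers import PreTrainedModel
from transformers.models.phi3.configuration_phi3 import Phi3Config

class Phi3PreTrainedModel(PreTrainedModel):
    config_class = Phi3Config
    supports_gradient_checkpointing = True
    _supports_flash_attn_2 = True
    # Flipping this from False to True is what tells transformers that the
    # architecture can dispatch to torch.nn.functional.scaled_dot_product_attention.
    _supports_sdpa = True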

@ArthurZucker
Collaborator

Yep, can you open a PR? 🤗

@pocca2048
Contributor

@ArthurZucker Opened a PR!

@bjzhb666

Hi, I changed _supports_sdpa to True, but I still get this error.

I am using a V100 with transformers version 4.42.4. How should I change the code? Thanks

@bjzhb666

Oh, I set trust_remote_code=False and set _supports_sdpa to True. It works now.
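
A quick way to confirm which attention implementation was actually picked up after loading; recent transformers releases record it on the config as _attn_implementation, but treat the exact attribute name as an assumption for your version:

# Sanity check after from_pretrained: should print 'sdpa' if the patch took effect.
print(model.config._attn_implementation)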

@ArthurZucker
Copy link
Collaborator

Thanks for updating!
