
Implementation of SDPA for Microsoft Phi-3 Mini #31863

Closed
2 of 4 tasks
Dev4011 opened this issue Jul 9, 2024 · 7 comments · Fixed by #32457
Labels: Feature request, SDPA

Comments

Dev4011 commented Jul 9, 2024

System Info

  • transformers version: 4.42.3
  • Platform: Linux-6.1.85+-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.23.4
  • Safetensors version: 0.4.3
  • Accelerate version: 0.32.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0+cu121 (True)
  • Tensorflow version (GPU?): 2.15.0 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.8.4 (gpu)
  • Jax version: 0.4.26
  • JaxLib version: 0.4.26
  • Using distributed or parallel set-up in script?: Yes
  • Using GPU in script?: Yes
  • GPU type: Tesla T4

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I am referring to the tutorial below to finetune Microsoft Phi-3 on my custom dataset: https://github.com/microsoft/Phi-3CookBook/blob/main/code/04.Finetuning/Phi-3-finetune-lora-python.ipynb

Since I am running this on Colab with a T4 GPU, Flash Attention is not supported (FlashAttention only supports Ampere GPUs or newer).

Thus, per the code below from the tutorial, attn_implementation is selected as 'sdpa' with torch.float16 as the compute dtype:

import torch

if torch.cuda.is_bf16_supported():
    compute_dtype = torch.bfloat16
    attn_implementation = 'flash_attention_2'
else:
    compute_dtype = torch.float16
    attn_implementation = 'sdpa'

Loading Model

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=compute_dtype, trust_remote_code=True, device_map='auto',
    attn_implementation=attn_implementation
)

Error

It gives me the error: ValueError: Phi3ForCausalLM does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet. Please request the support for this architecture: #28005.

Keeping attn_implementation='eager' instead leads to a CUDA out-of-memory error.

Expected behavior

SDPA should be supported as an attention implementation for the Microsoft Phi-3 model.

@amyeroberts added the Feature request and SDPA labels on Jul 9, 2024
@ArthurZucker added the Good Second Issue label on Aug 3, 2024
@ArthurZucker
Collaborator

One thing to note is that Phi3 uses trust_remote_code=True, meaning I'm not sure it's compatible with transformers.
AFAIK it is, so removing the trust_remote_code=True should work.
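
A minimal sketch of that suggestion, assuming the checkpoint id below is the Phi-3 Mini model from the cookbook tutorial (treat it as a placeholder):

import torch
from transformers import AutoModelForCausalLM

model_id = 'microsoft/Phi-3-mini-4k-instruct'  # placeholder checkpoint id, assumed from the tutorial

# Without trust_remote_code=True, from_pretrained loads the Phi3 implementation
# shipped inside transformers, which is where SDPA support has to be declared.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map='auto',
    attn_implementation='sdpa',
)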

@ArthurZucker removed the Good Second Issue label on Aug 3, 2024
@pocca2048
Contributor

I think _supports_sdpa can be changed to True?

I confirmed that the model can be loaded properly after changing it to True.
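
For context, a minimal sketch of the kind of change being discussed, assuming the flag lives as a class attribute on Phi3PreTrainedModel in src/transformers/models/phi3/modeling_phi3.py; the surrounding attributes are illustrative, not the exact file contents:

# Sketch of the proposed one-line change in modeling_phi3.py, not the actual diff.
# In the real file these imports already exist.
from transformers import PreTrainedModel
from transformers.models.phi3.configuration_phi3 import Phi3Config

class Phi3PreTrainedModel(PreTrainedModel):
    config_class = Phi3Config
    supports_gradient_checkpointing = True
    _supports_flash_attn_2 = True
    # Flipping this from False to True is what tells transformers that the
    # architecture can dispatch to torch.nn.functional.scaled_dot_product_attention.
    _supports_sdpa = True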

@ArthurZucker
Collaborator

Yep, can you open a PR? 🤗

@pocca2048
Contributor

@ArthurZucker Opened a PR!

@bjzhb666

Hi, I changed _supports_sdpa to True, but I still get this error.

I am using a V100 with transformers version 4.42.4. How should I change the code? Thanks

@bjzhb666

Oh, I set trust_remote_code=False and set _supports_sdpa to True. It works now.
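
A quick way to confirm which attention implementation was actually picked up after loading; recent transformers releases record it on the config as _attn_implementation, but treat the exact attribute name as an assumption for your version:

# Sanity check after from_pretrained: should print 'sdpa' if the patch took effect.
print(model.config._attn_implementation)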

@ArthurZucker
Copy link
Collaborator

Thanks for updating!
