Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trainer doesn't shift labels for CAUSAL_LM PEFT models with label smoothing enabled #27161

Closed
4 tasks
kkteru opened this issue Oct 30, 2023 · 3 comments · Fixed by #27162
Closed
4 tasks

Trainer doesn't shift labels for CAUSAL_LM PEFT models with label smoothing enabled #27161

kkteru opened this issue Oct 30, 2023 · 3 comments · Fixed by #27162

Comments

@kkteru
Copy link
Contributor

kkteru commented Oct 30, 2023

System Info

  • transformers version: 4.35.0.dev0
  • Platform: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.16.4
  • Safetensors version: 0.3.2
  • Accelerate version: 0.24.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.0.1+cu118 (True)
  • Tensorflow version (GPU?): 2.14.0 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.7.0 (cpu)
  • Jax version: 0.4.13
  • JaxLib version: 0.4.13
  • Using GPU in script?: RTX 3090
  • Using distributed or parallel set-up in script?: No

Who can help?

@pacman100, @muellerzr

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I am not sure how to articulate this silent bug with a code snippet. I will try to explain it referring to the code and hopefully it will be clear.

The compute_loss function in the Trainer class gets the model_name from model.base_model when the model is a PeftModel as written here. The base_model of a PeftModel is defined here as one of PEFT_TYPE_TO_MODEL_MAPPING. The 'true' base model is actually stored at base_model.model as declared here.

The issue is--in this line of compute_loss method, the check to shift labels is done by seeing if model_name is inside the MODEL_FOR_CAUSAL_LM_MAPPING_NAMES list. Since the model_name for a PeftModel isn't in that list, the labels aren't shifted.

From what I can tell, a simple fix without breaking anything else could be to modify this line to:

model_name = unwrap_model(model.base_model.model)._get_name()

Expected behavior

The labels should be shifted for causal language modelling tasks even when using peft models.

@kkteru
Copy link
Contributor Author

kkteru commented Oct 30, 2023

Created a PR for this quick fix, if that helps.

@amyeroberts
Copy link
Collaborator

cc @younesbelkada

@younesbelkada
Copy link
Contributor

Thanks for the deepdive! I will reply you on the PR itself

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants