
Issue with transformers 4.36 #1252

Merged

Conversation

@BenjaminBossan (Member) commented Dec 11, 2023

It seems that transformers==4.36 breaks some of our tests. Locally, they pass with 4.35 but fail with 4.36.

One offending line is this:

seq_len += past_key_value[0].shape[-2]

It seems that past_key_values used to be a tuple of tensors; now it is a DynamicCache, whose __getitem__ returns a tuple of tensors. How can we rewrite the PEFT code in a backwards-compatible fashion (i.e. so that it also works with older transformers versions)?

Another error is this:

        if causal_4d_mask is not None:
>           expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min)
E           RuntimeError: The size of tensor a (14) must match the size of tensor b (17) at non-singleton dimension 3

If there is no easy fix, LMK and we'll have to pin the transformers version for now.
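For reference, if pinning does become necessary, a minimal sketch of what that could look like (assuming the constraint lives in setup.py's install_requires; I haven't checked where PEFT actually declares it):

# Hypothetical pin until the cache changes are handled (illustrative only):
install_requires = [
    "transformers<4.36",
    # ... other dependencies unchanged
]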

ping @tomaarsen @younesbelkada

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@tomaarsen (Member)

cc: @gante as well. I don't have the bandwidth to look into this today

@BenjaminBossan changed the title from "Empty commit to check CI" to "Issue with transformers 4.36" on Dec 11, 2023
@BenjaminBossan (Member, Author)

LMK if there is no bandwidth for this issue; in that case we should pin the transformers version for now, otherwise CI will remain broken.

@gante (Member) commented Dec 11, 2023

@BenjaminBossan

uhmmm... the catch is that the function is missing one input, the layer index. If layer_idx held the layer index, the following replacement would work:

if past_key_value is not None:
    if isinstance(past_key_value, tuple):
        # transformers <= 4.35: per-layer tuple of (key, value) tensors
        seq_len += past_key_value[0].shape[-2]
    else:
        # transformers >= 4.36: Cache / DynamicCache instance
        seq_len += past_key_value.get_seq_length(layer_idx)

@gante (Member) commented Dec 11, 2023

For context, the layer index is now stored on the decoder layers and the attention layers themselves, see e.g. here
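As a minimal sketch of what that could mean for the snippet above (attn_module is a hypothetical name for the patched attention layer; the getattr fallback is an assumption for older transformers versions that don't set the attribute):

layer_idx = getattr(attn_module, "layer_idx", None)  # set by transformers >= 4.36
if layer_idx is not None and past_key_value is not None and not isinstance(past_key_value, tuple):
    # DynamicCache path: the layer index comes from the module itself
    seq_len += past_key_value.get_seq_length(layer_idx)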

@BenjaminBossan (Member, Author)

Thanks @gante! I made the following change:

    if past_key_value is not None:
        if isinstance(past_key_value, tuple):
            seq_len += past_key_value[0].shape[-2]
        else:
            # since transformers 4.36, this is a DynamicCache instance
            seq_len += past_key_value.get_seq_length(model.layer_idx)

This makes the first batch of tests pass 🎉

Unfortunately, the size mismatch in to_4d remains. For example:

        if causal_4d_mask is not None:
>           expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min)
E           RuntimeError: The size of tensor a (14) must match the size of tensor b (17) at non-singleton dimension 3

../../../anaconda3/envs/peft/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py:136: RuntimeError

@gante (Member) commented Dec 11, 2023

@BenjaminBossan the second one is trickier :D Would you be able to share a snippet so I can reproduce on my end?

@BenjaminBossan (Member, Author) commented Dec 11, 2023

When checking out peft, this test fails:

pytest tests/ -k test_disable_adapter_45_test_HuggingFaceM4_tiny_random_LlamaForCausalLM_prompt_encoder

Possibly this PEFT code is the issue, but I'm not sure:

peft/src/peft/peft_model.py, lines 1162 to 1167 (at e73967e):

if model_kwargs["past_key_values"] is None:
inputs_embeds = self.word_embeddings(model_kwargs["input_ids"])
prompts = self.get_prompt(batch_size=model_kwargs["input_ids"].shape[0], task_ids=task_ids)
prompts = prompts.to(inputs_embeds.dtype)
model_kwargs["inputs_embeds"] = torch.cat((prompts, inputs_embeds), dim=1)
model_kwargs["input_ids"] = None

edit: Maybe not, I couldn't spot a difference here between transformers 4.35 and 4.36

edit2: When I comment out this line:

self.base_model.prepare_inputs_for_generation = self.prepare_inputs_for_generation

the error disappears (but a later assertion that PEFT changes the result fails, as expected). However, when I jump into the output of self.prepare_inputs_for_generation, I can't spot any difference between 4.35 and 4.36.

@gante (Member) commented Dec 11, 2023

@BenjaminBossan I think the attention mask in peft_model.py L1140 is being incorrectly constructed 🤔

The exception pops up because the input has a length of 17 (13 from the cache, 4 from the new input_ids) and the attention mask has a length of 14 (10 from peft_config.num_virtual_tokens + 4 from input_ids). In the previous generation round, we have an input of length 13 (10 from self.get_prompt() + 3 from input_ids converted into input_embeds), so all shapes seem to be correct except for the ad hoc attention mask.
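For concreteness, a small sketch of that bookkeeping (the numbers are the ones from the failing test described above; the variable names are just illustrative):

num_virtual_tokens = 10      # peft_config.num_virtual_tokens
prompt_len = 3               # input_ids in the first generation round
input_ids_len = 4            # full input_ids in the next round (3 prompt tokens + 1 generated)

cache_len = num_virtual_tokens + prompt_len           # 13 entries already in the cache
model_seq_len = cache_len + input_ids_len             # 17: what the causal 4d mask covers
adhoc_mask_len = num_virtual_tokens + input_ids_len   # 14: what the PEFT prefix mask produces
# 17 != 14 at dimension 3 -> the RuntimeError in to_4d shown above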

If we replace

if model_kwargs.get("attention_mask", None) is not None:
  prefix_attention_mask = torch.ones(
      model_kwargs["input_ids"].shape[0], peft_config.num_virtual_tokens
  ).to(model_kwargs["input_ids"].device)
  model_kwargs["attention_mask"] = torch.cat(
      (prefix_attention_mask, model_kwargs["attention_mask"]), dim=1
  )

by

if model_kwargs.get("attention_mask", None) is not None:
  if model_kwargs["past_key_values"] is None:
      prefix_attention_mask = torch.ones(
          model_kwargs["input_ids"].shape[0], peft_config.num_virtual_tokens
      ).to(model_kwargs["input_ids"].device)
  else:
      prefix_attention_mask = torch.ones(
          model_kwargs["input_ids"].shape[0], model_kwargs["past_key_values"][0][0].shape[-2]
      ).to(model_kwargs["input_ids"].device)
  model_kwargs["attention_mask"] = torch.cat(
      (prefix_attention_mask, model_kwargs["attention_mask"]), dim=1
  )

then the tests pass because the attention mask is now of the expected shape. However, I'm not fully qualified to tell whether it is the right fix for PEFT :)
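One wrinkle in the snippet above is that past_key_values[0][0].shape[-2] leans on the legacy tuple layout (it happens to work with DynamicCache too via its __getitem__). A minimal sketch of a more explicit, version-agnostic variant of the else branch, assuming the hasattr check is an acceptable way to detect the new cache object:

past = model_kwargs["past_key_values"]
if hasattr(past, "get_seq_length"):
    # transformers >= 4.36: Cache / DynamicCache instance
    past_length = past.get_seq_length()
else:
    # older transformers: per-layer tuples of (key, value) tensors
    past_length = past[0][0].shape[-2]
prefix_attention_mask = torch.ones(
    model_kwargs["input_ids"].shape[0], past_length
).to(model_kwargs["input_ids"].device)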

@BenjaminBossan (Member, Author)

Thanks for digging into this. Your fix indeed makes the test pass -- alas, only for transformers 4.36 and not for 4.35, even though the shapes are the same for both. I'll try to dig deeper tomorrow.

However, I'm not fully qualified to tell whether it is the right fix for PEFT :)

That part of the code is also not very familiar to me :D

@Jaykumaran

AttributeError                            Traceback (most recent call last)
Cell In[9], line 6
      4 from datasets import load_dataset
      5 # from peft import LoraConfig, get_peft_model, PeftConfig, PeftModel, prepare_model_for_kbit_training
----> 6 from transformers import (AutoModelForCausalLM, AutoTokenizer,
      7     BitsAndBytesConfig, HfArgumentParser,
      8     TrainingArguments, GenerationConfig, logging, pipeline)
      9 from trl import SFTTrainer

File /opt/conda/lib/python3.10/site-packages/transformers/__init__.py:26
     23 from typing import TYPE_CHECKING
     25 # Check the dependencies satisfy the minimal versions required.
---> 26 from . import dependency_versions_check

File /opt/conda/lib/python3.10/site-packages/transformers/dependency_versions_check.py:16
     15 from .dependency_versions_table import deps
---> 16 from .utils.versions import require_version, require_version_core

File /opt/conda/lib/python3.10/site-packages/transformers/utils/__init__.py:61
---> 61 from .hub import (
     62     CLOUDFRONT_DISTRIB_PREFIX,
    (...)
     91 )

File /opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py:94
---> 94 PYTORCH_PRETRAINED_BERT_CACHE = os.getenv("PYTORCH_PRETRAINED_BERT_CACHE", constants.HF_HUB_CACHE)
     95 PYTORCH_TRANSFORMERS_CACHE = os.getenv("PYTORCH_TRANSFORMERS_CACHE", PYTORCH_PRETRAINED_BERT_CACHE)
     96 TRANSFORMERS_CACHE = os.getenv("TRANSFORMERS_CACHE", PYTORCH_TRANSFORMERS_CACHE)

AttributeError: module 'huggingface_hub.constants' has no attribute 'HF_HUB_CACHE'

import os

import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, PeftConfig, PeftModel, prepare_model_for_kbit_training
from transformers import (AutoModelForCausalLM, AutoTokenizer,
BitsAndBytesConfig, HfArgumentParser,
TrainingArguments,GenerationConfig, logging, pipeline)
from trl import SFTTrainer

!pip install trl transformers accelerate git+https://github.com/huggingface/peft.git -Uqqq
!pip install datasets bitsandbytes einops wandb -Uqqq

@tomaarsen (Member)

@Jaykumaran That sounds like your huggingface_hub version is too low.
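If that is the case, upgrading should fix it, e.g. (exact minimum version not checked here):

!pip install -U huggingface_hub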

@BenjaminBossan marked this pull request as ready for review on December 12, 2023, 12:13
@BenjaminBossan (Member, Author)

@younesbelkada @tomaarsen @gante Please check if the fix/workaround is good. Ideally, I'd like not to hard-code the supported architectures; I'm not sure if transformers provides a way to check that instead.
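One possibility (an assumption on my part, not something transformers documents for this purpose) would be to feature-detect the new cache object at runtime, given the past_key_values a model returned, instead of keeping a list of architectures:

try:
    # available from transformers 4.36 onwards
    from transformers.cache_utils import Cache
except ImportError:
    Cache = None

# True only if this model actually returned the new cache object
uses_new_cache = Cache is not None and isinstance(past_key_values, Cache)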

@gante (Member) left a review comment

A few nits, otherwise LGTM 👍

(3 review comments on src/peft/peft_model.py, all outdated/resolved)
@younesbelkada (Contributor) left a review comment

Awesome investigation!

@BenjaminBossan merged commit ee6f6dc into huggingface:main on Dec 12, 2023
14 checks passed
@tomaarsen (Member)

Thanks @gante, @younesbelkada & @BenjaminBossan for looking into this!

@BenjaminBossan mentioned this pull request on Dec 12, 2023
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Jan 11, 2024
See huggingface#1252 for more context.

The initial idea was for transformers 4.37 to add the new caching to all
architectures, but this was postponed to 4.38. The code needs to be
adapted for prompt tuning not to break when transformers 4.37 is
released.
@pacman100 (Contributor)

After spending 2 hours in the trenches of generation.utils.GenerationMixin, cache_utils.DynamicCache, modeling_llama.LlamaForCausalLM and peft.PeftModel, I've concluded that the fix in this PR is incorrect; the cause is the changed logic in prepare_inputs_for_generation of models like LlamaForCausalLM, introduced to make DynamicCache work.

@pacman100 (Contributor) commented Jan 12, 2024

The highlighted part below is the cause of the issue: the new logic removed it without maintaining the backward-compatible behavior of using only the last input id for the next generation step when past_key_value_length >= input_ids_length. I think assumption 3 in the new changes should be reconsidered.
[Screenshot (2024-01-12) of the relevant prepare_inputs_for_generation logic]
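As a rough paraphrase of the behavior change being discussed (illustrative only, not the actual transformers code):

def slice_input_ids_legacy(input_ids, past_length):
    # pre-4.36: once a cache exists, only the last token is fed to the model
    return input_ids[:, -1:] if past_length > 0 else input_ids

def slice_input_ids_new(input_ids, past_length):
    # 4.36-style: only tokens beyond the cached length are fed forward; when
    # past_key_value_length >= input_ids_length (the prompt-tuning case here),
    # nothing is stripped and the full input_ids are passed through again
    if 0 < past_length < input_ids.shape[1]:
        return input_ids[:, past_length:]
    return input_ids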

@gante (Member) commented Jan 12, 2024

After spending 2 hours in the trenches of generation.utils.GenerationMixin, cache_utils.DynamicCache, modeling_llama.LlamaForCausalLM and peft.PeftModel

@pacman100 I feel you 😢

I'm sorry for breaking the old default behavior, which was used here -- it was the only solution I could find to ensure all new generation methods worked correctly. The transformers codebase didn't have a case like this one (past length > inputs length AND we only want to use the latest token in the fwd pass), so I didn't consider it at all.

BenjaminBossan added a commit that referenced this pull request Jan 12, 2024
See #1252 and #1352 for more context.

The initial idea was for transformers 4.37 to add the new caching to all
architectures, but this was postponed to 4.38. The code needs to be
adapted for prompt tuning not to break when transformers 4.37 is
released.