support copies #32159

Closed · wants to merge 6 commits

Conversation

@ArthurZucker (Collaborator) commented Jul 23, 2024

What does this PR do?

We can't copy the cache 😢; making it inherit from nn.Module fixes this easily. Being unable to copy the cache prevents us from re-using a prompt / system prompt like this:

import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

device = "cuda"
ckpt = "path-to-ckpt"
INITIAL_PROMPT = "From now on, you are going to answer all my questions with historical details. Make sure to always add a bit of french here and there, for style."

model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.float16)
model.to(device)

tokenizer = AutoTokenizer.from_pretrained(ckpt)

# Pre-fill the cache once with the system prompt.
prompt_cache = DynamicCache()
inputs = tokenizer(INITIAL_PROMPT, return_tensors="pt").to(device)
prompt_cache = model(**inputs, past_key_values=prompt_cache).past_key_values

# Re-use a deep copy of the pre-filled cache for a first question.
prompt = "Why are french people obsessed with french?"
new_inputs = tokenizer(INITIAL_PROMPT + prompt, return_tensors="pt").to(device)
past_key_values = copy.deepcopy(prompt_cache)
outputs = model.generate(**new_inputs, past_key_values=past_key_values, max_new_tokens=20)
response = tokenizer.batch_decode(outputs)[0]
print(response)

# Re-use another deep copy of the same cache for a second, unrelated question.
prompt = "What is the best city to swim in?"
new_inputs = tokenizer(INITIAL_PROMPT + prompt, return_tensors="pt").to(device)
outputs = model.generate(**new_inputs, past_key_values=copy.deepcopy(prompt_cache), max_new_tokens=20)
response = tokenizer.batch_decode(outputs)[0]
print(response)

@amyeroberts (Collaborator) commented:

"We can't copy the cache 😢"

What kind of copying are we talking about here? Like cache.copy?

@gante (Member) commented Jul 23, 2024

@amyeroberts copy.deepcopy

On main, without the fix, we get

RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment.  If you were attempting to deepcopy a module, this may be because of a torch.nn.utils.weight_norm usage, see https://github.com/pytorch/pytorch/pull/103001

Cache copying is needed to reuse the cache from the prompt. E.g. to run new prompts on top of the system prompt without spending compute on the system prompt.
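As a side note on where that error comes from (not part of this PR, just a minimal illustration): PyTorch's deepcopy protocol only covers leaf tensors, and the key/value tensors written into the cache by a forward pass run with autograd enabled are non-leaf, which is exactly what the message above complains about.

import copy
import torch

# A leaf tensor (created directly by the user) can be deep-copied.
leaf = torch.randn(2, 2, requires_grad=True)
copy.deepcopy(leaf)  # fine

# A non-leaf tensor (the result of an autograd-tracked op, analogous to the
# KV tensors a forward pass writes into the cache) cannot, and raises the
# RuntimeError quoted above.
non_leaf = leaf * 2
try:
    copy.deepcopy(non_leaf)
except RuntimeError as err:
    print(err)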


@vladfaust commented:

I'm sorry if this isn't the right place to ask, but: in Llama.cpp it's trivial to save and load state to/from disk to maintain the cache between sessions. Is that currently possible with Transformers, and if so, could you please provide a minimal example or point to the docs?

Cheers,

@gante (Member) commented Aug 6, 2024

@vladfaust yes it is possible, but it requires custom code (i.e. you would need to store and restore the cache's tensors).

We will add a user-friendly API for that in the future :)
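To sketch what that custom code could look like (a rough, unofficial example; the save_cache / load_cache helpers below are just for illustration, and it assumes you are using DynamicCache, whose to_legacy_cache() / from_legacy_cache() helpers convert to and from plain tuples of key/value tensors):

import torch
from transformers import DynamicCache

def save_cache(cache: DynamicCache, path: str) -> None:
    # Convert the cache to plain (key, value) tensor tuples and serialize them.
    torch.save(cache.to_legacy_cache(), path)

def load_cache(path: str, device: str = "cuda") -> DynamicCache:
    # Rebuild a DynamicCache from the stored tensors.
    legacy = torch.load(path, map_location=device)
    return DynamicCache.from_legacy_cache(legacy)

A cache restored this way can then be passed as past_key_values, just like prompt_cache in the snippet from the PR description.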

@nirbenda left a comment:

LGTM

@ArthurZucker (Collaborator, Author) commented:

PS: this was actually already merged in #32168, so I'll close this one!
