Generate: fix end to end compilation #32465
Conversation
self.register_buffer(f"key_cache_{idx}", torch.zeros(cache_shape, dtype=dtype, device=device)) | ||
self.register_buffer(f"value_cache_{idx}", torch.zeros(cache_shape, dtype=dtype, device=device)) |
self.register_buffer can't be compiled, but is needed for torch.export.
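For context, a minimal sketch of the workaround pattern this implies; the helper name and the exact guard are assumptions, not necessarily the PR's final code. The idea: allocate the cache tensors unconditionally, but only call register_buffer when dynamo is not tracing, so torch.export still sees registered buffers while torch.compile never hits the unsupported call.

```python
import torch
from torch import nn


def allocate_layer_cache(module: nn.Module, idx: int, cache_shape, dtype, device):
    # Hypothetical helper sketching the guard described above.
    key = torch.zeros(cache_shape, dtype=dtype, device=device)
    value = torch.zeros(cache_shape, dtype=dtype, device=device)
    # torch.export wants mutated state registered as buffers, but
    # register_buffer itself is not traceable by torch.compile, so skip it
    # while tracing. `torch.compiler.is_compiling()` exists in recent
    # PyTorch; older versions expose `torch._dynamo.is_compiling()`.
    if not torch.compiler.is_compiling():
        module.register_buffer(f"key_cache_{idx}", key)
        module.register_buffer(f"value_cache_{idx}", value)
    return key, value
```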
@@ -1477,7 +1479,7 @@ def _get_cache(self, cache_implementation: str, max_batch_size: int, max_cache_l
     "config": self.config,
     "max_batch_size": max_batch_size,
     "max_cache_len": max_cache_len,
-    "device": self.device,
+    "device": device,
I dropped this line in the end-to-end PR, in a rebase 😢. self.device can't be called at compilation time.
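A hedged illustration of the failure mode; the toy module below is an assumption of mine, not transformers code. Model-level .device properties are typically computed by peeking at a live parameter, which dynamo can't resolve at trace time under fullgraph=True, so the device is resolved eagerly and passed around as a plain value instead.

```python
import torch
from torch import nn


class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    @property
    def device(self) -> torch.device:
        # Illustrative: inspects a live parameter at call time, the kind of
        # dynamic lookup that can break fullgraph compilation.
        return next(self.parameters()).device

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)


model = Toy()

# Safe pattern: resolve the device once, eagerly, outside the compiled
# region, and pass the resulting value into any compiled code.
device = model.device

compiled = torch.compile(model, fullgraph=True)
print(compiled(torch.randn(2, 4, device=device)).shape)
```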
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Let's get this merged asap! I don't see test updates, so I suppose everything was caught by our CI, but by the slow CI rather than the push CI, right?
Correct, was caught by the slow CI :)
What does this PR do?
As @zucchini-nlp noticed,
RUN_SLOW=1 py.test tests/models/llama/test_modeling_llama.py::LlamaModelTest::test_generate_compile_fullgraph
was failing on main.
This PR fixes it.
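For readers who want to reproduce this outside the test suite, a rough sketch of what that test exercises; the checkpoint name and the exact generate arguments here are assumptions, not copied from the test.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any small causal LM works for the repro; this tiny checkpoint is an assumption.
model_id = "hf-internal-testing/tiny-random-LlamaForCausalLM"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# End-to-end fullgraph compilation of generate relies on the static cache.
model.generation_config.cache_implementation = "static"
compiled_generate = torch.compile(model.generate, fullgraph=True)

inputs = tokenizer("Hello", return_tensors="pt")
output = compiled_generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output[0]))
```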
Ran locally: