Generate: assisted_decoding now accepts arbitrary candidate generators (#27750)

Conversation
self.num_assistant_tokens = max(1.0, self.num_assistant_tokens - 1.0)

def _crop_past_key_values(model, past_key_values, maximum_length):
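For context, the `num_assistant_tokens` line above is part of a heuristic that adapts how many tokens the assistant proposes per step. A simplified, self-contained sketch of such a scheme (the exact increments are assumptions for illustration, not necessarily what the PR ships):

```python
def adjust_num_assistant_tokens(num_assistant_tokens, num_matches):
    # If the main model accepted every proposed candidate, propose more
    # tokens next time; otherwise back off, never dropping below one.
    # (The +2.0 / -1.0 increments here are illustrative.)
    if num_matches == int(num_assistant_tokens):
        return num_assistant_tokens + 2.0
    return max(1.0, num_assistant_tokens - 1.0)
```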
Note: these functions were moved here to avoid circular imports
might be good to use the new cache format no?
soon! (we still need to maintain backward compatibility)
@@ -888,6 +895,29 @@ def _reorder_cache(self, past_key_values, beam_idx):
            f" enable beam search for {self.__class__}"
        )

    def _get_candidate_generator(
This function will be expanded as we add more CandidateGenerator classes :)
so will the logic here be something like:
- check params in generation_config (some if/else condition)
- based on params, set candidate_generator
@apoorvumang exactly!
got it, so I'll write a PromptLookupCandidateGenerator that implements CandidateGenerator, and then wire it up in this function
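The core idea behind such a prompt-lookup generator (a sketch in the spirit of #27722, not the PR's implementation) is to match the last few generated tokens against earlier positions in the sequence and propose the tokens that followed the match:

```python
def prompt_lookup_candidates(input_ids, ngram_size=2, num_candidates=5):
    # Model-free candidate generation sketch: find the most recent earlier
    # occurrence of the sequence's final `ngram_size` tokens, and propose
    # up to `num_candidates` of the tokens that followed that occurrence.
    if len(input_ids) <= ngram_size:
        return []
    tail = input_ids[-ngram_size:]
    # Scan backwards so the most recent match wins; exclude the tail itself.
    for start in range(len(input_ids) - ngram_size - 1, -1, -1):
        if input_ids[start:start + ngram_size] == tail:
            continuation = input_ids[start + ngram_size:start + ngram_size + num_candidates]
            if continuation:
                return continuation
    return []
```

If no earlier match exists, the function proposes nothing, and assisted decoding would fall back to ordinary one-token-at-a-time generation for that step.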
So the plan is to have similar checks to the ones we have for the supported logits processor I guess?
@ArthurZucker precisely, we will have flags to control which candidate generation strategies we have in place. I suspect that, because some candidate generation strategies are so cheap (like the one proposed in #27722), assisted generation may become mainstream!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
One more place needs to change I think -
how do you suggest this should change to support prompt lookup decoding? @gante
@apoorvumang I'd add an
🔥 would maybe rename the file to candidate_generators like we have logits_processor, but otherwise great!
Force-pushed from 77a8b67 to a57367b
…ors (huggingface#27750) Co-authored-by: Arthur <[email protected]>
What does this PR do?
A common trend is starting to pop up: people are experimenting with new strategies to generate candidate sequences, to then run an assisted-generation-like strategy. A key example is the new technique in #27722, which is equal to assisted_decoding except for the candidate generation part. This technique in particular achieves nice speedups in some settings, and doesn't need an assistant model -- a model-free speedup!

To facilitate experimentation and the addition of new candidate generation techniques, this PR abstracts the candidate generation part of assisted_decoding into a new class with a stable API. This was inspired by classes like LogitsProcessor or StoppingCriteria -- components of generate that can easily be replaced. All these changes are backwards compatible! 🤗

Suggested review order:
1. utils.py, to see the shape of assisted_decoding under the abstracted API
2. candidate.py, to see the structure of the new base class (and the specific case of the original assisted generation)

The following tests are passing:
- RUN_SLOW=1 py.test tests/models/whisper/ -k speculative
- py.test tests/ -k test_assisted (which catches mixin and integration tests associated with assisted generation)

Happy to add more tests if needed :)