
[Feature Request] Adding EAGLE, Medusa, Lookahead Decoding (improvements of speculative decoding) #2791

Open
HamidShojanazeri opened this issue Feb 6, 2024 · 7 comments


HamidShojanazeri commented Feb 6, 2024

Thanks for the great work, team. I wonder if there is any plan to add newer improvements to speculative decoding such as EAGLE, Medusa, and lookahead decoding. These could yield cumulative speedups for vLLM.

cc: @WoosukKwon

Collaborator

simon-mo commented Feb 6, 2024

Yes. The plan is here: #2188.

@HamidShojanazeri
Author

Thanks for sharing, @simon-mo, that sounds great! I also wonder whether newer methods can improve on speculative decoding by removing the need for a draft model; we are exploring that path as well.

Collaborator

simon-mo commented Feb 8, 2024

The speculative decoding framework is designed to support a wide range of draft-model and draft-model-free algorithms. Once the immediate features are in place (by @cadedaniel), we welcome the community's contributions of more methods!
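For readers new to the technique, here is a toy sketch of the generic draft-then-verify loop that all of these methods share, under greedy decoding. This is not vLLM's actual API: `target_next` and `draft_next` are hypothetical stand-ins for model forward passes, and real implementations verify all draft positions in a single batched pass rather than one at a time.

```python
# Toy sketch of greedy speculative decoding (not vLLM's API).
# A cheap draft model proposes k tokens; the target model verifies them,
# keeping the longest prefix that matches its own greedy choices.

def greedy_speculative_step(target_next, draft_next, ctx, k=4):
    """target_next/draft_next: functions mapping a token list to the next token."""
    # 1. Draft phase: the cheap model proposes k tokens autoregressively.
    proposal = []
    draft_ctx = list(ctx)
    for _ in range(k):
        tok = draft_next(draft_ctx)
        proposal.append(tok)
        draft_ctx.append(tok)

    # 2. Verify phase: compare each draft token with the target model's
    #    greedy choice (simulated sequentially here for clarity; real
    #    systems score every position in one batched forward pass).
    accepted = []
    verify_ctx = list(ctx)
    for tok in proposal:
        expected = target_next(verify_ctx)
        if tok != expected:
            # First mismatch: emit the target's token instead and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        verify_ctx.append(tok)

    # All k drafts accepted; emit one bonus token from the target.
    accepted.append(target_next(verify_ctx))
    return accepted
```

The key property is that output is identical to running the target model alone; the draft only changes how many target-quality tokens each step yields (up to k + 1 when the draft agrees, 1 when it misses immediately), which is what the "#Mean Accepted Tokens" numbers in benchmarks like Spec-Bench measure.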

@cadedaniel
Collaborator

Correct! And yes, speculation methods without a draft model have benefits in both performance and usability. It is unclear right now which specific approach will end up being the best, but vLLM should support it.
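One of the simplest draft-model-free methods is prompt lookup decoding (PLD): instead of running a second model, the draft tokens are copied from wherever the most recent n-gram last occurred earlier in the context, which works well for tasks that reuse input text (summarization, RAG, code editing). A minimal illustrative sketch, with function name and defaults chosen here for illustration:

```python
# Illustrative sketch of prompt-lookup drafting (draft-model-free):
# propose the tokens that followed the most recent n-gram the last
# time it appeared earlier in the context. Verification then proceeds
# exactly as in ordinary speculative decoding.

def prompt_lookup_draft(tokens, ngram_size=2, num_draft=4):
    """Return up to num_draft proposed tokens, or [] if the trailing
    n-gram does not reappear earlier in the context."""
    if len(tokens) < ngram_size:
        return []
    ngram = tokens[-ngram_size:]
    # Scan earlier context, most recent match first (excluding the
    # trailing n-gram itself).
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == ngram:
            cont = tokens[start + ngram_size : start + ngram_size + num_draft]
            if cont:
                return cont
    return []
```

When no match is found the step degrades gracefully to ordinary decoding (an empty draft), so the method never hurts correctness, only the speedup.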

@caliber1313

I would like to suggest adding Hydra to your project alongside Medusa. You can find the Hydra repository here: https://github.com/zankner/Hydra.

Thank you for your consideration.

@josephrocca

josephrocca commented Jun 5, 2024

For those interested in ranking data for the different methods, below is a copy-paste from Spec-Bench, a neat project by @hemingkx. The ranking when running 33B models is similar. Please see the linked repo for the latest data; I'm pasting it here for those skimming this thread.

  • Device: a single NVIDIA GeForce RTX 3090 GPU (24GB) with 12 CPU cores
  • Testing environment: PyTorch 2.0.1, under CUDA 11.8
  • Experimental Settings: Vicuna-7B-v1.3, greedy decoding, FP16 precision, batch size = 1
| Models | Multi-turn Conversation | Translation | Summarization | Question Answering | Mathematical Reasoning | Retrieval-aug. Generation | #Mean Accepted Tokens | Overall |
|---|---|---|---|---|---|---|---|---|
| EAGLE 🏅 | 2.44x | 1.81x | 2.13x | 2.11x | 2.54x | 1.82x | 3.57 | 2.16x |
| SpS 🥈 | 1.98x | 1.37x | 2.00x | 1.95x | 1.89x | 1.76x | 2.29 | 1.83x |
| Hydra 🥉 | 2.04x | 1.67x | 1.56x | 1.81x | 2.16x | 1.48x | 3.26 | 1.80x |
| PLD | 1.57x | 1.07x | 2.31x | 1.25x | 1.62x | 1.56x | 1.74 | 1.55x |
| Medusa | 1.60x | 1.38x | 1.28x | 1.46x | 1.64x | 1.22x | 2.32 | 1.44x |
| REST | 1.49x | 1.18x | 1.21x | 1.46x | 1.35x | 1.27x | 1.63 | 1.32x |
| Lookahead | 1.13x | 0.97x | 1.05x | 1.07x | 1.29x | 0.98x | 1.65 | 1.08x |


This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions bot added the stale label Oct 30, 2024