[Feature Request] Adding Eagle, Medusa, Look Ahead decoding (improvements of speculative decoding) #2791
Comments
Yes. The plan is here: #2188
Thanks for sharing @simon-mo, that sounds great! I also wonder whether newer methods can improve on speculative decoding by removing the need for a draft model; we are exploring that path as well.
The speculative decoding framework is designed to support a wide range of algorithms, both with and without a draft model. Once the immediate features are in place (by @cadedaniel), we welcome the community's contributions of more methods!
Correct! And yes, speculation methods without a draft model have benefits in both performance and usability. It is unclear right now which specific approach will end up being the best, but vLLM should support it.
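For readers skimming this thread, the core draft-and-verify loop that all of these methods build on can be sketched with toy stand-ins for the models. This is not vLLM's actual implementation; `draft_next` and `target_next` below are made-up deterministic functions used only to show the accept/reject mechanics in the greedy case:

```python
def target_next(prefix):
    # Toy "target model": next token is the sum of the prefix mod 10.
    return sum(prefix) % 10

def draft_next(prefix):
    # Toy "draft model": a cheap approximation that is sometimes wrong.
    return prefix[-1] % 10

def speculative_step(prefix, k=4):
    """Propose k draft tokens, then verify them against the target.

    Returns the tokens accepted this step: the longest draft prefix the
    target agrees with, plus one token from the target itself.
    """
    # 1. Draft phase: autoregressively propose k tokens with the cheap model.
    draft = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2. Verify phase: the target scores all k positions (in a real system
    #    this is one batched forward pass) and we keep the agreeing prefix.
    accepted = []
    ctx = list(prefix)
    for t in draft:
        expected = target_next(ctx)
        if t == expected:
            accepted.append(t)
            ctx.append(t)
        else:
            # First mismatch: substitute the target's own token and stop.
            accepted.append(expected)
            return accepted
    # All k drafts accepted: the target's verification pass yields one
    # extra token for free.
    accepted.append(target_next(ctx))
    return accepted
```

The key property is that the output is identical to plain greedy decoding with the target model alone; the draft only changes how many target forward passes are needed per accepted token.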
I would like to suggest Hydra for your project alongside Medusa. Please find the Hydra repository here: https://github.com/zankner/Hydra. Thank you for your consideration.
For those interested in ranking data for the different methods, below is a copy-paste from a neat project by @hemingkx called Spec-Bench. The ranking when running 33B models is similar. Please see the linked repo for the latest data; I'm just pasting it here for those who are skimming this thread.
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Thanks for the great work, team. I wonder if there is any plan to add new improvements to speculative decoding, such as Eagle, Medusa, or lookahead decoding. These could yield cumulative speedups for vLLM.
cc: @WoosukKwon
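To illustrate why draft-model-free methods like Medusa are attractive: instead of running a separate draft model autoregressively, extra prediction heads on the base model guess several future tokens from a single forward pass, which the base model then verifies. A toy sketch follows, with made-up deterministic functions standing in for the base model and its heads (this is not Medusa's real architecture, just the shape of the idea):

```python
def base_next(prefix):
    # Toy "base model": deterministic next-token rule.
    return (prefix[-1] * 3 + len(prefix)) % 7

def head_guesses(prefix, k=3):
    # Toy "Medusa-style heads": k heads guess tokens at offsets 1..k from
    # the SAME state (one forward pass), so later guesses cannot condition
    # on earlier ones and may be wrong.
    return [(prefix[-1] * 3 + len(prefix) + i) % 7 for i in range(k)]

def medusa_step(prefix, k=3):
    """Verify the heads' guesses with the base model, greedy case.

    Returns the longest agreeing prefix of guesses plus one base-model token.
    """
    accepted = []
    ctx = list(prefix)
    for g in head_guesses(prefix, k):
        expected = base_next(ctx)
        if g == expected:
            accepted.append(g)
            ctx.append(g)
        else:
            # First mismatch: fall back to the base model's token and stop.
            accepted.append(expected)
            return accepted
    accepted.append(base_next(ctx))
    return accepted
```

The usability win over a draft model is that there is no second model to load or keep in sync; the cost is that the heads' guesses degrade with offset, so acceptance rates typically fall off for later positions.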