Paper page - Accelerating LLM Inference with Staged Speculative Decoding #495

Open
1 task
irthomasthomas opened this issue Feb 1, 2024 · 0 comments
Labels
Algorithms Sorting, Learning or Classifying. All algorithms go here. llm Large Language Models llm-experiments experiments with large language models llm-serving-optimisations Tips, tricks and tools to speedup inference of large language models MachineLearning ML Models, Training and Inference Papers Research papers Research personal research notes for a topic TIL Short notes or tips on coding, linux, llms, ml, etc

Comments

@irthomasthomas
Owner

Paper Page - Accelerating LLM Inference with Staged Speculative Decoding

Published on Aug 9, 2023 | Featured in Daily Papers on Aug 10, 2023

Authors: Benjamin Spector, Chris Re


Abstract

Recent advances with large language models (LLMs) have highlighted their diverse capabilities. This paper proposes a novel algorithm, staged speculative decoding, to accelerate LLM inference in small-batch, on-device scenarios. It addresses the low arithmetic intensity of small-batch inference by improving upon previous work in speculative decoding. First, the algorithm restructures the speculative batch as a tree, which reduces generation costs and increases the expected tokens per batch. Second, it adds a second stage of speculative decoding. Taken together, these changes reduce single-batch decoding latency by 3.16x with a 762M parameter GPT-2-L model while perfectly preserving output quality.
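To make the baseline idea concrete, here is a minimal sketch of plain (single-stage, non-tree) speculative decoding with greedy verification. It is not the paper's implementation: `target_model` and `draft_model` are hypothetical toy functions standing in for a large and a small LLM, and the tree-structured batch and second draft stage described in the abstract are deliberately left out. The sketch only illustrates why the output matches ordinary greedy decoding of the target model while needing fewer target passes.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    """Hypothetical stand-in for the large, slow model."""
    return (3 * prefix[-1] + len(prefix)) % 11


def draft_model(prefix: List[int]) -> int:
    """Hypothetical stand-in for the small, fast model.

    It agrees with the target except when the prefix length is a multiple of 4.
    """
    guess = target_model(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 11


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Ordinary autoregressive decoding: one model call per new token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Draft k tokens, then let the target verify them.

    Returns the decoded sequence and the number of verification rounds.
    In a real system each round is a single batched target forward pass;
    here it is simulated with a loop over positions.
    """
    seq = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(seq) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))

        # 2. The target checks the proposal position by position.
        target_passes += 1
        accepted: List[int] = []
        for i in range(k + 1):
            expected = target(seq + accepted)
            accepted.append(expected)  # always the target's own token
            # Stop at the first disagreement, or after the bonus token.
            if i == k or proposal[i] != expected:
                break
        seq.extend(accepted)
    return seq[:goal], target_passes
```

Because every appended token is the target's own greedy choice, the result is identical to `greedy_decode(target_model, ...)`. The saving comes from each verification round emitting several tokens whenever the draft guesses correctly.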

Read the Paper »


New Paper Card

Suggested labels

{ "label-name": "Algorithm", "description": "Staged speculative decoding algorithm for LLM inference acceleration", "confidence": 91.15 }

@irthomasthomas added the llm, MachineLearning, New-Label, Papers, Research, TIL, llm-experiments, Algorithms, and llm-serving-optimisations labels and removed the New-Label label on Feb 1, 2024.