vllm-project / vllm Public

Pull requests: vllm-project/vllm

400 Open 4,176 Closed

Pull requests list

[6/N] pass whole config to inner model

#10205 opened Nov 11, 2024 by youkaichao

[v1][torch.compile] manage cudagraph buffer in compiler

#10203 opened Nov 10, 2024 by youkaichao

[Bugfix] bitsandbytes models fail to run pipeline parallel

#10200 opened Nov 10, 2024 by HoangCongDuc

[Bugfix][SpecDecode] apply sampling parameters to target probabilities for consistency in rejection sampling.

#10198 opened Nov 10, 2024 by jeongin601

[Hardware][CPU] Add embedding models support for CPU backend (labels: ci/build, x86 CPU)

#10193 opened Nov 10, 2024 by Isotr0py • Draft

[Core] Add RunAI Model Streamer as optional loader. (labels: ci/build, documentation)

#10192 opened Nov 10, 2024 by omer-dayan

[Not to be submitted] Adding logs for in sd+chunked-prefill

#10186 opened Nov 9, 2024 by sroy745 • Draft

[Model] Add support for Qwen2 for embeddings (labels: documentation, ready)

#10184 opened Nov 9, 2024 by DarkLight1337

Add docs on serving with Llama Stack (labels: documentation)

#10183 opened Nov 9, 2024 by terrytangyuan

[Bug]: When apply continue_final_message for OpenAI server, the "echo":false is ignored (labels: frontend)

#10180 opened Nov 9, 2024 by chaunceyjiang

[Frontend] Add per-request number of cached token stats (labels: frontend)

#10174 opened Nov 9, 2024 by zifeitong

[Docs] Misc updates to TPU installation instructions (labels: documentation)

#10165 opened Nov 8, 2024 by mikegre-google

[Bugfix][Frontend] Update Llama 3.2 Chat Template to support Vision and Non-Tool use

#10164 opened Nov 8, 2024 by tjohnson31415

[Doc] Move PR template content to docs (labels: ci/build, documentation)

#10159 opened Nov 8, 2024 by russellb

[Core][LoRA]Add LoRA for EncoderDecoderModelRunner (labels: needs-rebase)

#10143 opened Nov 8, 2024 by jeejeelee • Draft

3 tasks

Fix missing data type in flashinfer prefill

#10141 opened Nov 8, 2024 by reyoung

[Feature] [Spec decode]: Enable MLPSpeculator/Medusa and prompt_logprobs with ChunkedPrefill (labels: needs-rebase)

#10132 opened Nov 7, 2024 by NickLucche • Draft

1 task

[Kernel]Enable HPU for Speculative Decoding

#10131 opened Nov 7, 2024 by xuechendi

[Mistral] FP8 format (labels: needs-rebase)

#10130 opened Nov 7, 2024 by patrickvonplaten • Draft

[WIP] Prefix Cache Aware Scheduling [1/n]

#10128 opened Nov 7, 2024 by rickyyx

[V1][Bugfix] Propagate V1 LLMEngine properly (labels: ready)

#10127 opened Nov 7, 2024 by comaniac

[Core] Add padding-aware scheduling for 2D prefills

#10125 opened Nov 7, 2024 by kzawora-intel

[V1] Allow piecewise cuda graphs to run with custom allreduce

#10121 opened Nov 7, 2024 by SageMoore • Draft

Fix quantization config of vl model

#10120 opened Nov 7, 2024 by jinzhen-lin

[Hardware][CPU][torch.compile] integrate torch compile (labels: needs-rebase)

#10113 opened Nov 7, 2024 by bigPYJ1151 • Draft
