
Support with vLLM #17

Open · Aniketto16 opened this issue Feb 1, 2024 · 4 comments

@Aniketto16

Hello!
Thank you for your great work; it's amazing how much hard work you put into this algorithm. I just had one question: is it possible to integrate this with vLLM serving?

This would really boost inference time in limited-resource settings once you cross the 8192-token mark. Is there a way to do this? Thank you in advance for your help!
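For context, the kind of vLLM serving loop I mean looks roughly like the sketch below (the model name, sampling settings, and prompt are just placeholders; nothing here enables SelfExtend, it only shows the user-facing API that an integration would have to sit beneath, inside vLLM's own attention implementation):

```python
from vllm import LLM, SamplingParams

# Placeholder model; any architecture vLLM supports would look the same here.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")

sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

# Today this is limited by the model's native context window. For prompts past
# the 8192-token mark to work without fine-tuning, SelfExtend would have to be
# applied inside vLLM's attention kernels rather than in user code like this.
outputs = llm.generate(["<your long prompt here>"], sampling_params)
print(outputs[0].outputs[0].text)
```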

@Mooler0410
Collaborator

We are not very familiar with vLLM and its internals. We will check its compatibility with SelfExtend. Thanks for your suggestion!

@K-Mistele

+1, would love to see this in vLLM!

@linchen111

+1, would love to see this in vLLM too!

@WeixuanXiong

+1, would love to see this in vLLM, since lots of online services are built on vLLM! It would be ideal if we could easily use the SelfExtend trick in our online services!


5 participants