Hello!
Thank you for your great work; it's amazing how much effort you put into this algorithm. I just had one question: is it possible to integrate this with vLLM serving?
This would really boost inference time in limited-resource settings once you cross the 8192-token mark. Is there a way? Thank you in advance for your help!
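For context on what an integration would have to hook into: the core of the self-extend trick is remapping relative positions at attention time, so tokens within a neighbor window keep their normal positions while more distant tokens are merged into groups by floor division, keeping every relative distance inside the range seen during pretraining. A minimal sketch of that remapping, with hypothetical `window` and `group` defaults (not vLLM API code):

```python
def self_extend_rel_pos(q_idx: int, k_idx: int, window: int = 512, group: int = 4) -> int:
    """Illustrative Self-Extend position remapping (not the official implementation).

    Inside the neighbor window, the ordinary relative position is kept.
    Beyond it, query and key indices are floor-divided by `group`, then
    shifted so the grouped region stays continuous with the neighbor
    region at the window boundary.
    """
    d = q_idx - k_idx
    if d <= window:
        # Normal attention for nearby tokens.
        return d
    # Grouped attention for distant tokens: floor-divide, then shift.
    shift = window - window // group
    return q_idx // group - k_idx // group + shift


# A distance of 100 is within the window, so it is unchanged;
# a distance of 10000 is compressed into the grouped range.
print(self_extend_rel_pos(1000, 900))
print(self_extend_rel_pos(10000, 0))
```

A serving engine like vLLM would need to apply this remapping inside its (paged) attention kernels, which is why it is not a drop-in change from the model-definition side.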
+1, would love to see this in vLLM, since lots of online services are built on vLLM! It would be ideal if we could easily use the self-extend trick in our online service!