Adds streaming option to generate #1424
Conversation
Is this what you were looking for @williamFalcon?
Nice! Can you test it with this Studio?
It should work now @williamFalcon. I modeled the adjusted
Many thanks for the feedback @aniketmaurya! Does the current code look better?
Why not import and use the chat.py generation implementation, which already implements streaming, instead of modifying the generate.py file?
LGTM! Added a few comments.
Also, please pull the latest LitServe; last week we merged some PRs that affect streaming.
@carmocca That's a good question, I am not sure why I was so fixated on

Thanks for the feedback, @aniketmaurya!
I could try that internally for testing, but we'd probably have to wait until the next release. Could you ping me when the latest LitServe comes out so that I can bump the version in LitGPT?
Thanks for all your help with LitServe so far @aniketmaurya. It's probably something stupid on my part, but with the recent LitServe version from GitHub main it's somehow still not working right, i.e. with
The
whereas
I think it's something in my LitServe implementation because the
If you have some time to take another look, that'd be really appreciated.
Hi @rasbt, we will have to iterate through the response in streaming mode, like:

```python
import requests

url = "http://127.0.0.1:8000/predict"
resp = requests.post(
    url,
    json={"prompt": "Fix typos in the following sentence: Exampel input"},
    headers=None,
    stream=True,
)
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))
```

When we don't iterate and instead use `response.content`, requests will concatenate everything into a single bytestring.
That makes sense! Thanks for your patience here @aniketmaurya, I know almost nothing about web stuff and serving. The code you shared does make sense, but yeah, it's still only generating a single token. Something must still be incorrect somewhere. However, I also wonder if there's maybe a LitServe issue. I tried a simpler example:

```python
import litserve as ls


class SimpleStreamAPI(ls.LitAPI):
    def setup(self, device) -> None:
        self.model = lambda x: str(x)

    def decode_request(self, request):
        return request["input"]

    def predict(self, x):
        for i in range(10):
            yield self.model(i)

    def encode_response(self, output_stream):
        for output in output_stream:
            yield {"output": output}


if __name__ == "__main__":
    server = ls.LitServer(SimpleStreamAPI(), stream=True)
    server.run(port=8000)
```

and then queried it via

```python
import requests

url = "http://127.0.0.1:8000/predict"
resp = requests.post(url, json={"input": "1, 2, 3"}, headers=None, stream=True)
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))
```

but it just returns

EDIT: Never mind, I pulled the latest LitServe version and it's fixed!
Ok great, this works flawlessly with the new LitServe version now:

```python
import requests

url = "http://127.0.0.1:8000/predict"
resp = requests.post(url, json={"prompt": "Hello world"}, headers=None, stream=True)
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))
```
I now also got it to work with the chat version via #1426. I guess we can close this one then and leave the base generate function untouched. Or alternatively, the question is whether we should replace the
I didn't merge them before because the logic to check for the stop tokens is not as performant as simply generating a fixed number of tokens. They could probably still be reconciled with some if-elses, but code simplicity over deduplication was desirable before.
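For readers unfamiliar with the tradeoff described above, here is a hedged sketch (with a toy stand-in for the model, not the actual LitGPT code) of the two loop shapes: fixed-length generation needs no per-step check, while stop-token streaming pays for a comparison on every iteration and has an unpredictable loop length.

```python
def generate_fixed(model, n_tokens):
    # Fixed-length generation: the loop bound is known up front,
    # so there is no per-step stop check.
    tokens = []
    for _ in range(n_tokens):
        tokens.append(model(tokens))
    return tokens


def generate_until_stop(model, stop_token, max_tokens):
    # Streaming with a stop token: every step pays for an extra
    # comparison, and the total length is unknown in advance.
    tokens = []
    for _ in range(max_tokens):
        t = model(tokens)
        if t == stop_token:
            break
        tokens.append(t)
        yield t


# Toy "model" that just returns the current sequence length.
toy_model = lambda toks: len(toks)
print(generate_fixed(toy_model, 3))                 # [0, 1, 2]
print(list(generate_until_stop(toy_model, 2, 10)))  # [0, 1]
```

The streaming variant is what a serving endpoint wants (tokens can be yielded as they are produced), which is roughly why the chat path and the base generate path ended up as separate implementations.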
Makes sense, and no worries. I'd say let's keep it as is. I got the LitServe streaming example to work after all, so mission accomplished 😊
Fixes #1420