Streaming for serving with chat's generate function #1426
Conversation
This now also works:

```python
import requests
import litserve

print("LitServe:", litserve.__version__)

url = "http://127.0.0.1:8000/predict"
resp = requests.post(url, json={"prompt": "Hello world"}, stream=True)

for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))
```

The only remaining issue is that the stop token terminates everything. Otherwise, it works fine.
@aniketmaurya Are there any best practices or examples for piecing the streamed outputs together into a single string? It looks obvious at first glance, but it's harder than I thought: I've tried many approaches, each taking quite a few lines of code, and I sometimes get invalid-JSON errors.
Thanks @aniketmaurya, I can now do:

```python
import requests
import litserve
import json

print("LitServe:", litserve.__version__)

url = "http://127.0.0.1:8000/predict"
resp = requests.post(url, json={"prompt": "Hello world"}, stream=True)

for line in resp.iter_lines():
    if line:
        print(json.loads(line)["output"], end="")
```

and it works perfectly!
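For reference, the same per-line decoding can also accumulate the chunks into a single string instead of printing them. A minimal sketch, assuming the server emits one JSON object per line with an `"output"` key (the simulated byte lines below stand in for `resp.iter_lines()`):

```python
import json

# Simulated streamed lines; a real run would iterate resp.iter_lines()
# from a requests.post(..., stream=True) response instead.
lines = [b'{"output": "Hello"}', b'{"output": " world"}', b'{"output": "!"}']

chunks = []
for line in lines:
    if line:  # skip keep-alive blank lines
        chunks.append(json.loads(line)["output"])

full_text = "".join(chunks)
print(full_text)  # Hello world!
```

Decoding each line separately with `json.loads` avoids the invalid-JSON errors you get when several streamed chunks are concatenated before parsing.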
The tests are failing because there hasn't been a LitServe 0.1.1 release yet. It's fine (no rush); I just think we should wait until there's a new release.
@lantiga do we want to make a release anytime soon? LitServe has had a lot of bug fixes, in addition to the OpenAI spec, since the last release.
hi @rasbt, can we add a … EDIT: Actually, I was able to import it even without that. I got an import error when I did an editable install of LitGPT.
Thanks for the suggestion, it absolutely makes sense to add it, @aniketmaurya . Just did. |
This is an alternative to #1424 using the `generate` function from chat.