feat: Update OpenAI spec to include image url in message content #113

Merged

Conversation

bhimrazy
Contributor

@bhimrazy bhimrazy commented May 23, 2024

Before submitting
  • Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes #107.

  • Add support for images to be included in chat messages, similar to gpt-4o.

@bhimrazy bhimrazy requested a review from lantiga as a code owner May 23, 2024 17:26
@lantiga
Collaborator

lantiga commented May 23, 2024

Thanks for the PR @bhimrazy!
Can you add a test and a minimal end-to-end example to the README?

@williamFalcon
Contributor

@bhimrazy sick! super excited to try this.

@lantiga do we have a guide or something to show how to add the test and example?

@lantiga
Collaborator

lantiga commented May 23, 2024

no, good point /cc @aniketmaurya

@bhimrazy for now you can take inspiration from:

@aniketmaurya
Collaborator

aniketmaurya commented May 23, 2024

thank you for the PR @bhimrazy! as Luca mentioned, you can take inspiration from the existing LitSpec test cases.

Maybe you can try sending a request with image content to the server and check that the server parses it without errors.

{
  "model": "lit",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {
          "type": "image_url",
          "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
        }
      ]
    }
  ]
}
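
A rough standalone sketch of the "parses and doesn't break" check, for illustration only. It uses just the standard library; `check_messages` is a hypothetical helper and not part of LitServe, the example.com URL is a placeholder, and the assumption that `image_url` is a plain string is taken from the payload in this comment. A real test would send the request to a running server instead.

```python
import json


def check_messages(payload: dict) -> int:
    """Return the number of image parts; raise ValueError on a malformed payload.

    Hypothetical helper for illustration, not part of LitServe.
    """
    images = 0
    for message in payload["messages"]:
        content = message["content"]
        if isinstance(content, str):
            continue  # plain text message, nothing more to check
        if not isinstance(content, list):
            raise ValueError(f"unsupported content type: {type(content).__name__}")
        for part in content:
            if not isinstance(part, dict):
                raise ValueError(f"unsupported content part: {part!r}")
            if part.get("type") == "text" and isinstance(part.get("text"), str):
                continue
            # Assumption: image_url is a plain string, as in the payload above.
            if part.get("type") == "image_url" and isinstance(part.get("image_url"), str):
                images += 1
                continue
            raise ValueError(f"unsupported content part: {part!r}")
    return images


raw = """
{
  "model": "lit",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": "https://example.com/boardwalk.jpg"}
      ]
    }
  ]
}
"""

payload = json.loads(raw)  # raises json.JSONDecodeError if the JSON itself is malformed
n_images = check_messages(payload)
print(n_images)
```

Note that `json.loads` rejects trailing commas, so a payload copied from a Python dict literal may need them removed before it is sent as raw JSON.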

@bhimrazy
Contributor Author

bhimrazy commented May 23, 2024

Thanks, @lantiga, @williamFalcon, and @aniketmaurya, for all the feedback.
I will go through the given examples and add the test cases and e2e example.

@aniketmaurya
Collaborator

awesome @bhimrazy!! don't hesitate to reach out if you need any help.

@bhimrazy
Contributor Author

Hi @aniketmaurya @lantiga, could you please help me with adding the end-to-end documentation to the README?
I have prepared a draft, included below.


LitServe's OpenAISpec can also handle images in the input. Below is an example of how to set this up.

import litserve as ls
from litserve.specs.openai import ChatMessage

class OpenAISpecLitAPI(ls.LitAPI):
    def setup(self, device):
        self.model = None

    def predict(self, x):
        yield {"role": "assistant", "content": "This is a generated output"}

    def encode_response(self, output: dict) -> ChatMessage:
        yield ChatMessage(role="assistant", content="This is a custom encoded output")


if __name__ == "__main__":
    server = ls.LitServer(OpenAISpecLitAPI(), spec=ls.OpenAISpec())
    server.run(port=8000)

In this case, predict is expected to take an input with the following shape:

  • Text Input Example:

    [{"role": "system", "content": "You are a helpful assistant."},
     {"role": "user", "content": "Hello there"},
     {"role": "assistant", "content": "Hello, how can I help?"},
     {"role": "user", "content": "What is the capital of Australia?"}]
    
  • Mixed Text and Image Input Example:

    [{"role": "system", "content": "You are a helpful assistant."},
     {"role": "user",
      "content": [
          {"type": "text", "text": "What's in this image?"},
          {"type": "image_url",
           "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"},
      ]},
     {"role": "assistant", "content": "A wooden boardwalk through a green field under a blue sky."},
     {"role": "user", "content": "How is the weather depicted in the image?"}]
    

The above server can be queried over HTTP in the same way as the OpenAI chat completions endpoint, here using requests:

import requests

response = requests.post("http://127.0.0.1:8000/v1/chat/completions", json={
    "model": "my-gpt2",
    "stream": False,  # Set to True to stream the response in chunks
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            ]
        }
    ]
})
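
When several requests share this shape, the body can be assembled by a small helper. The following is a sketch only: `build_vision_request` is a hypothetical name, the example.com URL is a placeholder, and the string-valued `image_url` follows the payloads in this thread rather than any confirmed schema.

```python
import json


def build_vision_request(model, text, image_urls, stream=False):
    """Build a chat completion request body with one text part and N image parts.

    Hypothetical helper for illustration, not part of LitServe.
    """
    parts = [{"type": "text", "text": text}]
    # Assumption: image_url is a plain string, as in the payloads in this thread.
    parts += [{"type": "image_url", "image_url": url} for url in image_urls]
    return {
        "model": model,
        "stream": stream,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": parts},
        ],
    }


body = build_vision_request(
    "my-gpt2",
    "What's in this image?",
    ["https://example.com/boardwalk.jpg"],  # placeholder URL
)
encoded = json.dumps(body)  # confirms the body is JSON-serializable
print(encoded)
```

The resulting `body` can be passed as the `json=` argument of `requests.post`, as in the draft above.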

@aniketmaurya
Collaborator

aniketmaurya commented May 24, 2024

looks good @bhimrazy! would be nice if you can show that image content could be processed in predict or decode_request step. For example:

    def predict(self, x):
        # Assumes x is the list of chat messages from the draft; look at the latest one
        content = x[-1]["content"]
        if isinstance(content, list):
            # do something with the image URL (assumes the image is the second part)
            image_url = content[1]["image_url"]
            yield {"role": "assistant", "content": "the image describes nature and bla bla..."}
        else:
            yield {"role": "assistant", "content": "This is a generated output."}
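
The example above picks the image by position (index 1), which breaks if the image part comes first or if there are several images. A minimal sketch of selecting parts by their "type" field instead, assuming predict receives the list of message dicts described in the draft and that `image_url` is a plain string as in this thread's payloads. `collect_image_urls` and the standalone `predict` function are hypothetical illustrations, not LitServe API.

```python
from typing import Iterator, List


def collect_image_urls(messages: List[dict]) -> List[str]:
    """Gather every image URL from messages whose content is a list of parts."""
    urls: List[str] = []
    for message in messages:
        content = message["content"]
        if isinstance(content, list):
            # Select parts by their "type" field rather than by position.
            urls += [part["image_url"] for part in content if part.get("type") == "image_url"]
    return urls


def predict(messages: List[dict]) -> Iterator[dict]:
    """Standalone stand-in for a LitAPI.predict method (no model is called)."""
    urls = collect_image_urls(messages)
    if urls:
        yield {"role": "assistant", "content": f"Received {len(urls)} image(s)."}
    else:
        yield {"role": "assistant", "content": "This is a generated output."}


example = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": "https://example.com/a.jpg"},  # placeholder URL
            {"type": "text", "text": "What's in this image?"},
        ],
    },
]
for chunk in predict(example):
    print(chunk["content"])
```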

@bhimrazy
Contributor Author

Sure, thanks!

@bhimrazy
Contributor Author

Hi @lantiga, the PR is ready for review.
Thank you!

@lantiga
Collaborator

lantiga commented May 24, 2024

Awesome @bhimrazy, reviewing now!

Collaborator

@lantiga lantiga left a comment

Looks great!

src/litserve/specs/openai.py (3 review threads, outdated and resolved)
README.md (3 review threads, outdated and resolved)
@lantiga
Collaborator

lantiga commented May 24, 2024

Awesome job @bhimrazy, let's see what CI thinks and then we're ready to merge!

@lantiga lantiga merged commit 0e8915c into Lightning-AI:main May 24, 2024
17 checks passed
@lantiga
Collaborator

lantiga commented May 24, 2024

Let's goo! Merged 🚀

@williamFalcon
Contributor

congrats @bhimrazy!
solid contribution

@bhimrazy bhimrazy deleted the feat/add-image-input-support-in-openai-spec branch May 25, 2024 01:51
Development

Successfully merging this pull request may close these issues.

Add support to include images for OpenAI Chat Template
4 participants