OpenAI v1 Chat Completions API #171

Merged 8 commits into main from chat-completions on Jan 10, 2024

Conversation

@tgaddair tgaddair commented Jan 10, 2024

Closes #145.

Usage:

Python:

from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://127.0.0.1:8080/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

resp = client.chat.completions.create(
    model="",
    messages=[
        {
            "role": "system",
            "content": "You are a friendly chatbot who always responds in the style of a pirate",
        },
        {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
    ],
    max_tokens=100,
)
print(resp)
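To pull just the assistant's reply out of the response object, a small helper like this can be used (the helper name `reply_text` is hypothetical, not part of this PR; the attribute path follows the OpenAI v1 chat completion response shape):

```python
# Hypothetical helper (not part of this PR): extract the assistant's
# reply from an OpenAI v1 chat completion response object.
def reply_text(resp):
    # The generated text lives on the first choice's message.
    return resp.choices[0].message.content
```

Then `print(reply_text(resp))` prints only the generated text instead of the full response object.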

Streaming:

messages = client.chat.completions.create(
    model="",
    messages=[
        {
            "role": "system",
            "content": "You are a friendly chatbot who always responds in the style of a pirate",
        },
        {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
    ],
    max_tokens=100,
    stream=True,
)

for message in messages:
    print(message)
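Each streamed chunk carries an incremental delta rather than a full message. A minimal sketch for joining the deltas into one string (the helper name `collect_stream_text` is hypothetical; it assumes the OpenAI v1 streaming shape, where text arrives at `chunk.choices[0].delta.content` and may be `None`):

```python
# Hypothetical helper (not part of this PR): join the incremental text
# deltas from a streamed OpenAI v1 chat completion into one string.
def collect_stream_text(chunks):
    parts = []
    for chunk in chunks:
        # delta.content may be None (e.g. the initial role-only chunk).
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)
```

Replacing the print loop above with `text = collect_stream_text(messages)` yields the full reply as a single string.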

REST:

curl http://127.0.0.1:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "",
  "messages": [
  {
      "role": "system",
      "content": "You are a friendly chatbot who always responds in the style of a pirate"
  },
  {
      "role": "user",
      "content": "How many helicopters can a human eat in one sitting?"
  }
  ],
  "max_tokens": 100
}'
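The same request body can also be built in Python and posted with any HTTP client; a minimal sketch, assuming the server from the examples above is listening on 127.0.0.1:8080:

```python
import json

# The same request body as the curl call above.
payload = {
    "model": "",
    "messages": [
        {
            "role": "system",
            "content": "You are a friendly chatbot who always responds in the style of a pirate",
        },
        {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
    ],
    "max_tokens": 100,
}
body = json.dumps(payload)
# POST `body` to http://127.0.0.1:8080/v1/chat/completions with
# Content-Type: application/json (e.g. via urllib.request or requests).
```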

Finally, if the LoRA adapter has its own tokenizer and chat template, it will be used instead of the base model's chat template:

resp = client.chat.completions.create(
    model="alignment-handbook/zephyr-7b-dpo-lora",
    messages=[
        {
            "role": "system",
            "content": "You are a friendly chatbot who always responds in the style of a pirate",
        },
        {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
    ],
    max_tokens=100,
)
print("Response:", resp.choices[0].message.content)

@tgaddair tgaddair marked this pull request as ready for review January 10, 2024 06:08
@tgaddair tgaddair merged commit a90d443 into main Jan 10, 2024
1 check passed
@tgaddair tgaddair deleted the chat-completions branch January 10, 2024 18:35
@tgaddair tgaddair restored the chat-completions branch January 10, 2024 19:01
@tgaddair tgaddair deleted the chat-completions branch January 10, 2024 19:09
@prd-tuong-nguyen

Hi @tgaddair, does this work for local adapters?
With the original endpoint, I can pass {"adapter_source": "local"} to load an adapter from a local directory.
How can I do that via the OpenAI API?

Linked issue: OpenAI compatible API