Cyclenerd/google-cloud-gcp-openai-api

OpenAI API for Google Cloud Vertex AI

This project is a drop-in replacement REST API for Vertex AI (PaLM 2, Codey, Gemini) that is compatible with the OpenAI API specifications.

Examples:

  • Chat with Gemini in Chatbot UI (Screenshot: Chatbot UI chat)
  • Get help from Gemini in VSCode (Screenshot: VSCode chat)

This project is inspired by LocalAI, but focuses on making Google Cloud Platform Vertex AI PaLM more accessible to everyone.

The project deploys a Google Cloud Run service that translates OpenAI API calls into calls to Vertex AI (PaLM 2, Codey, Gemini).
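
The translation step can be pictured roughly as follows. This is a minimal sketch and not the project's actual code: `translate_chat_request`, `echo_model`, and the `generate` callable are hypothetical names, and the stand-in backend simply echoes the last message instead of calling Vertex AI.

```python
import secrets
import time
from typing import Callable


def translate_chat_request(request: dict, generate: Callable[[list], str]) -> dict:
    """Wrap a model reply in an OpenAI-style chat.completion response.

    `generate` is a hypothetical stand-in for the Vertex AI call: it receives
    the OpenAI-format message list and returns the reply text.
    """
    reply = generate(request["messages"])
    return {
        "id": "cmpl-" + secrets.token_hex(12),
        "created": int(time.time()),
        "object": "chat.completion",
        "model": request.get("model", "gpt-3.5-turbo"),
        # Token counts are left at zero in this sketch, not computed.
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
        "choices": [
            {
                "message": {"role": "assistant", "content": reply},
                "finish_reason": "stop",
                "index": 0,
            }
        ],
    }


def echo_model(messages: list) -> str:
    """Fake backend used only for illustration: echoes the last message."""
    return "Echo: " + messages[-1]["content"]


response = translate_chat_request(
    {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hi"}]},
    echo_model,
)
print(response["choices"][0]["message"]["content"])  # prints "Echo: Hi"
```

Keeping the backend behind a plain callable means the OpenAI-shaped envelope can be exercised without any Google Cloud credentials.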

Diagram: OpenAI, Google Cloud Run and Vertex AI

Supported OpenAI API services:

OpenAI                API                    Supported
List models           /v1/models             ✅
Chat Completions      /v1/chat/completions   ✅
Completions (Legacy)  /v1/completions        ❌
Embeddings            /v1/embeddings         ❌

The software is developed in Python and based on FastAPI and LangChain.

Everything is designed to be very simple, so you can easily adjust the source code to your individual needs.

Step by Step Guide

The Jupyter notebook Vertex_AI_Chat.ipynb provides step-by-step instructions. It helps you deploy the API backend and the Chatbot UI frontend as Google Cloud Run services.

Deploying to Cloud Run

Requirements:

Your user (the one used for deployment) must have proper permissions in the project. For a fast and hassle-free deployment, the "Owner" role is recommended.

In addition, the default compute service account ([PROJECT_NR]-compute@developer.gserviceaccount.com) must have the "Vertex AI User" role (roles/aiplatform.user).

Authenticate:

gcloud auth login

Set default project:

gcloud config set project [PROJECT_ID]

Run the following script to create a container image and deploy that container as a public API (which allows unauthenticated calls) in Google Cloud Run:

bash deploy.sh

Note: You can change the generated fake OpenAI API key and Google Cloud region with environment variables:

export OPENAI_API_KEY="sk-XYZ"
export GOOGLE_CLOUD_LOCATION="europe-west1"
bash deploy.sh

Running Locally

The software was tested on GNU/Linux and macOS with Python 3.11 and 3.12.3 (3.12.4 is currently not working). On Windows, set the environment variables with set instead of export.

You should also create a virtual environment with the version of Python you want to use, and activate it before proceeding.

You also need the Google Cloud CLI. The Google Cloud CLI includes the gcloud command-line tool.

Create and activate a Python virtual environment, then install the requirements:

python3 -m venv .venv && \
source .venv/bin/activate && \
pip install -r requirements.txt

Authenticate:

gcloud auth application-default login

Set the quota project for the application default credentials:

gcloud auth application-default set-quota-project [PROJECT_ID]

Run with default model:

export DEBUG="True"
export OPENAI_API_KEY="sk-XYZ"
uvicorn vertex:app --reload

Example for Windows:

set DEBUG=True
set OPENAI_API_KEY=sk-XYZ
uvicorn vertex:app --reload

Run with Gemini gemini-pro model:

export DEBUG="True"
export OPENAI_API_KEY="sk-XYZ"
export MODEL_NAME="gemini-pro"
uvicorn vertex:app --reload

Run with Codey codechat-bison-32k model:

export DEBUG="True"
export OPENAI_API_KEY="sk-XYZ"
export MODEL_NAME="codechat-bison-32k"
export MAX_OUTPUT_TOKENS="16000"
uvicorn vertex:app --reload

The application will now be running on your local computer. You can access it by opening a web browser and navigating to the following address:

http://localhost:8000/

Usage

HTTP request and response formats are consistent with the OpenAI API.

For example, to generate a chat completion, send a POST request to the /v1/chat/completions endpoint with the model and messages as the JSON request body:

curl --location 'http://[ENDPOINT]/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer [API-KEY]' \
--data '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "Say this is a test!"
      }
    ]
  }'
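
The same request can be built in Python with only the standard library. This is a sketch under assumptions: it targets the local development server on port 8000 and uses the example key `sk-XYZ`; `send` is a hypothetical helper, and the actual call is left commented out so the snippet runs without a live service.

```python
import json
import urllib.request

# Assumptions: the API runs locally on port 8000 and accepts the key "sk-XYZ".
ENDPOINT = "http://localhost:8000"
API_KEY = "sk-XYZ"

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
}

request = urllib.request.Request(
    url=f"{ENDPOINT}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Uncomment once the service is running:
# print(send(request)["choices"][0]["message"]["content"])
```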

Response:

{
  "id": "cmpl-efccdeb3d2a6cfe144fdde11",
  "created": 1691577522,
  "object": "chat.completion",
  "model": "gpt-3.5-turbo",
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Sure, this is a test."
      },
      "finish_reason": "stop",
      "index": 0
    }
  ]
}
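
Reading the reply out of such a response takes one lookup. The sketch below parses a shortened copy of the sample response; `first_reply` is a hypothetical helper name. The sample reports zero for all usage counters, so it may be safer not to rely on them for token accounting.

```python
import json

# Shortened copy of the sample response shown above.
raw = """
{
  "id": "cmpl-efccdeb3d2a6cfe144fdde11",
  "object": "chat.completion",
  "model": "gpt-3.5-turbo",
  "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
  "choices": [
    {
      "message": {"role": "assistant", "content": "Sure, this is a test."},
      "finish_reason": "stop",
      "index": 0
    }
  ]
}
"""


def first_reply(completion: dict) -> str:
    """Return the text of the first choice in a chat.completion response."""
    return completion["choices"][0]["message"]["content"]


data = json.loads(raw)
print(first_reply(data))  # prints "Sure, this is a test."
```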

Bruno API client

Screenshot: Bruno API client

Download export for Bruno API client: bruno-export.json

Configuration

The software is configured with environment variables.

Screenshot: Google Cloud run

The following variables with default values exist:

Variable                 Default                 Description
DEBUG                    False                   Show debug messages that help during development.
GOOGLE_CLOUD_LOCATION    us-central1             Google Cloud Platform region for API calls.
GOOGLE_CLOUD_PROJECT_ID  [DEFAULT_AUTH_PROJECT]  Identifier for your project. If not specified, the project of authentication is used.
HOST                     0.0.0.0                 Bind socket to this host.
MAX_OUTPUT_TOKENS        512                     Token limit that determines the maximum amount of text output from one prompt. Can be overridden by the end user as required by the OpenAI API specification.
MODEL_NAME               chat-bison              One of the foundation models that are available in Vertex AI.
OPENAI_API_KEY           sk-[RANDOM_HEX]         Self-generated fake OpenAI API key used for authentication against the application.
PORT                     8000                    Bind socket to this port.
TEMPERATURE              0.2                     Sampling temperature; controls the degree of randomness in token selection. Can be overridden by the end user as required by the OpenAI API specification.
TOP_K                    40                      Changes how the model selects tokens for output: the next token is selected from the TOP_K most probable tokens.
TOP_P                    0.8                     Tokens are selected from most probable to least probable until the sum of their probabilities equals the TOP_P value. Can be overridden by the end user as required by the OpenAI API specification.
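
How these variables and their defaults might be read, and how a request could override them, is sketched below. This is illustrative only and not the project's code: `load_settings` and `effective_params` are hypothetical names, the length of the random key is an assumption, and the request fields `max_tokens`, `temperature`, and `top_p` are the OpenAI parameter names.

```python
import os
import secrets
from typing import Mapping

# Defaults copied from the table above.
DEFAULTS = {
    "DEBUG": "False",
    "GOOGLE_CLOUD_LOCATION": "us-central1",
    "HOST": "0.0.0.0",
    "MAX_OUTPUT_TOKENS": "512",
    "MODEL_NAME": "chat-bison",
    "PORT": "8000",
    "TEMPERATURE": "0.2",
    "TOP_K": "40",
    "TOP_P": "0.8",
}


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read settings from the environment, falling back to the defaults."""
    raw = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    return {
        "debug": raw["DEBUG"].lower() == "true",
        "location": raw["GOOGLE_CLOUD_LOCATION"],
        # None means: use the project of the active credentials.
        "project_id": env.get("GOOGLE_CLOUD_PROJECT_ID"),
        "host": raw["HOST"],
        "port": int(raw["PORT"]),
        "model_name": raw["MODEL_NAME"],
        "max_output_tokens": int(raw["MAX_OUTPUT_TOKENS"]),
        "temperature": float(raw["TEMPERATURE"]),
        "top_k": int(raw["TOP_K"]),
        "top_p": float(raw["TOP_P"]),
        # Assumption: a random fake key of this length when none is configured.
        "api_key": env.get("OPENAI_API_KEY") or "sk-" + secrets.token_hex(16),
    }


def effective_params(settings: dict, request: dict) -> dict:
    """Let per-request values override the configured defaults."""
    return {
        "max_output_tokens": request.get("max_tokens", settings["max_output_tokens"]),
        "temperature": request.get("temperature", settings["temperature"]),
        "top_p": request.get("top_p", settings["top_p"]),
        "top_k": settings["top_k"],  # not listed as overridable in the table
    }


settings = load_settings({"MODEL_NAME": "gemini-pro"})
print(effective_params(settings, {"temperature": 0.9}))
```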

OpenAI Client Library

If your application uses client libraries provided by OpenAI, you only need to modify the OPENAI_API_BASE environment variable to match your Google Cloud Run endpoint URL:

export OPENAI_API_BASE="https://openai-api-vertex-XYZ.a.run.app/v1"
python your_openai_app.py
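
Which environment variable a client library reads can depend on its version, so passing the endpoint explicitly may be more robust. The sketch below assumes the 1.x `openai` Python package, whose `OpenAI` client accepts `base_url` and `api_key`; `client_options` is a hypothetical helper, and the client calls are commented out so the snippet runs without the package or a deployed service.

```python
import os
from typing import Mapping


def client_options(env: Mapping[str, str] = os.environ) -> dict:
    """Collect explicit connection options from the environment variables."""
    return {
        "base_url": env["OPENAI_API_BASE"],  # Cloud Run URL ending in /v1
        "api_key": env["OPENAI_API_KEY"],    # key generated during deployment
    }


options = client_options({
    "OPENAI_API_BASE": "https://openai-api-vertex-XYZ.a.run.app/v1",
    "OPENAI_API_KEY": "sk-XYZ",
})
print(options["base_url"])

# With the openai package (1.x) installed, the options can be passed directly:
# from openai import OpenAI
# client = OpenAI(**options)
# reply = client.chat.completions.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": "Say this is a test!"}],
# )
# print(reply.choices[0].message.content)
```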

Chatbot UI

When deploying the Chatbot UI application, the following environment variables must be set:

Variable         Value
OPENAI_API_KEY   API key generated during deployment
OPENAI_API_HOST  Google Cloud Run URL

Screenshot: Chatbot UI container

Deploying Chatbot UI to Cloud Run

Run the following script to create a container image from the GitHub source code and deploy that container as a public website (which allows unauthenticated calls) in Google Cloud Run:

export OPENAI_API_KEY="sk-XYZ"
export OPENAI_API_HOST="https://openai-api-vertex-XYZ.a.run.app"
bash chatbot-ui.sh

Chatbox

Set the following Chatbox settings:

Setting         Value
AI Provider     OpenAI API
OpenAI API Key  API key generated during deployment
API Host        Google Cloud Run URL

Screenshot: Chatbot UI container

VSCode-OpenAI

The VSCode-OpenAI extension is a powerful and versatile tool designed to integrate OpenAI features seamlessly into your code editor.

To activate the setup, you have two options:

  • either use the command "vscode-openai.configuration.show.quickpick" or
  • access it through the vscode-openai Status Bar located at the bottom left corner of VSCode.

Screenshot: VSCode settings

Select openai.com and enter the Google Cloud Run URL with /v1 during setup.

ChatGPT Discord Bot

When deploying the Discord Bot application, the following environment variables must be set:

Variable         Value
OPENAI_API_KEY   API key generated during deployment
OPENAI_API_BASE  Google Cloud Run URL with /v1

ChatGPT in Slack

When deploying the ChatGPT in Slack application, the following environment variables must be set:

Variable         Value
OPENAI_API_KEY   API key generated during deployment
OPENAI_API_BASE  Google Cloud Run URL with /v1

ChatGPT Telegram Bot

When deploying the ChatGPT Telegram Bot application, the following environment variables must be set:

Variable         Value
OPENAI_API_KEY   API key generated during deployment
OPENAI_API_BASE  Google Cloud Run URL with /v1
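
The integrations above differ in whether the URL carries the /v1 suffix: Chatbot UI and Chatbox take the plain Cloud Run URL, while VSCode-OpenAI and the bots expect it with /v1. A small sketch for deriving both forms from either input; `api_host` and `api_base` are hypothetical helper names.

```python
def api_host(cloud_run_url: str) -> str:
    """Return the bare service URL (for OPENAI_API_HOST style settings)."""
    url = cloud_run_url.rstrip("/")
    if url.endswith("/v1"):
        url = url[: -len("/v1")]
    return url


def api_base(cloud_run_url: str) -> str:
    """Return the URL with a single trailing /v1 (for OPENAI_API_BASE style settings)."""
    return api_host(cloud_run_url) + "/v1"


url = "https://openai-api-vertex-XYZ.a.run.app/"
print(api_host(url))  # prints "https://openai-api-vertex-XYZ.a.run.app"
print(api_base(url))  # prints "https://openai-api-vertex-XYZ.a.run.app/v1"
```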

Contributing

Have a patch that will benefit this project? Awesome! Follow these steps to have it accepted.

  1. Please read how to contribute.
  2. Fork this Git repository and make your changes.
  3. Create a Pull Request.
  4. Incorporate review feedback into your changes.
  5. Accepted!

License

All files in this repository are under the Apache License, Version 2.0 unless noted otherwise.