feat: gpt-4 and gpt-4 32k services (#456)
* feat: gpt-4 and gpt-4 32k services

* fix: add to universal

* fix: add params
dilyararimovna authored May 12, 2023
1 parent b6ad4e0 commit 1f3ff38
Showing 11 changed files with 193 additions and 7 deletions.
16 changes: 9 additions & 7 deletions MODELS.md
@@ -2,10 +2,12 @@

Here you may find a list of models that are currently available for use in Generative Assistants.

| model name | container name | model link | open-source? | size (billion parameters) | GPU usage | max tokens (prompt + response) | description |
|---------------------------|--------------------------|----------------------------------------------------------------------|--------------------------|---------------------------|---------------------------|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| BLOOMZ 7B | transformers-lm-bloomz7b | [link](https://huggingface.co/bigscience/bloomz-7b1) | yes | 7.1B | 33GB | 2,048 tokens | An open-source multilingual instruction-based large language model (46 languages). NB: free of charge. This model is up and running on our servers and can be used for free. |
| GPT-J 6B | transformers-lm-gptj | [link](https://huggingface.co/EleutherAI/gpt-j-6b) | yes | 6B | 25GB | 2,048 tokens | An open-source English-only large language model which is NOT fine-tuned for instruction following and NOT capable of code generation. NB: free of charge. This model is up and running on our servers and can be used for free. |
| GPT-3.5                    | openai-api-davinci3      | [link](https://platform.openai.com/docs/models/gpt-3-5)              | no (paid access via API) | supposedly, 175B          | - (cannot be run locally) | 4,097 tokens                   | A multilingual instruction-based large language model which is capable of code generation. Unlike ChatGPT, not optimized for chat. NB: paid. You must provide your OpenAI API key to use the model. Your OpenAI account will be charged according to your usage.                                                                             |
| ChatGPT | openai-api-chatgpt | [link](https://platform.openai.com/docs/models/gpt-3-5) | no (paid access via API) | supposedly, 175B | - (cannot be run locally) | 4,096 tokens | Based on gpt-3.5-turbo -- the most capable of the entire GPT-3/GPT-3.5 models family. Optimized for chat. Able to understand and generate code. NB: paid. You must provide your OpenAI API key to use the model. Your OpenAI account will be charged according to your usage. |
| Open-Assistant SFT-1 12B | transformers-lm-oasst12b | [link](https://huggingface.co/OpenAssistant/oasst-sft-1-pythia-12b) | yes | 12B | 26GB (half-precision) | 5,120 tokens | An open-source English-only instruction-based large language model which is NOT good at answering math and coding questions. NB: free of charge. This model is up and running on our servers and can be used for free. |
| GPT-4 | openai-api-gpt4 | [link](https://platform.openai.com/docs/models/gpt-4) | no (paid access via API) | supposedly, 175B | - (cannot be run locally) | 8,192 tokens | A multilingual instruction-based large language model which is capable of code generation and other complex tasks. More capable than any GPT-3.5 model, able to do more complex tasks, and optimized for chat. NB: paid. You must provide your OpenAI API key to use the model. Your OpenAI account will be charged according to your usage. |
| GPT-4 32K                  | openai-api-gpt4-32k      | [link](https://platform.openai.com/docs/models/gpt-4)                | no (paid access via API) | supposedly, 175B          | - (cannot be run locally) | 32,768 tokens                  | A multilingual instruction-based large language model which is capable of code generation and other complex tasks. Same capabilities as the base gpt-4 model but with 4x the context length. NB: paid. You must provide your OpenAI API key to use the model. Your OpenAI account will be charged according to your usage.                   |
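
Both new rows are backed by the shared openai_api_lm service, which wraps OpenAI's chat completions API. For orientation, here is a minimal sketch of the underlying call using the pre-1.0 `openai` Python package that was current at the time of this commit; the prompt and generation parameters are illustrative only and are not the service's actual code.

```python
import os

import openai  # pre-1.0 interface, e.g. openai==0.27.x

# How credentials reach the service is handled elsewhere in the stack;
# reading them from environment variables here is an assumption of this sketch.
openai.api_key = os.environ["OPENAI_API_KEY"]
openai.organization = os.environ.get("OPENAI_ORGANIZATION")

response = openai.ChatCompletion.create(
    model="gpt-4",  # or "gpt-4-32k" for the 32,768-token context window
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."},
    ],
    max_tokens=256,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```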
12 changes: 12 additions & 0 deletions assistant_dists/universal_prompted_assistant/dev.yml
@@ -54,6 +54,18 @@ services:
- "./common:/src/common"
ports:
- 8131:8131
openai-api-gpt4:
volumes:
- "./services/openai_api_lm:/src"
- "./common:/src/common"
ports:
- 8159:8159
openai-api-gpt4-32k:
volumes:
- "./services/openai_api_lm:/src"
- "./common:/src/common"
ports:
- 8160:8160
dff-universal-prompted-skill:
volumes:
- "./skills/dff_universal_prompted_skill:/src"
41 changes: 41 additions & 0 deletions assistant_dists/universal_prompted_assistant/docker-compose.override.yml
@@ -5,6 +5,7 @@ services:
WAIT_HOSTS: "sentseg:8011, ranking-based-response-selector:8002, combined-classification:8087,
sentence-ranker:8128,
transformers-lm-gptj:8130, transformers-lm-oasst12b:8158, openai-api-chatgpt:8145, openai-api-davinci3:8131,
openai-api-gpt4:8159, openai-api-gpt4-32k:8160,
dff-universal-prompted-skill:8147"
WAIT_HOSTS_TIMEOUT: ${WAIT_TIMEOUT:-1000}

@@ -164,6 +165,46 @@ services:
reservations:
memory: 100M

openai-api-gpt4:
env_file: [ .env ]
build:
args:
SERVICE_PORT: 8159
SERVICE_NAME: openai_api_gpt4
PRETRAINED_MODEL_NAME_OR_PATH: gpt-4
context: .
dockerfile: ./services/openai_api_lm/Dockerfile
command: flask run -h 0.0.0.0 -p 8159
environment:
- CUDA_VISIBLE_DEVICES=0
- FLASK_APP=server
deploy:
resources:
limits:
memory: 100M
reservations:
memory: 100M

openai-api-gpt4-32k:
env_file: [ .env ]
build:
args:
SERVICE_PORT: 8160
SERVICE_NAME: openai_api_gpt4_32k
PRETRAINED_MODEL_NAME_OR_PATH: gpt-4-32k
context: .
dockerfile: ./services/openai_api_lm/Dockerfile
command: flask run -h 0.0.0.0 -p 8160
environment:
- CUDA_VISIBLE_DEVICES=0
- FLASK_APP=server
deploy:
resources:
limits:
memory: 100M
reservations:
memory: 100M

dff-universal-prompted-skill:
env_file: [ .env ]
build:
28 changes: 28 additions & 0 deletions components/jkdhfgkhgodfiugpojwrnkjnlg.yml
@@ -0,0 +1,28 @@
name: openai_api_gpt4
display_name: GPT-4
component_type: Generative
model_type: NN-based
is_customizable: false
author: [email protected]
description: A multilingual instruction-based large language model
which is capable of code generation and other complex tasks.
More capable than any GPT-3.5 model, able to do more complex tasks,
and optimized for chat. Paid.
You must provide your OpenAI API key to use the model.
Your OpenAI account will be charged according to your usage.
ram_usage: 100M
gpu_usage: null
group: services
connector:
protocol: http
timeout: 20.0
url: http://openai-api-gpt4:8159/respond
dialog_formatter: null
response_formatter: null
previous_services: null
required_previous_services: null
state_manager_method: null
tags: null
endpoint: respond
service: services/openai_api_lm/service_configs/openai-api-gpt4
date_created: '2023-04-16T09:45:32'
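
The component's connector points at the service's /respond endpoint on port 8159 with a 20-second timeout. Below is a rough sketch of calling that endpoint directly from inside the Docker network; the /respond payload schema is not part of this diff, so every field name in the request body is an assumption rather than the service's documented interface.

```python
import requests

# Hypothetical payload: field names are guesses for illustration only.
payload = {
    "dialog_contexts": [["Hi!", "Hello! How can I help?", "Write a haiku about spring."]],
    "prompts": ["Respond like a friendly assistant."],
    "configs": [None],              # let the service fall back to its default generative config
    "openai_api_keys": ["sk-..."],  # placeholder, never commit a real key
    "openai_api_organizations": [None],
}
resp = requests.post("http://openai-api-gpt4:8159/respond", json=payload, timeout=20.0)
print(resp.json())
```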
27 changes: 27 additions & 0 deletions components/oinfjkrbnfmhkfsjdhfsd.yml
@@ -0,0 +1,27 @@
name: openai_api_gpt4_32k
display_name: GPT-4 32k
component_type: Generative
model_type: NN-based
is_customizable: false
author: [email protected]
description: A multilingual instruction-based large language model
which is capable of code generation and other complex tasks.
Same capabilities as the base gpt-4 model but with 4x the context length.
Paid. You must provide your OpenAI API key to use the model.
Your OpenAI account will be charged according to your usage.
ram_usage: 100M
gpu_usage: null
group: services
connector:
protocol: http
timeout: 20.0
url: http://openai-api-gpt4-32k:8160/respond
dialog_formatter: null
response_formatter: null
previous_services: null
required_previous_services: null
state_manager_method: null
tags: null
endpoint: respond
service: services/openai_api_lm/service_configs/openai-api-gpt4-32k
date_created: '2023-04-16T09:45:32'
2 changes: 2 additions & 0 deletions services/openai_api_lm/server.py
@@ -26,6 +26,8 @@
DEFAULT_CONFIGS = {
"text-davinci-003": json.load(open("generative_configs/openai-text-davinci-003.json", "r")),
"gpt-3.5-turbo": json.load(open("generative_configs/openai-chatgpt.json", "r")),
"gpt-4": json.load(open("generative_configs/openai-chatgpt.json", "r")),
"gpt-4-32k": json.load(open("generative_configs/openai-chatgpt.json", "r")),
}
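
The two new entries reuse the ChatGPT generative config, which is consistent with gpt-4 and gpt-4-32k being served through the same chat completions API as gpt-3.5-turbo. A minimal sketch of the kind of lookup this mapping supports follows; it is hypothetical, since the server's actual selection code is outside this hunk.

```python
import os

# Hypothetical fallback lookup: each container is built with
# PRETRAINED_MODEL_NAME_OR_PATH set to gpt-4 or gpt-4-32k (see the compose
# args), so a request carrying no explicit generation config could fall
# back to the matching entry in DEFAULT_CONFIGS.
model_name = os.environ.get("PRETRAINED_MODEL_NAME_OR_PATH", "gpt-4")
generation_config = DEFAULT_CONFIGS.get(model_name, DEFAULT_CONFIGS["gpt-3.5-turbo"])
```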


5 changes: 5 additions & 0 deletions services/openai_api_lm/service_configs/openai-api-gpt4-32k/environment.yml
@@ -0,0 +1,5 @@
SERVICE_PORT: 8160
SERVICE_NAME: openai_api_gpt4_32k
PRETRAINED_MODEL_NAME_OR_PATH: gpt-4-32k
CUDA_VISIBLE_DEVICES: '0'
FLASK_APP: server
31 changes: 31 additions & 0 deletions services/openai_api_lm/service_configs/openai-api-gpt4-32k/service.yml
@@ -0,0 +1,31 @@
name: openai-api-gpt4-32k
endpoints:
- respond
compose:
env_file:
- .env
build:
args:
SERVICE_PORT: 8160
SERVICE_NAME: openai_api_gpt4_32k
PRETRAINED_MODEL_NAME_OR_PATH: gpt-4-32k
CUDA_VISIBLE_DEVICES: '0'
FLASK_APP: server
context: .
dockerfile: ./services/openai_api_lm/Dockerfile
command: flask run -h 0.0.0.0 -p 8160
environment:
- CUDA_VISIBLE_DEVICES=0
- FLASK_APP=server
deploy:
resources:
limits:
memory: 100M
reservations:
memory: 100M
volumes:
- ./services/openai_api_lm:/src
- ./common:/src/common
ports:
- 8160:8160
proxy: null
5 changes: 5 additions & 0 deletions services/openai_api_lm/service_configs/openai-api-gpt4/environment.yml
@@ -0,0 +1,5 @@
SERVICE_PORT: 8159
SERVICE_NAME: openai_api_gpt4
PRETRAINED_MODEL_NAME_OR_PATH: gpt-4
CUDA_VISIBLE_DEVICES: '0'
FLASK_APP: server
31 changes: 31 additions & 0 deletions services/openai_api_lm/service_configs/openai-api-gpt4/service.yml
@@ -0,0 +1,31 @@
name: openai-api-gpt4
endpoints:
- respond
compose:
env_file:
- .env
build:
args:
SERVICE_PORT: 8159
SERVICE_NAME: openai_api_gpt4
PRETRAINED_MODEL_NAME_OR_PATH: gpt-4
CUDA_VISIBLE_DEVICES: '0'
FLASK_APP: server
context: .
dockerfile: ./services/openai_api_lm/Dockerfile
command: flask run -h 0.0.0.0 -p 8159
environment:
- CUDA_VISIBLE_DEVICES=0
- FLASK_APP=server
deploy:
resources:
limits:
memory: 100M
reservations:
memory: 100M
volumes:
- ./services/openai_api_lm:/src
- ./common:/src/common
ports:
- 8159:8159
proxy: null
2 changes: 2 additions & 0 deletions skills/dff_universal_prompted_skill/scenario/response.py
@@ -32,6 +32,8 @@
"http://transformers-lm-oasst12b:8158/respond": [],
"http://openai-api-chatgpt:8145/respond": ["OPENAI_API_KEY", "OPENAI_ORGANIZATION"],
"http://openai-api-davinci3:8131/respond": ["OPENAI_API_KEY", "OPENAI_ORGANIZATION"],
"http://openai-api-gpt4:8159/respond": ["OPENAI_API_KEY", "OPENAI_ORGANIZATION"],
"http://openai-api-gpt4-32k:8160/respond": ["OPENAI_API_KEY", "OPENAI_ORGANIZATION"],
}
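
The new entries tell the skill that both GPT-4 endpoints require the same OpenAI credentials as the ChatGPT and Davinci services. Below is a hedged sketch of how such a mapping is typically consumed; the dict's real name and the skill's helper functions are outside this hunk, so the names used here are stand-ins.

```python
import os

# Stand-in name: the dict's real identifier is not visible in this hunk.
ENVVARS_TO_SEND = {
    "http://openai-api-gpt4:8159/respond": ["OPENAI_API_KEY", "OPENAI_ORGANIZATION"],
    "http://openai-api-gpt4-32k:8160/respond": ["OPENAI_API_KEY", "OPENAI_ORGANIZATION"],
}


def collect_credentials(lm_service_url: str) -> dict:
    """Collect the environment variables a chosen LM service needs.

    Hypothetical helper: the skill's real function and payload keys are not shown here.
    """
    required = ENVVARS_TO_SEND.get(lm_service_url, [])
    missing = [var for var in required if not os.environ.get(var)]
    if missing:
        raise ValueError(f"Set {missing} in .env before using {lm_service_url}")
    return {var.lower(): os.environ[var] for var in required}
```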


