Commit

merge upstream dev:

ruthenian8 committed May 29, 2023
1 parent ab40dae commit eac585d
Showing 356 changed files with 2,011 additions and 611 deletions.
4 changes: 2 additions & 2 deletions .env
@@ -20,15 +20,15 @@ TEXT_QA_URL=http://text-qa:8078/model
BADLIST_ANNOTATOR_URL=http://badlisted-words:8018/badlisted_words_batch
COMET_ATOMIC_SERVICE_URL=http://comet-atomic:8053/comet
COMET_CONCEPTNET_SERVICE_URL=http://comet-conceptnet:8065/comet
MASKED_LM_SERVICE_URL=http://masked-lm:8088/respond
MASKED_LM_SERVICE_URL=http://masked-lm:8102/respond
DP_WIKIDATA_URL=http://wiki-parser:8077/model
DP_ENTITY_LINKING_URL=http://entity-linking:8075/model
KNOWLEDGE_GROUNDING_SERVICE_URL=http://knowledge-grounding:8083/respond
WIKIDATA_DIALOGUE_SERVICE_URL=http://wikidata-dial-service:8092/model
NEWS_API_ANNOTATOR_URL=http://news-api-annotator:8112/respond
WIKI_FACTS_URL=http://wiki-facts:8116/respond
FACT_RANDOM_SERVICE_URL=http://fact-random:8119/respond
INFILLING_SERVICE_URL=http://infilling:8122/respond
INFILLING_SERVICE_URL=http://infilling:8106/respond
DIALOGPT_CONTINUE_SERVICE_URL=http://dialogpt:8125/continue
PROMPT_STORYGPT_SERVICE_URL=http://prompt-storygpt:8127/respond
STORYGPT_SERVICE_URL=http://storygpt:8126/respond
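
All of these variables point at the same kind of interface: a Dockerized service exposing an HTTP endpoint that accepts a batched JSON POST. Below is a minimal sketch of how a component might read one of these URLs and call the service; the `"text"` payload key is an assumption for illustration, since each service defines its own request schema:

```python
import os

import requests

# Endpoint read from .env, as the pipeline components do;
# the default mirrors the value listed above.
FACT_RANDOM_SERVICE_URL = os.getenv(
    "FACT_RANDOM_SERVICE_URL", "http://fact-random:8119/respond"
)


def call_service(batch):
    # The "text" key is an assumption for illustration; each service
    # defines its own request keys (e.g. "sentences" or "contexts").
    response = requests.post(FACT_RANDOM_SERVICE_URL, json={"text": batch}, timeout=1.5)
    response.raise_for_status()
    return response.json()
```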
1 change: 1 addition & 0 deletions MODELS.md
@@ -11,3 +11,4 @@ Here you may find a list of models that are currently available for use in Generativ
| Open-Assistant Pythia 12B | transformers-lm-oasst12b | [link](https://huggingface.co/OpenAssistant/pythia-12b-sft-v8-7k-steps) | yes | 12B | 26GB (half-precision) | 5,120 tokens | An open-source English-only instruction-based large language model which is NOT good at answering math and coding questions. NB: free of charge. This model is up and running on our servers and can be used for free. |
| GPT-4 | openai-api-gpt4 | [link](https://platform.openai.com/docs/models/gpt-4) | no (paid access via API) | supposedly, 175B | - (cannot be run locally) | 8,192 tokens | A multilingual instruction-based large language model which is capable of code generation and other complex tasks. More capable than any GPT-3.5 model, able to do more complex tasks, and optimized for chat. NB: paid. You must provide your OpenAI API key to use the model. Your OpenAI account will be charged according to your usage. |
| GPT-4 32K | openai-api-gpt4-32k | [link](https://platform.openai.com/docs/models/gpt-4) | no (paid access via API) | supposedly, 175B | - (cannot be run locally) | 32,768 tokens | A multilingual instruction-based large language model which is capable of code generation and other complex tasks. Same capabilities as the base gpt-4 model but with 4x the context length. NB: paid. You must provide your OpenAI API key to use the model. Your OpenAI account will be charged according to your usage. |
| GPT-JT 6B | transformers-lm-gptjt | [link](https://huggingface.co/togethercomputer/GPT-JT-6B-v1) | yes | 6B | 26GB | 2,048 tokens | An open-source English-only large language model which was fine-tuned for instruction following but is NOT capable of code generation. NB: free of charge. This model is up and running on our servers and can be used for free. |
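
The context-length column above is a hard per-model limit, so callers should budget prompt plus reply within it. A small hedged sketch using the names and limits from the table; the token-counting step is left abstract and the reply reserve is an illustrative assumption:

```python
# Context limits (in tokens) taken from the table above.
MAX_CONTEXT_TOKENS = {
    "transformers-lm-oasst12b": 5_120,
    "openai-api-gpt4": 8_192,
    "openai-api-gpt4-32k": 32_768,
    "transformers-lm-gptjt": 2_048,
}


def fits_context(prompt_tokens: int, model: str, reply_reserve: int = 256) -> bool:
    """True if a prompt of `prompt_tokens` tokens still leaves `reply_reserve` room for the reply."""
    return prompt_tokens + reply_reserve <= MAX_CONTEXT_TOKENS[model]


assert fits_context(1_900, "transformers-lm-gptjt") is False  # 1900 + 256 > 2048
```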
34 changes: 18 additions & 16 deletions README.md
@@ -260,22 +260,24 @@ Dream Architecture is presented in the following image:
| Wiki Facts | 1.7 GB RAM | model that extracts related facts from Wikipedia and WikiHow pages |

## Services
| Name                   | Requirements            | Description                                                                                                                                                                                                                                                  |
|------------------------|-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DialoGPT               | 1.2 GB RAM, 2.1 GB GPU  | generative service based on a Transformers generative model; the model is set in the docker compose argument `PRETRAINED_MODEL_NAME_OR_PATH` (for example, `microsoft/DialoGPT-small` with 0.2-0.5 sec on GPU)                                               |
| DialoGPT Persona-based | 1.2 GB RAM, 2.1 GB GPU  | generative service based on a Transformers generative model; the model was pre-trained on the PersonaChat dataset to generate a response conditioned on several sentences of the socialbot's persona                                                          |
| Image Captioning       | 4 GB RAM, 5.4 GB GPU    | creates a text representation of a received image                                                                                                                                                                                                             |
| Infilling              | 1 GB RAM, 1.2 GB GPU    | (turned off but the code is available) generative service based on an Infilling model; for the given utterance, returns the utterance in which `_` from the original text is replaced with generated tokens                                                   |
| Knowledge Grounding    | 2 GB RAM, 2.1 GB GPU    | generative service based on the BlenderBot architecture that provides a response to the context, taking an additional text paragraph into account                                                                                                             |
| Masked LM              | 1.1 GB RAM, 1 GB GPU    | (turned off but the code is available)                                                                                                                                                                                                                        |
| Seq2seq Persona-based  | 1.5 GB RAM, 1.5 GB GPU  | generative service based on a Transformers seq2seq model; the model was pre-trained on the PersonaChat dataset to generate a response conditioned on several sentences of the socialbot's persona                                                              |
| Sentence Ranker        | 1.2 GB RAM, 2.1 GB GPU  | ranking model given as `PRETRAINED_MODEL_NAME_OR_PATH` which, for a pair of sentences, returns a float score of correspondence                                                                                                                                 |
| StoryGPT               | 2.6 GB RAM, 2.15 GB GPU | generative service based on a fine-tuned GPT-2; for the given set of keywords, returns a short story using those keywords                                                                                                                                      |
| GPT-3.5                | 100 MB RAM              | generative service based on the OpenAI API; the model is set in the docker compose argument `PRETRAINED_MODEL_NAME_OR_PATH` (in particular, in this service, `text-davinci-003` is used)                                                                       |
| ChatGPT                | 100 MB RAM              | generative service based on the OpenAI API; the model is set in the docker compose argument `PRETRAINED_MODEL_NAME_OR_PATH` (in particular, in this service, `gpt-3.5-turbo` is used)                                                                          |
| Prompt StoryGPT        | 3 GB RAM, 4 GB GPU      | generative service based on a fine-tuned GPT-2; for a given topic represented by one noun, returns a short story on that topic                                                                                                                                 |
| GPT-J 6B               | 1.5 GB RAM, 24.2 GB GPU | generative service based on a Transformers generative model; the model is set in the docker compose argument `PRETRAINED_MODEL_NAME_OR_PATH` (in particular, in this service, the [GPT-J model](https://huggingface.co/EleutherAI/gpt-j-6B) is used)           |
| BLOOMZ 7B              | 2.5 GB RAM, 29 GB GPU   | generative service based on a Transformers generative model; the model is set in the docker compose argument `PRETRAINED_MODEL_NAME_OR_PATH` (in particular, in this service, the [BLOOMZ-7b1 model](https://huggingface.co/bigscience/bloomz-7b1) is used)    |
| GPT-JT 6B              | 2.5 GB RAM, 25.1 GB GPU | generative service based on a Transformers generative model; the model is set in the docker compose argument `PRETRAINED_MODEL_NAME_OR_PATH` (in particular, in this service, the [GPT-JT model](https://huggingface.co/togethercomputer/GPT-JT-6B-v1) is used) |
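
Most of the generative services above share one pattern: the Hugging Face model name arrives through the `PRETRAINED_MODEL_NAME_OR_PATH` docker compose argument and is loaded with Transformers. A minimal sketch of that pattern follows — not the services' actual code, and the generation settings are illustrative assumptions:

```python
import os

from transformers import AutoModelForCausalLM, AutoTokenizer

# Injected via docker compose, e.g. microsoft/DialoGPT-small.
MODEL_NAME = os.getenv("PRETRAINED_MODEL_NAME_OR_PATH", "microsoft/DialoGPT-small")

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)


def generate_reply(context: str) -> str:
    """Generate one response for a dialog context (sampling settings are illustrative)."""
    inputs = tokenizer(context + tokenizer.eos_token, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Strip the prompt tokens, keeping only the generated continuation.
    return tokenizer.decode(
        output_ids[0, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```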


## Skills
| Name | Requirements | Description |
@@ -9,13 +9,13 @@ compose:
CONFIG: fact_retrieval_rus.json
COMMIT: c8264bf82eaa3ed138395ab68f71d47a4175f2fc
TOP_N: 20
SERVICE_PORT: 8130
SERVICE_PORT: 8110
SRC_DIR: annotators/fact_retrieval_rus
CUDA_VISIBLE_DEVICES: '0'
FLASK_APP: server
context: ./
dockerfile: annotators/fact_retrieval_rus/Dockerfile
command: flask run -h 0.0.0.0 -p 8130
command: flask run -h 0.0.0.0 -p 8110
environment:
- CUDA_VISIBLE_DEVICES=0
- FLASK_APP=server
@@ -29,5 +29,5 @@
- ./annotators/fact_retrieval_rus:/src
- ~/.deeppavlov:/root/.deeppavlov
ports:
- 8130:8130
- 8110:8110
proxy: null
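
After this change the Russian fact retrieval annotator listens on port 8110 instead of 8130, and the Flask command and port mapping move with it. A hedged sketch of a client call; the hostname, `/model` path, and payload key follow the usual DeepPavlov-style annotator convention and are assumptions here:

```python
import requests

# Port 8110 matches the new SERVICE_PORT above; the hostname, endpoint path,
# and payload key are assumptions based on other DeepPavlov-style annotators.
FACT_RETRIEVAL_URL = "http://fact-retrieval-rus:8110/model"


def retrieve_facts(dialogs):
    response = requests.post(FACT_RETRIEVAL_URL, json={"dialogs": dialogs}, timeout=5.0)
    response.raise_for_status()
    return response.json()
```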
82 changes: 63 additions & 19 deletions annotators/prompt_selector/server.py
@@ -2,6 +2,7 @@
import logging
import requests
import time
from copy import deepcopy
from os import getenv, listdir
from pathlib import Path

@@ -27,14 +28,18 @@
prompt_name = Path(filename).stem
if ".json" in filename and prompt_name in PROMPTS_TO_CONSIDER:
data = json.load(open(f"common/prompts/{filename}", "r"))
PROMPTS.append(data["prompt"])
PROMPTS.append(data.get("goals", ""))
PROMPTS_NAMES.append(prompt_name)


def get_result(request):
global PROMPTS, PROMPTS_NAMES
st_time = time.time()
# batch of contexts
contexts = request.json["contexts"]
# batch of prompts_goals dicts [{"promptname1": "promptgoal1", "promptname2": "promptgoal2"}]
prompts_goals_from_attributes = request.json["prompts_goals"]

result = []
pairs = []
context_ids = []
@@ -44,34 +49,73 @@ def get_result(request):
str_context = "\n".join(context[-3:])
else:
str_context = context[-1]
for prompt in PROMPTS:
pairs += [[str_context, prompt]]
for _prompt_goals, _prompt_name in zip(PROMPTS, PROMPTS_NAMES):
pairs += [
[
str_context,
prompts_goals_from_attributes[context_id].get(_prompt_name, "")
if not _prompt_goals
else _prompt_goals,
]
]
context_ids += [context_id]
context_ids = np.array(context_ids)
try:
scores = requests.post(SENTENCE_RANKER_SERVICE_URL, json={"sentence_pairs": pairs}, timeout=1.5).json()[0][
"batch"
]
scores = np.array(scores)
for i, context in enumerate(contexts):
curr_ids = np.where(context_ids == i)[0]
most_relevant_sent_ids = np.argsort(scores[curr_ids])[::-1][:N_SENTENCES_TO_RETURN]
curr_result = {
"prompts": [PROMPTS_NAMES[_id] for _id in most_relevant_sent_ids],
"max_similarity": scores[curr_ids][most_relevant_sent_ids[0]],
}
result += [curr_result]
except Exception as exc:
logger.exception(exc)
sentry_sdk.capture_exception(exc)
is_empty_prompts = np.array([len(pair[1]) == 0 for pair in pairs])
if all(is_empty_prompts):
logger.info("All goals from prompts are empty. Skip ranking.")
result = [{"prompts": [], "max_similarity": 0.0}] * len(contexts)
else:
try:
scores = requests.post(SENTENCE_RANKER_SERVICE_URL, json={"sentence_pairs": pairs}, timeout=1.5).json()[0][
"batch"
]
scores = np.array(scores)
for i, context in enumerate(contexts):
curr_ids = np.where(context_ids == i)[0]
# set the score to -1 for pairs whose prompt (more precisely, its goals) is empty
for _id in curr_ids:
if is_empty_prompts[_id]:
scores[_id] = -1.0
most_relevant_sent_ids = np.argsort(scores[curr_ids])[::-1][:N_SENTENCES_TO_RETURN]
curr_result = {
"prompts": [PROMPTS_NAMES[_id] for _id in most_relevant_sent_ids],
"max_similarity": scores[curr_ids][most_relevant_sent_ids[0]],
}
# also turn on those prompts whose goals are empty; curr_ids holds global
# pair indices, so convert to a prompt-local index before looking up PROMPTS_NAMES
for _pos, _id in enumerate(curr_ids):
if is_empty_prompts[_id]:
curr_result["prompts"] += [PROMPTS_NAMES[_pos]]
result += [curr_result]
except Exception as exc:
logger.exception(exc)
sentry_sdk.capture_exception(exc)
result = [{"prompts": [], "max_similarity": 0.0}] * len(contexts)

total_time = time.time() - st_time
logger.info(f"prompt-selector exec time: {total_time:.3f}s")
logger.info(f"prompt-selector result: {result}")
return result


@app.route("/collect_goals", methods=["POST"])
def collect_goals():
# prompts_goals_from_attributes = [{"promptname1": "promptgoal1", "promptname2": "promptgoal2"}]
# these are goals from the attributes of skills' hypotheses, generated by LLMs at the previous dialog step
prompts_goals_from_attributes = request.json["prompts_goals"]
# these are human attributes which may already contain goals for some prompts
human_attributes = request.json["human_attributes"]
result = []

for _prompts_goals_all, _human_attr in zip(prompts_goals_from_attributes, human_attributes):
# _prompts_goals_all = {"promptname1": "promptgoal1", "promptname2": "promptgoal2"}
_prompts_goals_not_empty = {name: goals for name, goals in _prompts_goals_all.items() if len(goals)}
_new_prompts_goals = deepcopy(_human_attr.get("prompts_goals", {}))
_new_prompts_goals.update(_prompts_goals_not_empty)
result += [{"human_attributes": {"prompts_goals": _new_prompts_goals}}]
logger.info(f"prompt_selector collected goals from hypotheses' attributes: {result}")
return jsonify(result)


@app.route("/respond", methods=["POST"])
def respond():
result = get_result(request)
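
The net effect of the server.py changes: `/respond` now ranks each dialog context against prompt *goals* (falling back to the per-context goals passed in via `prompts_goals`), and the new `/collect_goals` endpoint merges non-empty goals into the user's attributes. A hedged usage sketch of both endpoints; the host, port, prompt name, and goal strings are illustrative assumptions:

```python
import requests

BASE = "http://prompt-selector:8135"  # host and port are assumptions

# /respond ranks prompt goals against each dialog context.
respond_payload = {
    "contexts": [["hi", "can you tell me a story?"]],
    # goals generated by skills' LLMs earlier in the dialog (hypothetical prompt name)
    "prompts_goals": [{"storyteller": "tell engaging short stories"}],
}
ranked = requests.post(f"{BASE}/respond", json=respond_payload, timeout=2).json()
# expected shape: [{"prompts": ["storyteller", ...], "max_similarity": <float>}]

# /collect_goals merges non-empty goals into the human attributes.
goals_payload = {
    "prompts_goals": [{"storyteller": "tell engaging short stories"}],
    "human_attributes": [{"prompts_goals": {}}],
}
updated = requests.post(f"{BASE}/collect_goals", json=goals_payload, timeout=2).json()
# expected shape: [{"human_attributes": {"prompts_goals": {"storyteller": "..."}}}]
```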
@@ -1,6 +1,7 @@
name: prompt-selector
endpoints:
- respond
- collect_goals
compose:
env_file:
- .env
@@ -1,6 +1,7 @@
name: prompt-selector
endpoints:
- respond
- collect_goals
compose:
env_file:
- .env
@@ -1,6 +1,7 @@
name: prompt-selector
endpoints:
- respond
- collect_goals
compose:
env_file:
- .env
