Commit

merge upstream dev:

ruthenian8 committed May 29, 2023
1 parent ab40dae commit eac585d
Showing 356 changed files with 2,011 additions and 611 deletions.
4 changes: 2 additions & 2 deletions .env
@@ -20,15 +20,15 @@ TEXT_QA_URL=http://text-qa:8078/model
BADLIST_ANNOTATOR_URL=http://badlisted-words:8018/badlisted_words_batch
COMET_ATOMIC_SERVICE_URL=http://comet-atomic:8053/comet
COMET_CONCEPTNET_SERVICE_URL=http://comet-conceptnet:8065/comet
MASKED_LM_SERVICE_URL=http://masked-lm:8088/respond
MASKED_LM_SERVICE_URL=http://masked-lm:8102/respond
DP_WIKIDATA_URL=http://wiki-parser:8077/model
DP_ENTITY_LINKING_URL=http://entity-linking:8075/model
KNOWLEDGE_GROUNDING_SERVICE_URL=http://knowledge-grounding:8083/respond
WIKIDATA_DIALOGUE_SERVICE_URL=http://wikidata-dial-service:8092/model
NEWS_API_ANNOTATOR_URL=http://news-api-annotator:8112/respond
WIKI_FACTS_URL=http://wiki-facts:8116/respond
FACT_RANDOM_SERVICE_URL=http://fact-random:8119/respond
INFILLING_SERVICE_URL=http://infilling:8122/respond
INFILLING_SERVICE_URL=http://infilling:8106/respond
DIALOGPT_CONTINUE_SERVICE_URL=http://dialogpt:8125/continue
PROMPT_STORYGPT_SERVICE_URL=http://prompt-storygpt:8127/respond
STORYGPT_SERVICE_URL=http://storygpt:8126/respond
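
All of these variables point at the same kind of interface: a Dockerized service exposing an HTTP endpoint that accepts a batched JSON POST. Below is a minimal sketch of how a component might read one of these URLs and call the service; the `"text"` payload key is an assumption for illustration, since each service defines its own request schema:

```python
import os

import requests

# Endpoint read from .env, as the pipeline components do;
# the default mirrors the value listed above.
FACT_RANDOM_SERVICE_URL = os.getenv(
    "FACT_RANDOM_SERVICE_URL", "http://fact-random:8119/respond"
)


def call_service(batch):
    # The "text" key is an assumption for illustration; each service
    # defines its own request keys (e.g. "sentences" or "contexts").
    response = requests.post(FACT_RANDOM_SERVICE_URL, json={"text": batch}, timeout=1.5)
    response.raise_for_status()
    return response.json()
```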
1 change: 1 addition & 0 deletions MODELS.md
@@ -11,3 +11,4 @@ Here you may find a list of models that are currently available for use in Generativ
| Open-Assistant Pythia 12B | transformers-lm-oasst12b | [link](https://huggingface.co/OpenAssistant/pythia-12b-sft-v8-7k-steps) | yes | 12B | 26GB (half-precision) | 5,120 tokens | An open-source English-only instruction-based large language model which is NOT good at answering math and coding questions. NB: free of charge. This model is up and running on our servers and can be used for free. |
| GPT-4 | openai-api-gpt4 | [link](https://platform.openai.com/docs/models/gpt-4) | no (paid access via API) | supposedly, 175B | - (cannot be run locally) | 8,192 tokens | A multilingual instruction-based large language model which is capable of code generation and other complex tasks. More capable than any GPT-3.5 model, able to do more complex tasks, and optimized for chat. NB: paid. You must provide your OpenAI API key to use the model. Your OpenAI account will be charged according to your usage. |
| GPT-4 32K | openai-api-gpt4-32k | [link](https://platform.openai.com/docs/models/gpt-4) | no (paid access via API) | supposedly, 175B | - (cannot be run locally) | 32,768 tokens | A multilingual instruction-based large language model which is capable of code generation and other complex tasks. Same capabilities as the base gpt-4 model but with 4x the context length. NB: paid. You must provide your OpenAI API key to use the model. Your OpenAI account will be charged according to your usage. |
| GPT-JT 6B | transformers-lm-gptjt | [link](https://huggingface.co/togethercomputer/GPT-JT-6B-v1) | yes | 6B | 26GB | 2,048 tokens | An open-source English-only large language model which was fine-tuned for instruction following but is NOT capable of code generation. NB: free of charge. This model is up and running on our servers and can be used for free. |
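
The context-length column above is a hard per-model limit, so callers should budget prompt plus reply within it. A small hedged sketch using the names and limits from the table; the token-counting step is left abstract and the reply reserve is an illustrative assumption:

```python
# Context limits (in tokens) taken from the table above.
MAX_CONTEXT_TOKENS = {
    "transformers-lm-oasst12b": 5_120,
    "openai-api-gpt4": 8_192,
    "openai-api-gpt4-32k": 32_768,
    "transformers-lm-gptjt": 2_048,
}


def fits_context(prompt_tokens: int, model: str, reply_reserve: int = 256) -> bool:
    """True if a prompt of `prompt_tokens` tokens still leaves `reply_reserve` room for the reply."""
    return prompt_tokens + reply_reserve <= MAX_CONTEXT_TOKENS[model]


assert fits_context(1_900, "transformers-lm-gptjt") is False  # 1900 + 256 > 2048
```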
34 changes: 18 additions & 16 deletions README.md
@@ -260,22 +260,24 @@ Dream Architecture is presented in the following image:
| Wiki Facts | 1.7 GB RAM | model that extracts related facts from Wikipedia and WikiHow pages |

## Services
| Name                   | Requirements            | Description                                                                                                                                                                                                                                                  |
|------------------------|-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DialoGPT               | 1.2 GB RAM, 2.1 GB GPU  | generative service based on a Transformers generative model; the model is set in the docker compose argument `PRETRAINED_MODEL_NAME_OR_PATH` (for example, `microsoft/DialoGPT-small` with 0.2-0.5 sec on GPU)                                               |
| DialoGPT Persona-based | 1.2 GB RAM, 2.1 GB GPU  | generative service based on a Transformers generative model; the model was pre-trained on the PersonaChat dataset to generate a response conditioned on several sentences of the socialbot's persona                                                          |
| Image Captioning       | 4 GB RAM, 5.4 GB GPU    | creates a text representation of a received image                                                                                                                                                                                                             |
| Infilling              | 1 GB RAM, 1.2 GB GPU    | (turned off but the code is available) generative service based on an Infilling model; for the given utterance, returns the utterance in which `_` from the original text is replaced with generated tokens                                                   |
| Knowledge Grounding    | 2 GB RAM, 2.1 GB GPU    | generative service based on the BlenderBot architecture that provides a response to the context, taking an additional text paragraph into account                                                                                                             |
| Masked LM              | 1.1 GB RAM, 1 GB GPU    | (turned off but the code is available)                                                                                                                                                                                                                        |
| Seq2seq Persona-based  | 1.5 GB RAM, 1.5 GB GPU  | generative service based on a Transformers seq2seq model; the model was pre-trained on the PersonaChat dataset to generate a response conditioned on several sentences of the socialbot's persona                                                              |
| Sentence Ranker        | 1.2 GB RAM, 2.1 GB GPU  | ranking model given as `PRETRAINED_MODEL_NAME_OR_PATH` which, for a pair of sentences, returns a float score of correspondence                                                                                                                                 |
| StoryGPT               | 2.6 GB RAM, 2.15 GB GPU | generative service based on a fine-tuned GPT-2; for the given set of keywords, returns a short story using those keywords                                                                                                                                      |
| GPT-3.5                | 100 MB RAM              | generative service based on the OpenAI API; the model is set in the docker compose argument `PRETRAINED_MODEL_NAME_OR_PATH` (in particular, in this service, `text-davinci-003` is used)                                                                       |
| ChatGPT                | 100 MB RAM              | generative service based on the OpenAI API; the model is set in the docker compose argument `PRETRAINED_MODEL_NAME_OR_PATH` (in particular, in this service, `gpt-3.5-turbo` is used)                                                                          |
| Prompt StoryGPT        | 3 GB RAM, 4 GB GPU      | generative service based on a fine-tuned GPT-2; for a given topic represented by one noun, returns a short story on that topic                                                                                                                                 |
| GPT-J 6B               | 1.5 GB RAM, 24.2 GB GPU | generative service based on a Transformers generative model; the model is set in the docker compose argument `PRETRAINED_MODEL_NAME_OR_PATH` (in particular, in this service, the [GPT-J model](https://huggingface.co/EleutherAI/gpt-j-6B) is used)           |
| BLOOMZ 7B              | 2.5 GB RAM, 29 GB GPU   | generative service based on a Transformers generative model; the model is set in the docker compose argument `PRETRAINED_MODEL_NAME_OR_PATH` (in particular, in this service, the [BLOOMZ-7b1 model](https://huggingface.co/bigscience/bloomz-7b1) is used)    |
| GPT-JT 6B              | 2.5 GB RAM, 25.1 GB GPU | generative service based on a Transformers generative model; the model is set in the docker compose argument `PRETRAINED_MODEL_NAME_OR_PATH` (in particular, in this service, the [GPT-JT model](https://huggingface.co/togethercomputer/GPT-JT-6B-v1) is used) |
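
Most of the generative services above share one pattern: the Hugging Face model name arrives through the `PRETRAINED_MODEL_NAME_OR_PATH` docker compose argument and is loaded with Transformers. A minimal sketch of that pattern follows — not the services' actual code, and the generation settings are illustrative assumptions:

```python
import os

from transformers import AutoModelForCausalLM, AutoTokenizer

# Injected via docker compose, e.g. microsoft/DialoGPT-small.
MODEL_NAME = os.getenv("PRETRAINED_MODEL_NAME_OR_PATH", "microsoft/DialoGPT-small")

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)


def generate_reply(context: str) -> str:
    """Generate one response for a dialog context (sampling settings are illustrative)."""
    inputs = tokenizer(context + tokenizer.eos_token, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Strip the prompt tokens, keeping only the generated continuation.
    return tokenizer.decode(
        output_ids[0, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```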


## Skills
| Name | Requirements | Description |
@@ -9,13 +9,13 @@ compose:
CONFIG: fact_retrieval_rus.json
COMMIT: c8264bf82eaa3ed138395ab68f71d47a4175f2fc
TOP_N: 20
SERVICE_PORT: 8130
SERVICE_PORT: 8110
SRC_DIR: annotators/fact_retrieval_rus
CUDA_VISIBLE_DEVICES: '0'
FLASK_APP: server
context: ./
dockerfile: annotators/fact_retrieval_rus/Dockerfile
command: flask run -h 0.0.0.0 -p 8130
command: flask run -h 0.0.0.0 -p 8110
environment:
- CUDA_VISIBLE_DEVICES=0
- FLASK_APP=server
@@ -29,5 +29,5 @@
- ./annotators/fact_retrieval_rus:/src
- ~/.deeppavlov:/root/.deeppavlov
ports:
- 8130:8130
- 8110:8110
proxy: null
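
After this change the Russian fact retrieval annotator listens on port 8110 instead of 8130, and the Flask command and port mapping move with it. A hedged sketch of a client call; the hostname, `/model` path, and payload key follow the usual DeepPavlov-style annotator convention and are assumptions here:

```python
import requests

# Port 8110 matches the new SERVICE_PORT above; the hostname, endpoint path,
# and payload key are assumptions based on other DeepPavlov-style annotators.
FACT_RETRIEVAL_URL = "http://fact-retrieval-rus:8110/model"


def retrieve_facts(dialogs):
    response = requests.post(FACT_RETRIEVAL_URL, json={"dialogs": dialogs}, timeout=5.0)
    response.raise_for_status()
    return response.json()
```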
82 changes: 63 additions & 19 deletions annotators/prompt_selector/server.py
@@ -2,6 +2,7 @@
import logging
import requests
import time
from copy import deepcopy
from os import getenv, listdir
from pathlib import Path

@@ -27,14 +28,18 @@
prompt_name = Path(filename).stem
if ".json" in filename and prompt_name in PROMPTS_TO_CONSIDER:
data = json.load(open(f"common/prompts/{filename}", "r"))
PROMPTS.append(data["prompt"])
PROMPTS.append(data.get("goals", ""))
PROMPTS_NAMES.append(prompt_name)


def get_result(request):
global PROMPTS, PROMPTS_NAMES
st_time = time.time()
# batch of contexts
contexts = request.json["contexts"]
# batch of prompts_goals dicts [{"promptname1": "promptgoal1", "promptname2": "promptgoal2"}]
prompts_goals_from_attributes = request.json["prompts_goals"]

result = []
pairs = []
context_ids = []
@@ -44,34 +49,73 @@ def get_result(request):
str_context = "\n".join(context[-3:])
else:
str_context = context[-1]
for prompt in PROMPTS:
pairs += [[str_context, prompt]]
for _prompt_goals, _prompt_name in zip(PROMPTS, PROMPTS_NAMES):
pairs += [
[
str_context,
prompts_goals_from_attributes[context_id].get(_prompt_name, "")
if not _prompt_goals
else _prompt_goals,
]
]
context_ids += [context_id]
context_ids = np.array(context_ids)
try:
scores = requests.post(SENTENCE_RANKER_SERVICE_URL, json={"sentence_pairs": pairs}, timeout=1.5).json()[0][
"batch"
]
scores = np.array(scores)
for i, context in enumerate(contexts):
curr_ids = np.where(context_ids == i)[0]
most_relevant_sent_ids = np.argsort(scores[curr_ids])[::-1][:N_SENTENCES_TO_RETURN]
curr_result = {
"prompts": [PROMPTS_NAMES[_id] for _id in most_relevant_sent_ids],
"max_similarity": scores[curr_ids][most_relevant_sent_ids[0]],
}
result += [curr_result]
except Exception as exc:
logger.exception(exc)
sentry_sdk.capture_exception(exc)
is_empty_prompts = np.array([len(pair[1]) == 0 for pair in pairs])
if all(is_empty_prompts):
logger.info("All goals from prompts are empty. Skip ranking.")
result = [{"prompts": [], "max_similarity": 0.0}] * len(contexts)
else:
try:
scores = requests.post(SENTENCE_RANKER_SERVICE_URL, json={"sentence_pairs": pairs}, timeout=1.5).json()[0][
"batch"
]
scores = np.array(scores)
for i, context in enumerate(contexts):
curr_ids = np.where(context_ids == i)[0]
# set the score to -1 for pairs whose prompt (more precisely, its goals) is empty
for _id in curr_ids:
if is_empty_prompts[_id]:
scores[_id] = -1.0
most_relevant_sent_ids = np.argsort(scores[curr_ids])[::-1][:N_SENTENCES_TO_RETURN]
curr_result = {
"prompts": [PROMPTS_NAMES[_id] for _id in most_relevant_sent_ids],
"max_similarity": scores[curr_ids][most_relevant_sent_ids[0]],
}
# also turn on those prompts whose goals are empty; curr_ids holds global
# pair indices, so convert to a prompt-local index before looking up PROMPTS_NAMES
for _pos, _id in enumerate(curr_ids):
if is_empty_prompts[_id]:
curr_result["prompts"] += [PROMPTS_NAMES[_pos]]
result += [curr_result]
except Exception as exc:
logger.exception(exc)
sentry_sdk.capture_exception(exc)
result = [{"prompts": [], "max_similarity": 0.0}] * len(contexts)

total_time = time.time() - st_time
logger.info(f"prompt-selector exec time: {total_time:.3f}s")
logger.info(f"prompt-selector result: {result}")
return result


@app.route("/collect_goals", methods=["POST"])
def collect_goals():
# prompts_goals_from_attributes = [{"promptname1": "promptgoal1", "promptname2": "promptgoal2"}]
# these are goals from the attributes of skills' hypotheses, generated by LLMs at the previous dialog step
prompts_goals_from_attributes = request.json["prompts_goals"]
# these are human attributes which may already contain goals for some prompts
human_attributes = request.json["human_attributes"]
result = []

for _prompts_goals_all, _human_attr in zip(prompts_goals_from_attributes, human_attributes):
# _prompts_goals_all = {"promptname1": "promptgoal1", "promptname2": "promptgoal2"}
_prompts_goals_not_empty = {name: goals for name, goals in _prompts_goals_all.items() if len(goals)}
_new_prompts_goals = deepcopy(_human_attr.get("prompts_goals", {}))
_new_prompts_goals.update(_prompts_goals_not_empty)
result += [{"human_attributes": {"prompts_goals": _new_prompts_goals}}]
logger.info(f"prompt_selector collected goals from hypotheses' attributes: {result}")
return jsonify(result)


@app.route("/respond", methods=["POST"])
def respond():
result = get_result(request)
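
The net effect of the server.py changes: `/respond` now ranks each dialog context against prompt *goals* (falling back to the per-context goals passed in via `prompts_goals`), and the new `/collect_goals` endpoint merges non-empty goals into the user's attributes. A hedged usage sketch of both endpoints; the host, port, prompt name, and goal strings are illustrative assumptions:

```python
import requests

BASE = "http://prompt-selector:8135"  # host and port are assumptions

# /respond ranks prompt goals against each dialog context.
respond_payload = {
    "contexts": [["hi", "can you tell me a story?"]],
    # goals generated by skills' LLMs earlier in the dialog (hypothetical prompt name)
    "prompts_goals": [{"storyteller": "tell engaging short stories"}],
}
ranked = requests.post(f"{BASE}/respond", json=respond_payload, timeout=2).json()
# expected shape: [{"prompts": ["storyteller", ...], "max_similarity": <float>}]

# /collect_goals merges non-empty goals into the human attributes.
goals_payload = {
    "prompts_goals": [{"storyteller": "tell engaging short stories"}],
    "human_attributes": [{"prompts_goals": {}}],
}
updated = requests.post(f"{BASE}/collect_goals", json=goals_payload, timeout=2).json()
# expected shape: [{"human_attributes": {"prompts_goals": {"storyteller": "..."}}}]
```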
@@ -1,6 +1,7 @@
name: prompt-selector
endpoints:
- respond
- collect_goals
compose:
env_file:
- .env
@@ -1,6 +1,7 @@
name: prompt-selector
endpoints:
- respond
- collect_goals
compose:
env_file:
- .env
@@ -1,6 +1,7 @@
name: prompt-selector
endpoints:
- respond
- collect_goals
compose:
env_file:
- .env
