Summarization models #393

Merged
merged 33 commits into from
Aug 4, 2023
Changes from 20 commits
Commits
a627aef
Added abstractive summarization model for English texts
Kolpnick Apr 16, 2023
75c4a74
Added abstractive summarization model for Russian texts
Kolpnick Apr 16, 2023
783e3ba
Added summarization annotator
Kolpnick Apr 19, 2023
f754ff9
Moved rut5 summarizer to dream_russian
Kolpnick Apr 19, 2023
4be04a4
Changed endpoint
Kolpnick Apr 19, 2023
d41b8bc
Added model path to Dockerfile
Kolpnick Apr 19, 2023
42bb9c6
Updated test
Kolpnick Apr 19, 2023
6c86a2f
Updated summarization annotator input
Kolpnick Apr 19, 2023
19bdfe6
Updated test
Kolpnick Apr 19, 2023
1e67609
Changed summarization service url
Kolpnick Apr 20, 2023
7805fe8
Changed test
Kolpnick Apr 20, 2023
bf3ebf4
Merge branch 'dev' into summarization_models
Kolpnick Apr 20, 2023
5419d7e
Merge branch 'dev' into summarization_models
Kolpnick Jul 4, 2023
8c54ef8
Increased timeout
Kolpnick Jul 4, 2023
9398288
Updated ram_usage
Kolpnick Jul 4, 2023
9364d4e
Updated ports
Kolpnick Jul 4, 2023
9ba8452
Updated models cards
Kolpnick Jul 4, 2023
69ee92d
Added more info messages
Kolpnick Jul 5, 2023
b90597b
Fixed path error
Kolpnick Jul 5, 2023
b115153
Added summarization output to bot attributes
Kolpnick Jul 5, 2023
c94de38
Merge branch 'dev' into summarization_models
Kolpnick Jul 20, 2023
9ca6ad8
Added timeout param to dockerfile
Kolpnick Jul 20, 2023
c60225c
Updated model cards and ports
Kolpnick Jul 20, 2023
8def7d0
Fixed problem with incorrect batch processing
Kolpnick Jul 20, 2023
e744657
Updated summarization save format
Kolpnick Jul 20, 2023
4ad6e46
Updated dialog summarization model
Kolpnick Jul 21, 2023
487ad86
Updated tests
Kolpnick Jul 21, 2023
8d1f484
Minor formatting changes
Kolpnick Jul 24, 2023
71cad71
Fixed black and flake8 codestyle
Kolpnick Jul 24, 2023
f2820ef
Fixed black codestyle
Kolpnick Jul 24, 2023
09e2145
Merge branch 'dev' into summarization_models
Kolpnick Aug 1, 2023
da489f1
Updated models ports
Kolpnick Aug 1, 2023
03b675f
Small fixes
Kolpnick Aug 4, 2023
1 change: 1 addition & 0 deletions .env
@@ -34,3 +34,4 @@ PROMPT_STORYGPT_SERVICE_URL=http://prompt-storygpt:8127/respond
STORYGPT_SERVICE_URL=http://storygpt:8126/respond
SENTENCE_RANKER_SERVICE_URL=http://sentence-ranker:8128/respond
FILE_SERVER_URL=http://files:3000
SUMMARIZATION_SERVICE_URL=http://brio-summarizer:8169/respond_batch
1 change: 1 addition & 0 deletions .env_ru
@@ -22,3 +22,4 @@ DP_WIKIDATA_URL=http://wiki-parser-ru:8077/model
DP_ENTITY_LINKING_URL=http://entity-linking-ru:8075/model
FILE_SERVER_URL=http://files:3000
SENTENCE_RANKER_SERVICE_URL=http://dialogrpt-ru:8122/rank_sentences
SUMMARIZATION_SERVICE_URL=http://rut5-summarizer:8170/respond_batch
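Both env files point SUMMARIZATION_SERVICE_URL at a `respond_batch` endpoint, on port 8169 for the English distribution and on port 8170 for the Russian one. As a minimal sketch, the standard library can split such a URL into the parts that have to agree with the compose files. The helper name `split_service_url` is hypothetical and not part of this PR.

```python
from urllib.parse import urlsplit


def split_service_url(url):
    """Return (host, port, endpoint) for a service URL such as the ones in .env."""
    parts = urlsplit(url)
    # hostname and port come from the network location; the path loses its leading slash.
    return parts.hostname, parts.port, parts.path.lstrip("/")


# URLs copied from .env and .env_ru in this PR.
EN_URL = "http://brio-summarizer:8169/respond_batch"
RU_URL = "http://rut5-summarizer:8170/respond_batch"

print(split_service_url(EN_URL))
print(split_service_url(RU_URL))
```

A check like this could be run against the `ports` and `SERVICE_PORT` values of the matching compose service to catch a mismatched port early.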
6 changes: 6 additions & 0 deletions annotators/summarization_annotator/Dockerfile
@@ -0,0 +1,6 @@
FROM python:3.7.4

COPY ${WORK_DIR}/requirements.txt /src/requirements.txt
RUN pip install -r /src/requirements.txt
COPY ${WORK_DIR} /src
WORKDIR /src
7 changes: 7 additions & 0 deletions annotators/summarization_annotator/requirements.txt
@@ -0,0 +1,7 @@
sentry-sdk[flask]==0.14.1
flask==1.1.1
itsdangerous==2.0.1
gunicorn==19.9.0
requests==2.22.0
jinja2<=3.0.3
Werkzeug<=2.0.3
60 changes: 60 additions & 0 deletions annotators/summarization_annotator/server.py
@@ -0,0 +1,60 @@
import logging
import time
from os import getenv

import sentry_sdk
import requests
from flask import Flask, jsonify, request


sentry_sdk.init(getenv("SENTRY_DSN"))
logging.basicConfig(format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", level=logging.INFO)
logger = logging.getLogger(__name__)
app = Flask(__name__)

SUMMARIZATION_SERVICE_URL = getenv("SUMMARIZATION_SERVICE_URL")
logger.info(f"summarization-annotator considered summarizer: {SUMMARIZATION_SERVICE_URL}")


def get_summary(dialog):
    summary = []
Collaborator

Why is the default value a list when a string is usually stored there? An empty string should be the default value instead.

Contributor Author

Fixed in a commit.

    if len(dialog) != 11:
        logger.info(f"summarization-annotator is not ready to summarize dialog as the length of unsummarized dialog is {len(dialog)} != 11")
        return summary

    logger.info(f"summarization-annotator is ready to summarize dialog as the length of unsummarized dialog is 11")
    dialog = dialog[:6]
    for i in range(len(dialog)):
        if i % 2 == 0:
            dialog[i] = 'User: ' + dialog[i]
        else:
            dialog[i] = 'Bot: ' + dialog[i]
    dialog = ['\n'.join(dialog)]
    logger.info(f"summarization-annotator will summarize this: {dialog}")

    try:
        summary = requests.post(SUMMARIZATION_SERVICE_URL, json={"sentences": dialog}, timeout=10).json()[0]['batch'][0]
    except Exception as exc:
        logger.exception(exc)
        sentry_sdk.capture_exception(exc)

    return summary


@app.route("/respond", methods=["POST"])
def respond():
    start_time = time.time()
    dialog = request.json.get('dialog', [])

    logger.info(f"summarization-annotator received dialog: {dialog}")
    result = get_summary(dialog)
    summarization_attribute = [{"bot_attributes": {"summarized_dialog": result}}]
    logger.info(f"summarization-annotator output: {summarization_attribute}")

    total_time = time.time() - start_time
    logger.info(f"summarization-annotator exec time: {round(total_time, 2)} sec")
    return jsonify(summarization_attribute)


if __name__ == "__main__":
    app.run(debug=False, host="0.0.0.0", port=8171)
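The turn-labelling step inside `get_summary` can be read in isolation. The sketch below is a variant that does not modify its input, assuming (as the diff does) that utterances alternate and start with the user. The names `label_turns`, `build_payload` and the `max_turns` parameter are hypothetical and not part of this PR.

```python
def label_turns(utterances, max_turns=6):
    """Prefix alternating turns with 'User: ' / 'Bot: ' and join them with newlines.

    Only the first `max_turns` utterances are used, mirroring `dialog[:6]` in the diff.
    The input list is left unchanged.
    """
    labelled = []
    for i, text in enumerate(utterances[:max_turns]):
        speaker = "User" if i % 2 == 0 else "Bot"
        labelled.append(f"{speaker}: {text}")
    return "\n".join(labelled)


def build_payload(utterances):
    """Build the JSON body posted to the summarizer: a batch holding one text."""
    return {"sentences": [label_turns(utterances)]}


print(build_payload(["Good morning!", "Hi! How is your day?", "Good!"]))
```

Keeping the formatting in a pure function makes it possible to test the prompt layout without starting the summarizer container.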
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
SERVICE_PORT: 8171
SERVICE_NAME: summarization_annotator
FLASK_APP: server
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
name: summarization-annotator
endpoints:
- respond
compose:
  env_file:
  - .env
  build:
    args:
      SERVICE_PORT: 8171
      SERVICE_NAME: summarization_annotator
    context: ./annotators/summarization_annotator/
  command: flask run -h 0.0.0.0 -p 8171
  environment:
  - FLASK_APP=server
  deploy:
    resources:
      limits:
        memory: 256M
      reservations:
        memory: 256M
  volumes:
  - ./annotators/summarization_annotator:/src
  ports:
  - 8171:8171
proxy: null
54 changes: 54 additions & 0 deletions annotators/summarization_annotator/test.py
@@ -0,0 +1,54 @@
import requests
from os import getenv


SUMMARIZATION_SERVICE_URL = getenv("SUMMARIZATION_SERVICE_URL")


def test_skill():
    url = "http://0.0.0.0:8171/respond"

    if SUMMARIZATION_SERVICE_URL == "http://brio-summarizer:8169/respond_batch":
        input_data = {"dialog": ["Good morning!",
                                 "Hi, this is a Dream Socialbot! How is the day going so far for you?",
                                 "Good! Can you tell me something about cooking and baking?",
                                 "Sure! Baking cookies is comforting, and cookies are the sweetest "
                                 "little bit of comfort food. Do you like cooking?",
                                 "It depends on my mood.",
                                 "May I recommend you a meal to try to practice cooking?",
                                 "No. Better tell me what do you have in mind?",
                                 "I've recently found a couple easy and healthy meals. How about cooking quinoa with "
                                 "turkey and broccoli?",
                                 "That sounds like a healthy and tasty meal! Quinoa is a great source of protein, and "
                                 "when paired with lean turkey and broccoli, it's a well-rounded and balanced meal.",
                                 "I am glad for you! I listened to my favorite music all day. "
                                 "Such a great thing you know! Has anything extraordinary happened today?",
                                 "I can tell you more about what made your day great or we can just chat?"
                                 "I'm happy to listen!"]}

        desired_output = ["a Dream Socialbot talks to users about cooking and baking cookies. The bot says cookies "
                          "are comforting, and baking them is a good way to feel good. The robot is called a "
                          "Dream Social bot. It is designed to talk to users in a friendly, conversational manner."]
    else:
        input_data = {"dialog": ["Привет! У тебя есть хобби?",
                                 "Мое хобби — кулинария.",
                                 "Здорово! А ты любишь готовить?",
                                 "Ага, я могу отлично приготовить разные блюда.",
                                 "Ты собираешь кулинарные рецепты?",
                                 "Да, уже есть большая коллекция.",
                                 "А какая национальная кухня тебе нравится?",
                                 "Конечно, русская.",
                                 "Русские блюда очень оригинальные, вкусные и полезные.",
                                 "А что ты любишь готовить больше всего?",
                                 "Я люблю готовить мясные блюда. Так что приглашаю в гости!"]}

        desired_output = ["У тебя есть хобби — кулинария, а у тебя есть большая коллекция кулинарных рецептов. Bot: Я "
                          "собираю кулинарные рецепты, собираю кулинарные рецепты, собираю кулинарные рецепты."]

    result = requests.post(url, json=input_data).json()
    assert result == [{"bot_attributes": {"summarized_dialog": desired_output[0]}}]
    print("SUCCESS!")


if __name__ == "__main__":
    test_skill()
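The test compares the whole response against the shape `[{"bot_attributes": {"summarized_dialog": ...}}]`. A caller that only needs the summary text might unpack that shape as sketched below. `extract_summary` is a hypothetical helper, and falling back to an empty string is an assumption rather than behaviour confirmed by this PR.

```python
def extract_summary(response, default=""):
    """Pull `summarized_dialog` out of the annotator's JSON response.

    `response` is expected to look like
    [{"bot_attributes": {"summarized_dialog": "..."}}]; any other shape yields `default`.
    """
    try:
        return response[0]["bot_attributes"]["summarized_dialog"]
    except (IndexError, KeyError, TypeError):
        return default


print(extract_summary([{"bot_attributes": {"summarized_dialog": "A short summary."}}]))
print(repr(extract_summary([])))
```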
3 changes: 3 additions & 0 deletions annotators/summarization_annotator/test.sh
@@ -0,0 +1,3 @@
#!/bin/bash

python test.py
10 changes: 10 additions & 0 deletions assistant_dists/dream/dev.yml
@@ -454,6 +454,16 @@ services:
      - "~/.deeppavlov/cache:/root/.cache"
    ports:
      - 8102:8102
  brio-summarizer:
    volumes:
      - "./services/brio_summarizer:/src"
    ports:
      - 8169:8169
  summarization-annotator:
    volumes:
      - "./annotators/summarization_annotator:/src"
    ports:
      - 8171:8171
  dff-template-skill:
    volumes:
      - "./skills/dff_template_skill:/src"
36 changes: 36 additions & 0 deletions assistant_dists/dream/docker-compose.override.yml
@@ -1390,6 +1390,42 @@ services:
        reservations:
          memory: 4G

  brio-summarizer:
    env_file: [ .env ]
    build:
      args:
        SERVICE_PORT: 8169
        SERVICE_NAME: brio_summarizer
        PRETRAINED_MODEL_NAME: "Yale-LILY/brio-cnndm-uncased"
      context: ./services/brio_summarizer/
    command: flask run -h 0.0.0.0 -p 8169
    environment:
      - CUDA_VISIBLE_DEVICES=0
      - FLASK_APP=server
    deploy:
      resources:
        limits:
          memory: 4G
        reservations:
          memory: 4G

  summarization-annotator:
    env_file: [ .env ]
    build:
      args:
        SERVICE_PORT: 8171
        SERVICE_NAME: summarization_annotator
      context: ./annotators/summarization_annotator/
    command: flask run -h 0.0.0.0 -p 8171
    environment:
      - FLASK_APP=server
    deploy:
      resources:
        limits:
          memory: 256M
        reservations:
          memory: 256M

  dff-template-skill:
    env_file: [ .env ]
    build:
19 changes: 19 additions & 0 deletions assistant_dists/dream/pipeline_conf.json
@@ -515,6 +515,25 @@
"component": "components/VkkvnRwjgB5GwrH98k5EKA.yml",
"service": "annotators/relative_persona_extractor/service_configs/relative-persona-extractor"
}
},
"summarization_annotator": {
"connector": {
"protocol": "http",
"timeout": 10.0,
"url": "http://summarization-annotator:8171/respond"
},
"dialog_formatter": "state_formatters.dp_formatters:summarization_annotator_formatter",
"response_formatter": "state_formatters.dp_formatters:simple_formatter_service",
"previous_services": [
"annotators.spelling_preprocessing"
],
"state_manager_method": "update_attributes",
"is_enabled": true,
"source": {
"directory": "annotators/summarization_annotator",
"container": "summarization-annotator",
"endpoint": "respond"
Contributor

Look at how this is done in other services and do the same here. Cards are now mandatory for everything.

Contributor

The same applies to the Russian Dream.

Contributor Author

Fixed in a commit.

}
}
},
"response_annotators": {
10 changes: 10 additions & 0 deletions assistant_dists/dream_russian/dev.yml
@@ -137,4 +137,14 @@ services:
      - "~/.deeppavlov:/root/.deeppavlov"
    ports:
      - 8078:8078
  rut5-summarizer:
    volumes:
      - "./services/ruT5_summarizer:/src"
    ports:
      - 8170:8170
  summarization-annotator:
    volumes:
      - "./annotators/summarization_annotator:/src"
    ports:
      - 8171:8171
version: "3.7"
36 changes: 36 additions & 0 deletions assistant_dists/dream_russian/docker-compose.override.yml
@@ -445,4 +445,40 @@ services:
        reservations:
          memory: 3G

  rut5-summarizer:
    env_file: [ .env ]
    build:
      args:
        SERVICE_PORT: 8170
        SERVICE_NAME: ruT5_summarizer
        PRETRAINED_MODEL_NAME: "IlyaGusev/rut5_base_sum_gazeta"
      context: ./services/ruT5_summarizer/
    command: flask run -h 0.0.0.0 -p 8170
    environment:
      - CUDA_VISIBLE_DEVICES=0
      - FLASK_APP=server
    deploy:
      resources:
        limits:
          memory: 4G
        reservations:
          memory: 4G

  summarization-annotator:
    env_file: [ .env ]
    build:
      args:
        SERVICE_PORT: 8171
        SERVICE_NAME: summarization_annotator
      context: ./annotators/summarization_annotator/
    command: flask run -h 0.0.0.0 -p 8171
    environment:
      - FLASK_APP=server
    deploy:
      resources:
        limits:
          memory: 256M
        reservations:
          memory: 256M

version: '3.7'
19 changes: 19 additions & 0 deletions assistant_dists/dream_russian/pipeline_conf.json
@@ -266,6 +266,25 @@
"component": "components/feDgqHKLibnMNM3HSbnmA.yml",
"service": "annotators/wiki_parser/service_configs/wiki-parser-ru"
}
},
"summarization_annotator": {
"connector": {
"protocol": "http",
"timeout": 10.0,
"url": "http://summarization-annotator:8171/respond"
},
"dialog_formatter": "state_formatters.dp_formatters:summarization_annotator_formatter",
"response_formatter": "state_formatters.dp_formatters:simple_formatter_service",
"previous_services": [
"annotators.spelling_preprocessing"
],
"state_manager_method": "update_attributes",
"is_enabled": true,
"source": {
"directory": "annotators/summarization_annotator",
"container": "summarization-annotator",
"endpoint": "respond"
}
}
},
"response_annotators": {
3 changes: 3 additions & 0 deletions components.tsv
@@ -170,3 +170,6 @@
8166
8167 openai-api-chatgpt-16k
8168 transformers-lm-vicuna13b
8169 brio-summarizer
8170 rut5-summarizer
8171 summarization-annotator
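components.tsv acts as a port registry, and this PR claims ports 8169 to 8171. A small check like the following could flag a port that is listed twice. It assumes each row is a port followed by an optional service name separated by whitespace (the scraped diff does not preserve the original delimiter, and any header row is not handled); `find_duplicate_ports` is a hypothetical name.

```python
from collections import Counter


def find_duplicate_ports(rows):
    """Return the sorted list of ports that appear on more than one row."""
    ports = []
    for row in rows:
        fields = row.split()
        if not fields:
            continue  # skip blank lines
        ports.append(int(fields[0]))
    counts = Counter(ports)
    return sorted(port for port, n in counts.items() if n > 1)


# Rows copied from the hunk above; 8166 has no service assigned.
rows = [
    "8166",
    "8167 openai-api-chatgpt-16k",
    "8168 transformers-lm-vicuna13b",
    "8169 brio-summarizer",
    "8170 rut5-summarizer",
    "8171 summarization-annotator",
]
print(find_duplicate_ports(rows))
```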
24 changes: 24 additions & 0 deletions components/riRfdGz86P51B9bL7fO6JR.yml
@@ -0,0 +1,24 @@
name: summarization-annotator
display_name: Summarization Annotator
container_name: summarization-annotator
component_type: null
model_type: NN-based
is_customizable: false
author: DeepPavlov
description: Annotator that accesses summarization services
ram_usage: 256M
gpu_usage: null
connector:
  protocol: http
  timeout: 10.0
  url: http://summarization-annotator:8171/respond
dialog_formatter: state_formatters.dp_formatters:summarization_annotator_formatter
response_formatter: state_formatters.dp_formatters:simple_formatter_service
previous_services:
- annotators.spelling_preprocessing
required_previous_services: null
state_manager_method: add_annotation
tags: null
endpoint: respond
service: annotators/summarization_annotator/service_configs/summarization-annotator
date_created: '2023-07-04T11:39:32'
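The component card repeats several fields of the pipeline_conf.json entry, so the two can drift apart. The sketch below compares them field by field using values copied from this diff. `diff_fields` is a hypothetical helper, and whether a reported mismatch is intended is not something the diff settles.

```python
def diff_fields(card, pipeline_entry, fields):
    """Return {field: (card_value, pipeline_value)} for fields whose values differ."""
    mismatches = {}
    for field in fields:
        if card.get(field) != pipeline_entry.get(field):
            mismatches[field] = (card.get(field), pipeline_entry.get(field))
    return mismatches


# Values copied from the component card and from pipeline_conf.json in this PR.
card = {
    "connector": {"protocol": "http", "timeout": 10.0,
                  "url": "http://summarization-annotator:8171/respond"},
    "state_manager_method": "add_annotation",
}
pipeline_entry = {
    "connector": {"protocol": "http", "timeout": 10.0,
                  "url": "http://summarization-annotator:8171/respond"},
    "state_manager_method": "update_attributes",
}

print(diff_fields(card, pipeline_entry, ["connector", "state_manager_method"]))
```

With the values shown in this diff, the connector blocks agree while `state_manager_method` differs between the card and the pipeline entry.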
9 changes: 9 additions & 0 deletions services/brio_summarizer/Dockerfile
@@ -0,0 +1,9 @@
FROM python:3.7.4

ARG PRETRAINED_MODEL_NAME
ENV PRETRAINED_MODEL_NAME ${PRETRAINED_MODEL_NAME}

COPY ${WORK_DIR}/requirements.txt /src/requirements.txt
RUN pip install -r /src/requirements.txt
COPY ${WORK_DIR} /src
WORKDIR /src
9 changes: 9 additions & 0 deletions services/brio_summarizer/requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
torch==1.13.1
transformers==4.27.0
sentry-sdk[flask]==0.14.1
flask==1.1.1
itsdangerous==2.0.1
gunicorn==19.9.0
requests==2.22.0
jinja2<=3.0.3
Werkzeug<=2.0.3