Summarization models #393
logger.info(dialog)

try:
    summary = requests.post(SUMMARIZATION_SERVICE_URL, json={"sentences": dialog}, timeout=3).json()[0]['batch']
3 is too short. One of your tests fails for me because of it (logs). I would set it to at least 5 (although for summarizing dialogs with generative models, where a single utterance can contain lots of tokens, even that won't be enough), or move this timeout out into an argument altogether, as is done for the generative services. But I'm not sure about moving it out; I need to check with Dilya.
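Moving the timeout into a parameter could look roughly like this. It is a minimal sketch, not the actual fix: the environment variable name `SUMMARIZATION_REQUEST_TIMEOUT` and the 5-second default (the reviewer's suggested minimum) are assumptions. The commit list for this PR mentions "Added timeout param to dockerfile", but the real parameter name is not shown here.

```python
import os

# Hypothetical default; 5 s is the minimum suggested in the review
DEFAULT_SUMMARIZATION_TIMEOUT = 5.0


def get_request_timeout(env_var: str = "SUMMARIZATION_REQUEST_TIMEOUT",
                        default: float = DEFAULT_SUMMARIZATION_TIMEOUT) -> float:
    """Read the request timeout (in seconds) from the environment instead of hard-coding 3."""
    raw = os.getenv(env_var, "").strip()
    if not raw:
        return default  # variable unset or empty: fall back to the default
    timeout = float(raw)
    if timeout <= 0:
        raise ValueError(f"{env_var} must be positive, got {raw!r}")
    return timeout
```

The value would then be passed as `timeout=get_request_timeout()` in the `requests.post(...)` call, so the pipeline and the Docker setup can raise it without touching the code.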
Accordingly, it will need to be increased in the pipeline too.
Fixed in the commit.
- group: annotators
  connector:
    protocol: http
    timeout: 2.0
In other places you have a different timeout for summarization-annotator.
Fixed in the commit.
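One way to catch mismatches like this is to collect each component's connector timeout from every config and flag components whose values disagree. A hedged sketch: it assumes the timeouts have already been extracted (for example with a YAML or JSON parser) into plain dictionaries, and the function name is hypothetical.

```python
from typing import Dict, Set


def inconsistent_timeouts(timeouts_by_file: Dict[str, Dict[str, float]]) -> Dict[str, Set[float]]:
    """Return components whose connector timeout differs between config files.

    timeouts_by_file maps a config file name to {component_name: timeout}.
    """
    seen: Dict[str, Set[float]] = {}
    for per_file in timeouts_by_file.values():
        for component, timeout in per_file.items():
            # Normalize to float so that 1 and 1.0 count as the same value
            seen.setdefault(component, set()).add(float(timeout))
    # Keep only components that were configured with more than one distinct value
    return {name: values for name, values in seen.items() if len(values) > 1}
```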
- group: services
  connector:
    protocol: http
    timeout: 2.0
Too short.
Fixed in the commit.
- group: services
  connector:
    protocol: http
    timeout: 2.0
Also too short.
Fixed in the commit.
is_customizable: false
author: DeepPavlov
description: Annotator that accesses summarization services
ram_usage: 1G
That much? The docker-compose file says 256 MB.
Fixed in the commit.
256 MB, not GB 😦
Didn't notice that :) But this file was deleted anyway when the cards were changed in the commit.
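A quick guard against this kind of mismatch (1G in the card versus 256 MB in docker-compose, or MB mistyped as GB) is to normalize both sizes to bytes before comparing them. A sketch assuming suffixes such as `M` or `G`, optionally followed by `B`, and binary multipliers; `parse_mem` is a hypothetical helper, not part of the repository.

```python
import re

# Binary multipliers are an assumption; adjust if the files use decimal units
_MULTIPLIERS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_mem(value: str) -> int:
    """Convert sizes like '256M', '256MB', '1G' or '1Gb' into bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?)B?\s*", value, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized memory size: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _MULTIPLIERS[unit.upper()])
```

Comparing `parse_mem(card_ram_usage)` with `parse_mem(compose_mem_limit)` then flags a card that claims a different amount than the container is given.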
Also: merge in the latest dev and resolve the conflict. Look at how the cards have changed and follow the example of the services and annotators in dev. Add the cards to the components folder and the ports to the components.tsv file (and check that your ports aren't already taken, by the way).
"directory": "annotators/summarization_annotator",
"container": "summarization-annotator",
"endpoint": "respond" |
Look at how this is done in the other services and do it the same way. Cards are now mandatory for everything.
The same goes for the Russian Dream.
Fixed in the commit.
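For the earlier request to add ports to components.tsv and make sure they are not already taken, a small script can flag conflicts. This sketch assumes a tab-separated file with a header row, the port in the first column and the component name in the last; the real column layout may differ, and both function names are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List


def ports_in_use(tsv_text: str) -> Dict[int, List[str]]:
    """Map each port listed in the TSV to the components that use it."""
    usage: Dict[int, List[str]] = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if not row or not row[0].strip().isdigit():
            continue  # blank line or a row without a numeric port
        usage.setdefault(int(row[0]), []).append(row[-1].strip())
    return usage


def conflicting_ports(tsv_text: str, new_ports: Iterable[int]) -> Dict[int, List[str]]:
    """Return the subset of new_ports already claimed by other components."""
    usage = ports_in_use(tsv_text)
    return {port: usage[port] for port in new_ports if port in usage}
```

An empty result from `conflicting_ports(open("components.tsv").read(), your_ports)` means the new ports are free under the assumed layout.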
Please merge in the latest dev! And in general, every time you make changes, 1) merge in dev and 2) resolve conflicts if there are any (there are a lot of them right now).
A general comment on the architecture: you are not handling batches quite correctly.
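The review does not spell out what exactly was wrong, but a common requirement for batched annotators is to return exactly one result per input item, in the same order. A minimal sketch of that contract: `update_summary` is a hypothetical callable standing in for the summarization request plus the merge with the previous summary, and the output mirrors the `bot_attributes`/`summarized_dialog` structure from the annotator code.

```python
from typing import Callable, Dict, List


def annotate_batch(
    dialogs_batch: List[List[str]],
    summaries_batch: List[str],
    update_summary: Callable[[List[str], str], str],
) -> List[Dict]:
    """Produce exactly one annotation per dialog, in the same order as the input."""
    if len(dialogs_batch) != len(summaries_batch):
        # zip() would silently drop the extra items and misalign the batch
        raise ValueError(
            f"got {len(dialogs_batch)} dialogs but {len(summaries_batch)} previous summaries"
        )
    return [
        {"bot_attributes": {"summarized_dialog": update_summary(dialog, prev_summary)}}
        for dialog, prev_summary in zip(dialogs_batch, summaries_batch)
    ]
```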
result = ""
result += prev_summary
Maybe just result = prev_summary instead of two lines?
Fixed in the commit.
if new_summary:
    result += " " + new_summary
result = result.strip()
I would replace it with:
if new_summary:
    result = f"{result} {new_summary}".strip()
Fixed in the commit.
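Putting both review suggestions together, the merge step could look like this (a sketch; `merge_summaries` is a hypothetical helper name):

```python
def merge_summaries(prev_summary: str, new_summary: str) -> str:
    """Append new_summary to prev_summary, separated by a single space."""
    result = prev_summary  # replaces result = "" followed by result += prev_summary
    if new_summary:
        result = f"{result} {new_summary}".strip()
    return result
```

Unlike the original three lines, this strips only when a new summary is appended; if `prev_summary` may carry stray whitespace, strip it unconditionally instead.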
services/dialog_summarizer/server.py
start_time = time.time()
sentences = request.json.get("sentences", [])
logger.debug(f"Sentences: {sentences}")
tokenized_text = tokenizer(sentences, max_length=512, return_tensors="pt", truncation=True, |
Why can't it be more than 512?
Fixed in the commit.
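The thread does not show how the hard-coded 512 was replaced. One option, sketched here as an assumption, is to take the limit from the tokenizer's `model_max_length` attribute in Hugging Face transformers, which holds a very large sentinel (`int(1e30)`) when the tokenizer declares no limit. `resolve_max_length` and its parameters are hypothetical.

```python
# transformers stores this sentinel in model_max_length when no limit is declared
VERY_LARGE_INTEGER = int(1e30)


def resolve_max_length(tokenizer, requested=None, fallback=512):
    """Choose a truncation length from the tokenizer instead of hard-coding 512.

    requested: an explicit limit (e.g. from config); capped at the model's limit.
    fallback: used only when neither the caller nor the tokenizer gives a limit.
    """
    model_limit = getattr(tokenizer, "model_max_length", None)
    if model_limit is not None and model_limit >= VERY_LARGE_INTEGER:
        model_limit = None  # sentinel value: no real limit declared
    if requested is None:
        return model_limit if model_limit is not None else fallback
    return requested if model_limit is None else min(requested, model_limit)
```

Usage would be along the lines of `tokenizer(sentences, max_length=resolve_max_length(tokenizer), truncation=True, return_tensors="pt")`.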
Almost there, just a couple of questions and that's it.
for dialog, prev_summary in zip(dialogs_batch, summaries_batch):
    logger.info(f"summarization-annotator received dialog: {dialog}")
    logger.info(f"summarization-annotator received previous summary: {[prev_summary]}")
Maybe switch these two logs to debug level; otherwise the logs will grow quickly.
Fixed in the commit.
    if new_summary:
        result = f"{result} {new_summary}".strip()
    summarization_attribute.append({"bot_attributes": {"summarized_dialog": result}})
logger.info(f"summarization-annotator output: {summarization_attribute}")
Or maybe keep only this print.
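Following both suggestions, the per-dialog messages can go to DEBUG while a single INFO line reports the batch output. A sketch; the logger name and the `log_batch` helper are hypothetical. Passing values as `%s` arguments instead of f-strings also means messages that are filtered out by the log level are never formatted.

```python
import logging

logger = logging.getLogger("summarization-annotator")


def log_batch(dialogs_batch, summaries_batch, outputs):
    """Per-dialog inputs at DEBUG, one batch-level line at INFO."""
    for dialog, prev_summary in zip(dialogs_batch, summaries_batch):
        # Dropped without formatting unless DEBUG is enabled for this logger
        logger.debug("summarization-annotator received dialog: %s", dialog)
        logger.debug("summarization-annotator received previous summary: %r", prev_summary)
    logger.info("summarization-annotator output: %s", outputs)
```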
def get_summary(dialog):
    summary = []
Why is the default value a list when a string usually goes there? The default should be an empty string then.
Fixed in the commit.
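A sketch of `get_summary` with an empty-string default, so callers always receive a `str`. The `url`, `timeout`, and `post` parameters are hypothetical additions (`post` lets the function be exercised without a running service), the 10-second default timeout is an assumption, and the response shape `.json()[0]['batch']` is taken from the annotator code above.

```python
import logging
from typing import Any, Callable, List, Optional

logger = logging.getLogger(__name__)


def get_summary(dialog: List[str], url: str, timeout: float = 10.0,
                post: Optional[Callable[..., Any]] = None) -> str:
    """Request a summary; fall back to an empty string (not a list) on failure."""
    summary = ""  # same type as a successful result
    if post is None:
        import requests  # deferred so the function can be tested with a stub
        post = requests.post
    try:
        response = post(url, json={"sentences": dialog}, timeout=timeout)
        summary = response.json()[0]["batch"]
    except Exception:
        logger.exception("summarization service request failed")
    return summary
```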
* Added abstractive summarization model for English texts
* Added abstractive summarization model for Russian texts
* Added summarization annotator
* Moved rut5 summarizer to dream_russian
* Changed endpoint
* Added model path to Dockerfile
* Updated test
* Updated summarization annotator input
* Updated test
* Changed summarization service url
* Changed test
* Increased timeout
* Updated ram_usage
* Updated ports
* Updated models cards
* Added more info messages
* Fixed path error
* Added summarization output to bot attributes
* Added timeout param to dockerfile
* Updated model cards and ports
* Fixed problem with incorrect batch processing
* Updated summarization save format
* Updated dialog summarization model
* Updated tests
* Minor formatting changes
* Fixed black and flake8 codestyle
* Fixed black codestyle
* Updated models ports
* Small fixes
No description provided.