Summarization models #393

Merged: 33 commits, Aug 4, 2023

Conversation

Kolpnick
Contributor

No description provided.

logger.info(dialog)

try:
    summary = requests.post(SUMMARIZATION_SERVICE_URL, json={"sentences": dialog}, timeout=3).json()[0]['batch']
Contributor

3 is too low. One of your tests fails for me because of it (logs). I would set it to at least 5 (although for summarizing dialogs with generative models, where a single utterance can contain a lot of tokens, even that will not be enough), or move this timeout out into an argument altogether, as is done for the generative services. I'm not sure about moving it out, though; that needs to be checked with Dilya.

Contributor

The timeout in the pipeline will need to be increased accordingly as well.

Contributor Author

Fixed in a commit.
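
The review above suggests exposing the request timeout as a parameter instead of hard-coding it. A minimal sketch of one way to do that, assuming a hypothetical environment variable named `SUMMARIZATION_TIMEOUT` and a fallback of 5 seconds (neither the name nor the helper `read_timeout` comes from the PR):

```python
import os


def read_timeout(env=os.environ, name="SUMMARIZATION_TIMEOUT", default=5.0):
    """Return the request timeout in seconds, falling back to `default`.

    `name` is a hypothetical variable name used only for illustration.
    """
    raw = env.get(name)
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        # Unparseable values fall back to the default instead of crashing.
        return default
    # A non-positive timeout makes no sense for an HTTP call.
    return value if value > 0 else default
```

The returned value could then be passed as the `timeout=` argument of `requests.post`, so the same setting is shared by the annotator and its tests.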

annotators/summarization_annotator/test.py (resolved)
annotators/summarization_annotator/server.py (outdated, resolved)
annotators/summarization_annotator/server.py (resolved)
- group: annotators
  connector:
    protocol: http
    timeout: 2.0
Contributor

Elsewhere you use a different timeout for summarization-annotator.

Contributor Author

Fixed in a commit.

- group: services
  connector:
    protocol: http
    timeout: 2.0
Contributor

Too low.

Contributor Author

Fixed in a commit.

- group: services
  connector:
    protocol: http
    timeout: 2.0
Contributor

Also too low.

Contributor Author

Fixed in a commit.

is_customizable: false
author: DeepPavlov
description: Annotator that accesses summarization services
ram_usage: 1G
Contributor

That much? The docker-compose file says 256 MB.

Contributor Author

Kolpnick commented Jul 4, 2023

Fixed in a commit.

Contributor

256 MB, not GB 😦

Contributor Author

I didn't notice :) But this file was deleted anyway when the cards were changed in a commit.

Contributor

smilni commented Jun 30, 2023

Also: merge the latest dev and resolve the conflict. Look at how the cards have changed and follow the pattern of the services and annotators in dev. Add the cards to the components folder and the ports to the components.tsv file (and check that your ports are not already taken, by the way).

Comment on lines 533 to 535
"directory": "annotators/summarization_annotator",
"container": "summarization-annotator",
"endpoint": "respond"
Contributor

Look at how this is done in other services and do the same. Cards are now mandatory for everything.

Contributor

The same applies to Russian Dream.

Contributor Author

Fixed in a commit.

Contributor

smilni commented Jul 13, 2023

Please merge the latest dev! And in general, every time you make changes: 1) merge dev, 2) resolve conflicts if there are any (right now there are many).

Contributor

smilni commented Jul 13, 2023

General comment on the architecture: you are not handling batches quite correctly.
First, in the formatters, following the pattern of the others (utt_sentseg_punct_dialog, full_dialog, etc.), return [{"dialogs": [dialog]}].
Then, in the annotator, extract and process the whole batch (request.json.get('dialogs', [])) rather than a single dialog.
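
The batch convention described in this comment can be sketched as follows. This is only an illustration under stated assumptions: the names `summarization_formatter`, `respond`, and `summarize`, as well as the `"summary"` output key, are hypothetical and not taken from the PR; `payload` stands in for `request.json`.

```python
def summarization_formatter(dialog):
    # Hypothetical formatter: wrap the single dialog in a batch of size one,
    # mirroring the [{"dialogs": [dialog]}] shape requested in the review.
    return [{"dialogs": [dialog]}]


def respond(payload, summarize):
    # Hypothetical handler body: `payload` plays the role of request.json,
    # `summarize` is whatever callable produces a summary for one dialog.
    dialogs_batch = payload.get("dialogs", [])
    # One output item per dialog, in the same order as the input batch.
    return [{"summary": summarize(dialog)} for dialog in dialogs_batch]
```

Because the handler always iterates over a list, a batch of one dialog and a batch of many are handled by the same code path, and an empty or missing `dialogs` key yields an empty result.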

Comment on lines 55 to 56
result = ""
result += prev_summary
Contributor

Maybe just result = prev_summary instead of the two lines?

Contributor Author

Fixed in a commit.

Comment on lines 58 to 60
if new_summary:
    result += " " + new_summary
result = result.strip()
Contributor

I would replace it with

if new_summary:
  result = f"{result} {new_summary}".strip()

Contributor Author

Fixed in a commit.
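
Taken together, the two suggestions in this thread amount to something like the sketch below. The function name `merge_summaries` is hypothetical and used only to make the snippet self-contained; the body follows the reviewer's proposed lines.

```python
def merge_summaries(prev_summary, new_summary):
    # Start from the previous summary directly (no empty-string accumulator).
    result = prev_summary
    if new_summary:
        # Join with a single space; strip() removes the leading space
        # that appears when prev_summary is empty.
        result = f"{result} {new_summary}".strip()
    return result
```

Note that when `new_summary` is empty, `prev_summary` is returned unchanged, including any surrounding whitespace it may carry.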

start_time = time.time()
sentences = request.json.get("sentences", [])
logger.debug(f"Sentences: {sentences}")
tokenized_text = tokenizer(sentences, max_length=512, return_tensors="pt", truncation=True,
Contributor

Why can't it be more than 512?

Contributor Author

Fixed in a commit.

Collaborator

dilyararimovna left a comment

Almost done: a couple of questions and that's it.


for dialog, prev_summary in zip(dialogs_batch, summaries_batch):
    logger.info(f"summarization-annotator received dialog: {dialog}")
    logger.info(f"summarization-annotator received previous summary: {[prev_summary]}")
Collaborator

Maybe switch these two log calls to debug level; otherwise the logs will grow quickly.

Contributor Author

Fixed in a commit.

if new_summary:
    result = f"{result} {new_summary}".strip()
summarization_attribute.append({"bot_attributes": {"summarized_dialog": result}})
logger.info(f"summarization-annotator output: {summarization_attribute}")
Collaborator

Or maybe just keep only this log line.
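
The logging advice in this thread (per-dialog messages at debug level, a single output message at info level) could look roughly like the sketch below. The helper `log_batch` and the logger name are assumptions made for illustration, not code from the PR.

```python
import logging

logger = logging.getLogger("summarization-annotator")


def log_batch(dialogs_batch, summaries_batch, output):
    # Per-item details are emitted only when DEBUG is enabled,
    # so production logs do not grow with every dialog.
    for dialog, prev_summary in zip(dialogs_batch, summaries_batch):
        logger.debug("received dialog: %s", dialog)
        logger.debug("received previous summary: %s", prev_summary)
    # A single INFO record per request summarizes the result.
    logger.info("output: %s", output)
```

With the logger level set to INFO, each call produces exactly one record; switching the level to DEBUG restores the detailed per-dialog messages when troubleshooting.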



def get_summary(dialog):
    summary = []
Collaborator

Why is the default value a list when a string is usually stored there? In that case the default should be an empty string.

Contributor Author

Fixed in a commit.
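
A minimal sketch of the point raised here, with an empty string as the default so the return type stays consistent when the service call fails. The extra parameters (`post`, `url`, `timeout`) are assumptions added so the snippet can run without a network: `post` is any callable with the same keyword arguments as `requests.post`. The response shape `[0]["batch"]` is copied from the earlier excerpt in this review; that it holds a string is assumed from the reviewer's remark.

```python
import logging

logger = logging.getLogger(__name__)


def get_summary(dialog, post, url, timeout=5.0):
    # Default is an empty string, matching the type of a real summary.
    summary = ""
    try:
        response = post(url, json={"sentences": dialog}, timeout=timeout)
        summary = response.json()[0]["batch"]
    except Exception as exc:
        # Broad on purpose in this sketch: any failure (timeout, bad JSON,
        # unexpected shape) leaves the empty-string default in place.
        logger.warning("summarization request failed: %s", exc)
    return summary
```

In real use the `post` argument would typically be `requests.post`, and callers can rely on always receiving a string.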

@dilyararimovna dilyararimovna merged commit fe6adc7 into deeppavlov:dev Aug 4, 2023
2 checks passed
smilni pushed a commit that referenced this pull request Aug 8, 2023
* Added abstractive summarization model for English texts

* Added abstractive summarization model for Russian texts

* Added summarization annotator

* Moved rut5 summarizer to dream_russian

* Changed endpoint

* Added model path to Dockerfile

* Updated test

* Updated summarization annotator input

* Updated test

* Changed summarization service url

* Changed test

* Increased timeout

* Updated ram_usage

* Updated ports

* Updated models cards

* Added more info messages

* Fixed path error

* Added summarization output to bot attributes

* Added timeout param to dockerfile

* Updated model cards and ports

* Fixed problem with incorrect batch processing

* Updated summarization save format

* Updated dialog summarization model

* Updated tests

* Minor formatting changes

* Fixed black and flake8 codestyle

* Fixed black codestyle

* Updated models ports

* Small fixes
dilyararimovna added a commit that referenced this pull request Aug 9, 2023
* robot first commit (no cards)

* feat: do not use sentence ranker url from env (#535)

* Feat/ruxglm prompted dist (#528)

* feat: ignore env secret ru

* feat: add access token

* feat: distribution ruxglm

* fix: ruxglm cards

* fix: use use_auth_token

* fix: eos tokens type

* fix: stats cpu ram

* fix: skills cards

* fix: components cards

* fix: eos tokens

* fix: eos tokens

* fix: eos tokens

* fix: eos tokens

* fix: cards and table

* fix: ADDITIONAL_EOS_TOKENS

* fix: ADDITIONAL_EOS_TOKENS

* fix: codestyle

* fix: universal

* fix: dream persona ru skill name

* fix: prompt selector ru

* fix: replacement

* fix: prompt selecrto

* fix: use params

* fix: timeout and history

* fix: trye very short persona

* fix: increase timeout

* fix: sub replacement tokens correctly

* fix: sub replacement tokens correctly

* fix: use stopping criteria

* fix: typing

* fix: revert long persona

* fix: duplicate spaces

* fix: correct components for russian distribution

* fix: proxy for russian distribution

* feat: universal distr for ru

* feat: universal distr for ru

* fix: remove extra

* fix: working configs

* fix: configs

* feat: ruxglm prompted dists

* fix: component cards

* fix: container name

* fix: remove extra space after new line

* fix: remove extra space after new line

* feat: tests for dream ruxglm

* fix: proxy and ru lang

* fix: change port of universal ru

* fix: rights on file

* fix: tests skills

* fix: test for resp selector

* fix: tests for proxied components

* fix: remove do sample true

* fix: generative params

* feat: used sentence ranker url

* feat: utilized default llm

* Feat/ru prompted dists (#532)

* feat: ignore env secret ru

* feat: add access token

* feat: distribution ruxglm

* fix: ruxglm cards

* fix: use use_auth_token

* fix: eos tokens type

* fix: stats cpu ram

* fix: skills cards

* fix: components cards

* fix: eos tokens

* fix: eos tokens

* fix: eos tokens

* fix: eos tokens

* fix: cards and table

* fix: ADDITIONAL_EOS_TOKENS

* fix: ADDITIONAL_EOS_TOKENS

* fix: codestyle

* fix: universal

* fix: dream persona ru skill name

* fix: prompt selector ru

* fix: replacement

* fix: prompt selecrto

* fix: use params

* fix: timeout and history

* fix: trye very short persona

* fix: increase timeout

* fix: sub replacement tokens correctly

* fix: sub replacement tokens correctly

* fix: use stopping criteria

* fix: typing

* fix: revert long persona

* fix: duplicate spaces

* fix: correct components for russian distribution

* fix: proxy for russian distribution

* feat: universal distr for ru

* feat: universal distr for ru

* fix: remove extra

* fix: working configs

* fix: configs

* feat: ruxglm prompted dists

* fix: component cards

* fix: container name

* first dist (no cards)

* fix: remove extra space after new line

* fix: remove extra space after new line

* feat: tests for dream ruxglm

* fix: proxy and ru lang

* fix: change port of universal ru

* fix: rights on file

* fix: tests skills

* fix: test for resp selector

* multiskill_ru_assistant

* fix: tests for proxied components

* fairytale and action stories dists

* journalist helper dist

* fairytale fixes

* one more fix

* action stories cards

* add quotation marks

* fairytale cards

* storyteller cards

* journalist helper cards

* multiskill ru cards

* agent services cards

* minor fixes

* fix: utilize sentence ranker url

---------

Co-authored-by: dilyararimovna <[email protected]>

* update components.tsv (#537)

* update components.tsv

* tabulation

* Feat/rugpt 3.5 distribution (#534)

* feat: ignore env secret ru

* feat: add access token

* feat: distribution ruxglm

* fix: ruxglm cards

* fix: use use_auth_token

* fix: eos tokens type

* fix: stats cpu ram

* fix: skills cards

* fix: components cards

* fix: eos tokens

* fix: eos tokens

* fix: eos tokens

* fix: eos tokens

* fix: cards and table

* fix: ADDITIONAL_EOS_TOKENS

* fix: ADDITIONAL_EOS_TOKENS

* fix: codestyle

* fix: universal

* fix: dream persona ru skill name

* fix: prompt selector ru

* fix: replacement

* fix: prompt selecrto

* fix: use params

* fix: timeout and history

* fix: trye very short persona

* fix: increase timeout

* fix: sub replacement tokens correctly

* fix: sub replacement tokens correctly

* fix: use stopping criteria

* fix: typing

* fix: revert long persona

* fix: duplicate spaces

* fix: correct components for russian distribution

* fix: proxy for russian distribution

* feat: universal distr for ru

* feat: universal distr for ru

* fix: remove extra

* fix: working configs

* fix: configs

* feat: ruxglm prompted dists

* fix: component cards

* fix: container name

* fix: remove extra space after new line

* fix: remove extra space after new line

* feat: tests for dream ruxglm

* fix: proxy and ru lang

* fix: change port of universal ru

* fix: rights on file

* fix: tests skills

* fix: test for resp selector

* fix: tests for proxied components

* feat: rugpt-3.5 by sber in universal russian distribution

* fix; wait for it

* fix: models card

* fix: models card

* fix: add to list

* fix: change port

* fix: change port

* fix: change size to correct

* feat: instruction how to add a new model

* fix ru prompt selector, remove unused component (#538)

* feat: replace oasst12b with gptjt (#541)

* Feat/utilize rugpt35 (#540)

* feat: utilize rugpt35

* feat: tests for jounrlist rugpt35

* feat: tests for jounrlist rugpt35

* fix: rights for tfile

* feat: names

* fix: ru_dists_names_and_prompts (#543)

* rename ruxglm to u

* more renaming

* tabs

* tabs

* some more renaming

* short prompt

* many cards and name changes

* fix typo

* fixes for Dilya

* tiny fix

* tiny fix

* huge name check

* names

* typo prompt

* fix: no tests for non existing skills

---------

Co-authored-by: dilyararimovna <[email protected]>

* fix: cards for ru dists (#544)

* fix: rugpt35 config and envs (#546)

* Summarization models (#393)


* Models table upd (#539)

* Fix requirements.txt (#84)

* fix itsdangerous requirements

* pin itsdangerous requirements for all flask==1.1.1 servers

* updated MODELS.md table: added info about models' licensing and commercial use + merged link+name cols to improve overall readability and decrease redundancy

* Update MODELS.md

fixed "is" for better consistency

* fix: format table and add new models back

* fix: sizes of models on gpu

* updated table

---------

Co-authored-by: Andrii.Hura <[email protected]>
Co-authored-by: mtalimanchuk <[email protected]>
Co-authored-by: Dilyara Baymurzina <[email protected]>

* fix: anthropic model params (#547)

* fix summarization annotator card (#549)

* add cards for prompted robot

* ports and n_utt

* port

* increase WAIT_HOSTS_TIMEOUT in cards

---------

Co-authored-by: Dilyara Zharikova (Baymurzina) <[email protected]>
Co-authored-by: Maxim Talimanchuk <[email protected]>
Co-authored-by: Nikolay <[email protected]>
Co-authored-by: Anastásis <[email protected]>
Co-authored-by: Andrii.Hura <[email protected]>
3 participants