New multitask 9in1 #207

Closed
wants to merge 171 commits into from
171 commits
bc5fd39
Merge pull request #1 from deeppavlov/dev
dimakarp1996 Oct 10, 2022
48e6581
Update utils.py
dimakarp1996 Oct 10, 2022
8972e01
Update utils.py
dimakarp1996 Oct 10, 2022
8c39b84
Update requirements.txt
dimakarp1996 Oct 10, 2022
572df57
Update Dockerfile
dimakarp1996 Oct 10, 2022
4dfacef
Update README.md
dimakarp1996 Oct 10, 2022
a13e7c4
Update test.py
dimakarp1996 Oct 10, 2022
4c945fe
Update combined_classifier.json
dimakarp1996 Oct 10, 2022
c1ed326
Update server.py
dimakarp1996 Oct 10, 2022
6cf60b8
Update utils.py
dimakarp1996 Oct 10, 2022
380eabf
Update universal_templates.py
dimakarp1996 Oct 10, 2022
05514a5
Update dev_requirements.txt
dimakarp1996 Oct 10, 2022
c85894b
Update test_data.json
dimakarp1996 Oct 10, 2022
7dadef8
Update requirements.txt
dimakarp1996 Oct 10, 2022
cc7d873
Update requirements.txt
dimakarp1996 Oct 10, 2022
25733a6
Update requirements.txt
dimakarp1996 Oct 10, 2022
3241e89
Update requirements.txt
dimakarp1996 Oct 10, 2022
e07ece7
Update templates.py
dimakarp1996 Oct 10, 2022
20eb2f5
Update requirements.txt
dimakarp1996 Oct 10, 2022
a1842cd
Update requirements.txt
dimakarp1996 Oct 10, 2022
40caace
Update test_dialog.json
dimakarp1996 Oct 10, 2022
1abf00c
Update data.json
dimakarp1996 Oct 10, 2022
586b56c
Update scenario.py
dimakarp1996 Oct 10, 2022
d89d10d
Update test.py
dimakarp1996 Oct 10, 2022
cb7fae9
Update tests.json
dimakarp1996 Oct 10, 2022
0928cad
Update requirements.txt
dimakarp1996 Oct 10, 2022
d70a84f
Update skill.py
dimakarp1996 Oct 10, 2022
abc710b
Update requirements.txt
dimakarp1996 Oct 10, 2022
f4abd04
Update test_no_annotations.json
dimakarp1996 Oct 10, 2022
14259e8
Update combined_classifier.json
dimakarp1996 Oct 10, 2022
c9bf511
Codestyle using BLACK
dimakarp1996 Oct 10, 2022
68f6167
Update utils.py
dimakarp1996 Oct 10, 2022
21bbed9
Update test.py
dimakarp1996 Oct 10, 2022
a0ce333
Update test.py
dimakarp1996 Oct 10, 2022
fff4d77
Update server.py
dimakarp1996 Oct 10, 2022
cc4b381
Update test.py
dimakarp1996 Oct 10, 2022
f06c637
Update test.py
dimakarp1996 Oct 10, 2022
f68b7fb
Update test.py
dimakarp1996 Oct 10, 2022
93e8048
Update test.py
dimakarp1996 Oct 10, 2022
ea2f422
Update test.py
dimakarp1996 Oct 10, 2022
769623a
Update test.py
dimakarp1996 Oct 10, 2022
80ca056
Update test.py
dimakarp1996 Oct 10, 2022
48de78d
Update test.py
dimakarp1996 Oct 10, 2022
32fbf21
Update test.py
dimakarp1996 Oct 10, 2022
6d56299
Update server.py
dimakarp1996 Oct 10, 2022
60bca11
Update test.py
dimakarp1996 Oct 10, 2022
b804824
Update test.py
dimakarp1996 Oct 10, 2022
ac36219
Update test.py
dimakarp1996 Oct 10, 2022
0081a26
codestyle
Oct 10, 2022
e020aa2
Update utils.py
dimakarp1996 Oct 10, 2022
8e62af7
Renamed topic_classification, deleted unnesessary string
dimakarp1996 Oct 11, 2022
dd68e6b
Speeded up the combined classifier
Oct 11, 2022
bfb9f91
Update Dockerfile
dimakarp1996 Oct 11, 2022
f9ea06b
New version of DeepPavlov
Oct 11, 2022
9324ff5
Clean new combined - with fixed bug in checkout
Oct 11, 2022
cf1487a
Update README.md
dimakarp1996 Oct 11, 2022
01071a8
Further speeded up multitask BERT model
Oct 11, 2022
1751c02
Update Dockerfile
dimakarp1996 Oct 11, 2022
97aca70
I have done my best to speed up the multitask inference.
dimakarp1996 Oct 11, 2022
ef138ce
Update utils.py
dimakarp1996 Oct 26, 2022
68a716b
DeepPavlov version after several fixes. Also, new distil model ( not …
dimakarp1996 Oct 30, 2022
e5a50d0
hh
Oct 30, 2022
058e50d
Tests fixed
Oct 31, 2022
7c1e6c7
Merge branch 'new_multitask_9in1' into new_multitask_9in1_tmp
dimakarp1996 Oct 31, 2022
7f44802
Merge pull request #2 from dimakarp1996/new_multitask_9in1_tmp
dimakarp1996 Oct 31, 2022
8fc2598
Update test.py
dimakarp1996 Oct 31, 2022
7c1517a
codestyle
Oct 31, 2022
2f318bf
Returned cuda cache
dimakarp1996 Oct 31, 2022
9b8268e
integrate new commit
Oct 31, 2022
30987fc
Update Dockerfile
dimakarp1996 Oct 31, 2022
77d70dd
integrate new commit
Oct 31, 2022
486a432
Test change for memory profiling
dimakarp1996 Oct 31, 2022
7568eca
It should work much faster now
Oct 31, 2022
26adf16
It should work much faster now
Oct 31, 2022
f22a81e
It should work much faster now
Oct 31, 2022
a5549f8
It should work much faster now
Oct 31, 2022
d510ff2
Test editings to tackle test_dialog fail
Nov 1, 2022
2240724
Update server.py
dimakarp1996 Nov 2, 2022
621e9a2
Update combined_classifier.json
dimakarp1996 Nov 2, 2022
ad4b8a2
Merge pull request #3 from dimakarp1996/new_multitask_9in1_2
dimakarp1996 Nov 2, 2022
5850bf1
codestyle
Nov 2, 2022
02103ab
Update Dockerfile
dimakarp1996 Nov 2, 2022
5ab3110
Update combined_classifier.json
dimakarp1996 Nov 2, 2022
13f4840
Current test-passing version
Nov 3, 2022
b4d549c
Changed factoid criteria & postprocess for cobot topics and intents
Nov 3, 2022
2dbd218
Changed factoid criteria & postprocess for cobot topics and intents
Nov 3, 2022
579fdeb
Minor test fix - updated "random skills" list
Nov 3, 2022
0d1d0c6
codestyle
Nov 3, 2022
efbefe5
codestyle
Nov 3, 2022
7be7287
Update factoid.py
dimakarp1996 Nov 3, 2022
bebc9f5
Update connector.py
dimakarp1996 Nov 3, 2022
ab73f33
Utilize unified prob threshold in factoid skill.
dimakarp1996 Nov 3, 2022
4e4e9e5
Dilya's suggestion
dimakarp1996 Nov 8, 2022
f1e6f96
Dilya's suggestion
dimakarp1996 Nov 8, 2022
b3c9a35
Dilya's suggestions
dimakarp1996 Nov 8, 2022
80c074a
Dilya's suggestion
dimakarp1996 Nov 8, 2022
d67d998
Dilya's suggestion
dimakarp1996 Nov 8, 2022
917e34e
Dilya's comment
dimakarp1996 Nov 8, 2022
a7a1c04
Update Dockerfile
dimakarp1996 Nov 8, 2022
a6cb4f9
Update Dockerfile
dimakarp1996 Nov 8, 2022
3d26df4
Update combined_classifier.json
dimakarp1996 Nov 8, 2022
70238eb
Update README.md
dimakarp1996 Nov 8, 2022
fc18528
current changes
Nov 8, 2022
294ae09
Merge pull request #4 from dimakarp1996/new_multitask_9in1_tmp2
dimakarp1996 Nov 8, 2022
144527d
Codestyle
dimakarp1996 Nov 8, 2022
6892115
Added dependency to fix bug https://github.com/tiangolo/typer/issues/377
Nov 9, 2022
48cd44c
Merge pull request #5 from deeppavlov/dev
dimakarp1996 Nov 14, 2022
fca08d9
merge dev
dimakarp1996 Nov 14, 2022
c03d064
merge dev
dimakarp1996 Nov 14, 2022
3ac77c4
Update requirements.txt
dimakarp1996 Nov 15, 2022
f362cc7
Update requirements.txt
dimakarp1996 Nov 15, 2022
2ac7858
Update requirements.txt
dimakarp1996 Nov 15, 2022
14db758
Update requirements.txt
dimakarp1996 Nov 15, 2022
02adf0b
Update requirements.txt
dimakarp1996 Nov 15, 2022
41484a8
Update requirements.txt
dimakarp1996 Nov 15, 2022
65fbc98
Update requirements.txt
dimakarp1996 Nov 15, 2022
d7b713e
Update requirements.txt
dimakarp1996 Nov 15, 2022
cd73ef6
Update requirements.txt
dimakarp1996 Nov 15, 2022
8182a8e
Update requirements.txt
dimakarp1996 Nov 15, 2022
1478f0a
Update requirements.txt
dimakarp1996 Nov 15, 2022
c931578
Update requirements.txt
dimakarp1996 Nov 15, 2022
933dce2
Update requirements.txt
dimakarp1996 Nov 15, 2022
8b2e94a
Update requirements.txt
dimakarp1996 Nov 15, 2022
bc74828
Update requirements.txt
dimakarp1996 Nov 15, 2022
eb7981f
Update requirements.txt
dimakarp1996 Nov 15, 2022
4513d3f
Update requirements.txt
dimakarp1996 Nov 15, 2022
d3d1592
Update requirements.txt
dimakarp1996 Nov 15, 2022
b26466c
Update requirements.txt
dimakarp1996 Nov 15, 2022
8b33e45
Update requirements.txt
dimakarp1996 Nov 15, 2022
4367863
Fix broken dependencies
dimakarp1996 Nov 15, 2022
449be70
Update Dockerfile
dimakarp1996 Nov 21, 2022
d50472c
Update dev.yml
dimakarp1996 Nov 21, 2022
8bbdab7
Update combined_classifier.json
dimakarp1996 Nov 21, 2022
484defd
Update requirements.txt
dimakarp1996 Nov 21, 2022
d4ade70
Update requirements.txt
dimakarp1996 Nov 21, 2022
a4d9fcb
Update requirements.txt
dimakarp1996 Nov 21, 2022
4be6db4
Update requirements.txt
dimakarp1996 Nov 21, 2022
80d0ee0
Update requirements.txt
dimakarp1996 Nov 21, 2022
824d11d
Update requirements.txt
dimakarp1996 Nov 21, 2022
7a808df
Update requirements.txt
dimakarp1996 Nov 21, 2022
83c7e86
Update requirements.txt
dimakarp1996 Nov 21, 2022
6f880be
Update requirements.txt
dimakarp1996 Nov 21, 2022
a4e00e6
Update requirements.txt
dimakarp1996 Nov 21, 2022
36132bd
Update requirements.txt
dimakarp1996 Nov 21, 2022
966ccc1
Update requirements.txt
dimakarp1996 Nov 21, 2022
553051c
Update requirements.txt
dimakarp1996 Nov 21, 2022
67ac8a3
Update requirements.txt
dimakarp1996 Nov 21, 2022
4f5407d
Update requirements.txt
dimakarp1996 Nov 21, 2022
e0bff37
Added factoid threshold
dimakarp1996 Nov 21, 2022
d553ceb
Update connector.py
dimakarp1996 Nov 21, 2022
4a1b4af
Update server.py
dimakarp1996 Nov 21, 2022
7273818
Addressed Dilya's comments. Not tested yet
dimakarp1996 Nov 21, 2022
0cadb85
Codestyle
dimakarp1996 Nov 21, 2022
24c4db8
Codestyle
dimakarp1996 Nov 21, 2022
9b4bd4a
Suggested changes
Nov 21, 2022
6ffcd46
current version
Nov 22, 2022
9e454a0
Fixed sentence len
Nov 22, 2022
d34a342
Added setuptools dependency while numpy 1.18.0 not to fail on build
Nov 22, 2022
9c90a19
Merge pull request #9 from deeppavlov/dev
dimakarp1996 Nov 22, 2022
765ab40
Added setuptools dependency while numpy 1.18.0 not to fail on build
Nov 22, 2022
ab40e4e
Merge branch 'new_multitask_9in1' of https://github.com/dimakarp1996/…
Nov 22, 2022
ade5e0a
Still facing bug https://github.com/numpy/numpy/issues/22623 - restri…
Nov 22, 2022
3dcd471
h
Nov 22, 2022
b48785e
Try to fix bug in test_dialog in utils/analyze_downloads.py while imp…
dimakarp1996 Nov 22, 2022
6ae06bd
Merge branch 'dev' into new_multitask_9in1
dimakarp1996 Nov 29, 2022
bf0772e
Update utils.py
dimakarp1996 Nov 30, 2022
f60965c
Update utils.py
dimakarp1996 Nov 30, 2022
2803226
Threshold fixes as siggested by Dilya
Nov 30, 2022
0bca42b
Different thresholds for dp topics as suggested by Dilya
dimakarp1996 Nov 30, 2022
caf17fd
Cosmetic change
Nov 30, 2022
8acf2a3
Merge branch 'dev' into new_multitask_9in1
dimakarp1996 Nov 30, 2022
2 changes: 1 addition & 1 deletion annotators/BadlistedWordsDetector/requirements.txt
@@ -6,4 +6,4 @@ sentry-sdk==0.12.3
spacy==3.0.5
click==7.1.2
jinja2<=3.0.3
Werkzeug<=2.0.3
Werkzeug<=2.0.3
2 changes: 1 addition & 1 deletion annotators/BadlistedWordsDetector_ru/requirements.txt
@@ -7,4 +7,4 @@ spacy==3.0.5
click==7.1.2
pymorphy2==0.9.1
jinja2<=3.0.3
Werkzeug<=2.0.3
Werkzeug<=2.0.3
14 changes: 3 additions & 11 deletions annotators/combined_classification/Dockerfile
@@ -1,14 +1,8 @@
FROM deeppavlov/base-gpu:0.12.1
RUN pip install git+https://github.com/deeppavlov/[email protected]
FROM deeppavlov/base-gpu:0.17.5

#RUN rm DeepPavlov
RUN pip install git+https://github.com/deeppavlov/DeepPavlov.git@a53c42062e4bccf6ec63021ec6bd7b9fbe23f091

#Set up git lfs for your user account: git lfs install
WORKDIR /base
RUN rm -rf DeepPavlov
RUN git clone https://github.com/dimakarp1996/DeepPavlov.git
WORKDIR /base/DeepPavlov
RUN git checkout pal-bert+ner

ARG CONFIG

@@ -21,9 +15,7 @@ RUN mkdir common

COPY annotators/combined_classification/ ./
COPY common/ common/
RUN ls /tmp

RUN pip install -r requirements.txt
ARG DATA_URL=http://files.deeppavlov.ai/alexaprize_data/pal_bert_7in1/model.pth.tar
ADD $DATA_URL /tmp

CMD gunicorn --workers=1 --bind 0.0.0.0:8087 --timeout=300 server:app
23 changes: 22 additions & 1 deletion annotators/combined_classification/README.md
@@ -1 +1,22 @@
BERT Base model for 6 tasks - cobot topics cobot dialogact topics cobot dialogact intent emotion sentiment toxic
This model is based on the transformer-agnostic multitask neural architecture. It can solve several tasks simultaneously, almost as well as single-task models.

The models were trained on the following datasets:

**Factoid classification**: For the Factoid task, we used the same Yahoo ConversVsInfo dataset that was used to train the Dream socialbot in Alexa Prize. Note that the validation set in this task was the same as the test set.

**Midas classification**: For the Midas task, we used the same Midas classification dataset that was used to train the Dream socialbot in Alexa Prize. Note that the validation set in this task was the same as the test set.

**Emotion classification**: For the Emotion classification task, we used the emo\_go\_emotions dataset, with all 28 classes compressed into the seven basic emotions, as in the original paper. Note that these 7 emotions are not exactly the same as the 7 emotions in the original Dream socialbot in Alexa Prize: one emotion differs (love vs. disgust), so the scores are not comparable with those of the original model. Note that this task is multiclass.

**Topic classification**: For the Topic classification task, we used the dataset made by Dilyara Zharikova. The dataset was further filtered and improved for the final model version, to make the model suitable for DREAM. Note that the original topics model does not account for these dataset changes (which also affected the number of classes), so its scores are not comparable with ours.

**Sentiment classification**: For the Sentiment classification task, we used the Dynabench dataset (r1 + r2).

**Toxic classification**: For the Toxic classification task, we used the Kaggle dataset <https://www.kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification/data> with the 7 toxic classes that are of interest to us. Note that this task is multilabel.
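
The difference between the multiclass tasks (such as Emotion) and the multilabel task (Toxic) shows up when classifier outputs are decoded. The following is a minimal sketch of the two decoding modes, not the code used by this annotator: the label names, the logits, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Map one raw score to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multiclass(logits, labels):
    """Multiclass: exactly one label wins (the most probable one)."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best]


def decode_multilabel(logits, labels, threshold=0.5):
    """Multilabel: every label is judged on its own, so zero or many can fire.

    The threshold value is an assumption made for this sketch.
    """
    return [label for label, x in zip(labels, logits) if sigmoid(x) >= threshold]


# Hypothetical label names and scores, for illustration only.
labels = ["class_a", "class_b", "class_c"]
print(decode_multiclass([0.1, 2.0, -1.0], labels))
print(decode_multilabel([1.5, -0.3, 0.2], labels))
```

In the multiclass case the probabilities compete with each other, so one label is always returned; in the multilabel case an utterance can receive several labels or none.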

The model also contains 3 replacement models for Amazon services.

The models (multitask and comparative single-task) were trained with an initial learning rate of 2e-5 (with a validation patience of 2, it could be dropped 2 times), batch size 32, the AdamW optimizer (betas (0.9, 0.99)), and early stopping on 3 epochs. The early-stopping criterion was the average accuracy over all tasks for multitask models, or the single-task accuracy for single-task models.
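
One possible reading of this schedule can be sketched in plain Python. This is an illustration under stated assumptions, not the training code from this pull request: the class `PlateauSchedule`, the drop factor of 0.5, and the interpretation of "early stopping on 3 epochs" as "stop after 3 validations without improvement" are all assumptions.

```python
from statistics import mean


def average_accuracy(per_task_accuracy):
    """Multitask stopping criterion: mean accuracy over all tasks."""
    return mean(per_task_accuracy.values())


class PlateauSchedule:
    """Hypothetical helper: drops the learning rate on plateaus and signals early stopping."""

    def __init__(self, lr=2e-5, drop_patience=2, max_drops=2,
                 stop_patience=3, drop_factor=0.5):
        self.lr = lr
        self.drop_patience = drop_patience  # validations without improvement before a drop
        self.max_drops = max_drops          # the rate can be dropped at most this many times
        self.stop_patience = stop_patience  # validations without improvement before stopping
        self.drop_factor = drop_factor      # assumed value, not stated in the README
        self.best = float("-inf")
        self.stale = 0
        self.drops = 0

    def step(self, metric):
        """Record one validation result; return True when training should stop."""
        if metric > self.best:
            self.best = metric
            self.stale = 0
            return False
        self.stale += 1
        if self.stale >= self.stop_patience:
            return True
        if self.stale % self.drop_patience == 0 and self.drops < self.max_drops:
            self.lr *= self.drop_factor
            self.drops += 1
        return False


schedule = PlateauSchedule()
# Hypothetical per-task accuracies for one validation pass.
metric = average_accuracy({"task_a": 0.8, "task_b": 0.6})
print(metric, schedule.step(metric), schedule.lr)
```

For a single-task model, the same schedule would be fed that task's accuracy instead of the average.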

This model (with a distilbert-base-uncased backbone) takes only 2439 Mb for 9 tasks, whereas a single-task model with the same backbone takes up almost the same memory (~2437 Mb) for each of these 9 tasks.
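
The saving implied by these figures can be worked out directly. The sketch below only does the arithmetic and assumes that the nine single-task models would all be loaded at once with no memory shared between them; that deployment scenario is an assumption, not something measured here.

```python
MULTITASK_MB = 2439    # one multitask model covering all 9 tasks (figure from the text above)
SINGLE_TASK_MB = 2437  # approximate footprint of one single-task model (figure from the text above)
NUM_TASKS = 9

# Assumption: every single-task model is loaded separately, nothing is shared.
separate_total_mb = SINGLE_TASK_MB * NUM_TASKS
saved_mb = separate_total_mb - MULTITASK_MB
ratio = separate_total_mb / MULTITASK_MB

print(f"separate models: {separate_total_mb} Mb")
print(f"saved by multitask model: {saved_mb} Mb")
print(f"ratio: {ratio:.2f}x")
```

Under that assumption, nine separate models would need about 21933 Mb, roughly nine times the footprint of the multitask model.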
Collaborator


This will definitely look ugly as a wall of text. Add a heading and bullet points using *

Contributor Author


Done.