Jieba #1074

calmzealA · 2017-11-15T10:12:45Z

Use jieba to extract tags from the input before best_match runs.

For example, suppose there are more than 10,000 statements in MongoDB and the user input is "where is the apple?"

Before:
The best_match logic fetches every statement in MongoDB (distinct by in_response_to.text), about 10,000 of them, and then compares each one with the input. This takes a long time.

After:
This pull request extracts tags from the input; in this example, the tag would probably be "apple". It then adds an extra regex condition to the MongoDB query so that only statements related to "apple" are selected. That set is much smaller than 10,000 (perhaps only 100), so only those 100 statements are compared with the input, which is much faster than before.
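A minimal sketch of the "After" flow, assuming the tags have already been extracted. build_tag_filter, prefilter and best_match are hypothetical helpers, not ChatterBot's API, and the field name in_response_to.text is an assumption about the storage schema. The filter dict uses MongoDB's standard $or, $regex and $options operators; prefilter applies the same matching in memory so the example runs without a database, and difflib's similarity ratio stands in for whatever comparison function the bot actually uses.

```python
import re
from difflib import SequenceMatcher

# Hypothetical field name: the PR deduplicates candidates by the text of the
# statement they respond to. Adjust to the real storage schema.
RESPONSE_FIELD = "in_response_to.text"


def build_tag_filter(tags, field=RESPONSE_FIELD):
    """Build a MongoDB filter matching documents whose `field` contains any
    of the extracted tags (case-insensitive substring match)."""
    clauses = [
        # re.escape makes characters like "?" or "." match literally.
        {field: {"$regex": re.escape(tag), "$options": "i"}}
        for tag in tags
        if tag
    ]
    # No usable tags means no narrowing: fall back to the whole corpus.
    return {"$or": clauses} if clauses else {}


def prefilter(corpus, tags):
    """In-memory equivalent of build_tag_filter, for illustration only."""
    patterns = [re.compile(re.escape(tag), re.IGNORECASE) for tag in tags if tag]
    if not patterns:
        return list(corpus)
    return [text for text in corpus if any(p.search(text) for p in patterns)]


def best_match(input_text, candidates):
    """Return the candidate most similar to input_text, or None if empty.

    difflib's ratio is a stand-in for the bot's real comparison function.
    """
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda c: SequenceMatcher(None, input_text, c).ratio(),
    )


corpus = [
    "Where is the apple?",
    "What time is it?",
    "Do you like apples?",
    "How are you?",
]
candidates = prefilter(corpus, ["apple"])  # only 2 of 4 statements remain
print(best_match("where is the apple?", candidates))
```

With pymongo, the same filter could be passed as the filter argument of Collection.distinct, for example collection.distinct("in_response_to.text", build_tag_filter(tags)). Note that an unanchored, case-insensitive $regex generally cannot use an index efficiently, so the server may still scan the collection; the saving comes from transferring and comparing far fewer statements in Python.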

Notes:

1. jieba is required.
2. Performance is best with Chinese; English is also supported.
3. Loading jieba for the first time takes about 2s, for each bot.

Pass input_statement to the storage adapter.

calmzealA · 2017-11-15T13:04:20Z

Found a similar pull request that uses NLTK:
#945

However, MongoDB only supports full-text search in Chinese in its Enterprise edition.

calmzealA and others added 9 commits November 15, 2017 16:08

Jieba tags for mongo

942228b

pass input_statement to storage.

Jieba tags for mongo

a38b7cf

Jieba tags for mongo

2bc6b32

Update django_storage.py

b77225b

Update requirements.txt

1b8cb11

Update mongodb.py

5505079

linting

916e010

linting

d527c38

update mongodb

b25f0c5

gunthercox closed this Nov 24, 2018
