Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jieba #1074

Closed
wants to merge 9 commits into from
Closed

Jieba #1074

wants to merge 9 commits into from

Conversation

calmzealA
Copy link

Using jieba to extact input tags,before best_match.

For example, suppose there are 10000+ corpus in mongo, while user_input is "where is the apple?"

Before:
The best_match logic get all the corpus in mongo (distinct by in_reponse.text) , the size is about 10000 .Then we compare all this corpus with input. This takes a lot time.

After:
This pull extracts tags of the input. for this example,tags are maybe "apple". Then we add a addtional search option to mongo, using regex ,to select the corpus only related with "apple". the size is more less than 10000 (maybe only 100). Then we compare this 100 corpus with input. This is more fast than before.

Notice:

1.JiaBa required.
2.Best perfomance with Chinese, compatible with english.
3.The first time we load jieba takes about 2s,for each bot.

@calmzealA
Copy link
Author

Find a similar pull with NLTK:
#945

However, mongodb only support full text search with Chinese language in the enterprise edition.

@gunthercox gunthercox closed this Nov 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants