Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow statements to be filtered by their tags #925

Closed
gunthercox opened this issue Aug 12, 2017 · 11 comments
Closed

Allow statements to be filtered by their tags #925

gunthercox opened this issue Aug 12, 2017 · 11 comments
Assignees
Labels
Milestone

Comments

@gunthercox
Copy link
Owner

This issue is blocked by #924 and can't be worked on until those changes are made.


Changes need to be made to ChatterBot's storage adapters so that statements can be filtered by their tags attribute.

This is an important change that will allow more efficient response generation. After this change is made it will be possible to add a processing step where new input statements are classified based on the labels of categories of the dialog that was used to train the chat bot (adding this additional processing step will have a separate issue and pull request created for it).

@vkosuri
Copy link
Collaborator

vkosuri commented Aug 12, 2017

@gunthercox Master, If you didn't started working on this, I am happy to work on this feature, Is there any ETA to do this?

@vkosuri
Copy link
Collaborator

vkosuri commented Aug 12, 2017

I am planning to write a utility method to find categories something like below

Question: If same/similar statement present in multiple categories are we going considering those?

def find_categories(corpus_paths, input_statement):
    data_file_paths = self.list_corpus_files(corpus_paths)
    categories = [list]
    for file_path in data_file_paths:
        data = read_corpus(file_path):
        for k, v in data.items():
            if input_statement in v:
                categories.append(k)
                
    return categories

@gunthercox
Copy link
Owner Author

@vkosuri Thank you, that would be greatly appreciated.

Yes, multiple categories for a statement should be supported.

Also, if you check the corpora variable on #L119 of trainers.py you should be able to get the categories from it. They are being added in the load_corpus function of chatterbot-corpus.

corpora = self.corpus.load_corpus(corpus_path)
for category in corpora.categories:
    # ...

Thank you again, let me know if anything comes up that I can assist with. 👍

@vkosuri
Copy link
Collaborator

vkosuri commented Aug 13, 2017

Thanks Master, Another question

  1. is None is the default category?
  2. If user didn't specified any category, do we need search whole corpora? If not what's your thoughts?

@gunthercox
Copy link
Owner Author

  1. is None is the default category?

Since a statement can have multiple categories, the default should be an empty list.

  1. If user didn't specified any category, do we need search whole corpora? If not what's your thoughts?

Yes, but that is going to be a different pull request. #925 describes this a bit more.

@Issen007
Copy link

@gunthercox @vkosuri in what area should we start using tags?
I'm playing around with this at the moment and could I add tags example in to the Corpus training data files?

Thanks

@gunthercox
Copy link
Owner Author

@Issen007 Tags will be added to the statements based on the category field from their corpus data (at least when training with the chatterbot-corpus).

This was referenced Aug 18, 2017
@jxfruit
Copy link

jxfruit commented Sep 30, 2017

@vkosuri hi,bro. you are really great. I am caught in problem with efficiency, even i use mongodb to be storage adapter, the response is still too slow. I found you are very active. So can you share some experience of improving efficiency or using methods. thank u very much!!!

@gunthercox gunthercox self-assigned this Oct 10, 2017
@calmzealA
Copy link

Great work! This is a very important feature.
It's slow for now while corpus over 10000 records.

@ghost
Copy link

ghost commented Nov 23, 2017

Might want to consider porting the logic adapters over to C. Even with tags, you will be crunching too much data for python to handle within any reasonable time limit.

@gunthercox
Copy link
Owner Author

I'm going to close this off now that the ability to filter statement results by tags has been added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants