intent-reco

Template-based intent recognition system built on word embedding models.

This repository was created as part of the diploma thesis Semantic Sentence Similarity for Intent Recognition Task. All of the included code works and is well documented, but it is not yet packaged for easy reuse, so it may be hard to use in practice. Feel free to try it anyway. I intend to convert it into a proper Python library in the future.

Requirements

The code requires Python 3.6 or higher. Install the dependencies using pipenv:

pip3 install --user pipenv
pipenv install --dev

The intent recognition system depends on the embedding model it is given. Models are loaded through the wrappers in the intent_reco/embeddings/ directory.

Currently supported embedding algorithms:

Depending on the embedding algorithm, you may need to install its implementation separately. Installation instructions for each algorithm are in its respective repository.

Model compression

The module intent_reco/model_compression.py contains functions for compressing embedding models. It supports several variants of two techniques: vocabulary pruning, which drops words from the model, and vector quantization, which stores the remaining vectors more compactly.
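To give a rough idea of what vocabulary pruning means, here is a minimal sketch of one possible variant: keeping only the first N words of a model. It is not the code from intent_reco/model_compression.py. The function name prune_vocabulary, the dict-based model representation, and the assumption that the entries are ordered from most to least frequent are all hypothetical and chosen only for illustration.

```python
from itertools import islice
from typing import Dict, List


def prune_vocabulary(embeddings: Dict[str, List[float]], keep: int) -> Dict[str, List[float]]:
    """Return a copy of the model containing only its first `keep` words.

    Assumption: `embeddings` is ordered from most to least frequent word,
    so the first `keep` entries are the ones most worth keeping.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # Dicts preserve insertion order, so islice takes the leading entries.
    return dict(islice(embeddings.items(), keep))


if __name__ == "__main__":
    toy_model = {"the": [0.1, 0.2], "cat": [0.3, 0.4], "zygote": [0.5, 0.6]}
    print(list(prune_vocabulary(toy_model, 2)))
```

Other pruning variants are possible, for example keeping only the words that occur in the intent templates.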

The vector quantization is based on the LBG clustering algorithm, which is implemented in the module intent_reco/utils/lbg.py.
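For readers unfamiliar with LBG (Linde-Buzo-Gray), the following is a generic, standard-library-only sketch of the algorithm, not the implementation in intent_reco/utils/lbg.py. It starts from a single codeword (the mean of the data), repeatedly splits every codeword in two, and refines the codebook after each split until the average distortion stops improving. The function name lbg, its parameters, and the additive split by eps are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _assign(data: Sequence[Sequence[float]], codebook: Sequence[Vector]) -> List[int]:
    """Index of the nearest codeword for every data vector."""
    return [min(range(len(codebook)), key=lambda k: _sq_dist(v, codebook[k]))
            for v in data]


def lbg(data: Sequence[Sequence[float]], size: int, eps: float = 0.01,
        tol: float = 1e-6, max_iter: int = 100) -> Tuple[List[Vector], List[int]]:
    """Build a codebook of `size` codewords (a power of two) for `data`."""
    if not data:
        raise ValueError("data must not be empty")
    if size < 1 or size & (size - 1):
        raise ValueError("size must be a power of two")

    codebook = [_mean(data)]
    while len(codebook) < size:
        # Split step: replace each codeword by two slightly shifted copies.
        codebook = [[x + s for x in c] for c in codebook for s in (eps, -eps)]
        prev = float("inf")
        for _ in range(max_iter):
            labels = _assign(data, codebook)
            # Update step: move each codeword to the mean of its members.
            for k in range(len(codebook)):
                members = [v for v, lab in zip(data, labels) if lab == k]
                if members:
                    codebook[k] = _mean(members)
            distortion = sum(_sq_dist(v, codebook[lab])
                             for v, lab in zip(data, labels)) / len(data)
            # Stop refining once the distortion no longer improves noticeably.
            if prev - distortion <= tol * max(distortion, 1e-12):
                break
            prev = distortion
    return codebook, _assign(data, codebook)


if __name__ == "__main__":
    points = [[0, 0], [0, 1], [10, 10], [10, 11]]
    codebook, labels = lbg(points, size=2)
    print(codebook, labels)
```

In a quantized model, each vector (or sub-vector) is then stored as the index of its nearest codeword instead of as a list of floats.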

Intent recognition

The resulting intent recognition system is implemented in the module intent_reco/intent_query.py. Currently, it loads an embedding model and a set of intent templates from a JSON file (the templates are further quantized). The user then types sentences on the command line, and the system outputs the matched intent together with the matching template and its cosine similarity to the input sentence.
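The matching step described above can be sketched as follows. This is only an illustration of template matching by cosine similarity, not the code in intent_reco/intent_query.py. The function names, the in-memory template layout (intent name mapped to a list of template sentences), and the choice to embed a sentence as the average of its word vectors are all assumptions. The real system delegates sentence embedding to the loaded model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def embed(sentence: str, word_vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Assumed sentence embedding: average of the known word vectors."""
    vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def match_intent(sentence: str, templates: Dict[str, List[str]],
                 word_vectors: Dict[str, List[float]], dim: int) -> Tuple[str, str, float]:
    """Return (intent, template, similarity) for the best-matching template."""
    query = embed(sentence, word_vectors, dim)
    candidates = (
        (intent, template, cosine(query, embed(template, word_vectors, dim)))
        for intent, intent_templates in templates.items()
        for template in intent_templates
    )
    return max(candidates, key=lambda c: c[2])


if __name__ == "__main__":
    # Toy two-dimensional vectors, made up for this example.
    vectors = {"turn": [1.0, 0.0], "on": [0.9, 0.1], "lights": [1.0, 0.2],
               "play": [0.0, 1.0], "music": [0.1, 1.0]}
    templates = {"lights_on": ["turn on lights"], "play_music": ["play music"]}
    print(match_intent("lights on please", templates, vectors, dim=2))
```

A practical system would typically precompute the template embeddings once at startup instead of re-embedding them for every query.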

Examples of an embedding model and a JSON template file can be found in the data directory.