K-Cap 2017 Project
Note: The evaluation results for the K-Cap 2017 paper are in the "Evaluation results" folder.
Please read the entire README file before doing anything.
Requirements: Python 2.7, with the following packages:
- numpy
- glove_python
- sklearn
- practnlptool
- textrazor
- distance
- nltk
- cPickle, SocketServer, urllib, json, re (all part of the Python 2.7 standard library)
- GloVe precomputed data files: download from http://nlp.stanford.edu/data/glove.6B.zip; the default file used by the code is glove.6B.50d.txt
- PATTY data files (included)
- a TextRazor API key
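The GloVe files are plain text: one token per line, followed by its vector components. A minimal sketch of a loader (the function name and the tiny 3-d in-memory sample are illustrative, not the repository's Embedder code):

```python
import io
import numpy as np

def load_glove(lines):
    """Parse GloVe-format lines (token followed by float components)
    into a {word: vector} dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors

# Tiny in-memory sample with 3-d vectors instead of the real 50-d file:
sample = io.StringIO(u"the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
vecs = load_glove(sample)
# For the real data: vecs = load_glove(open("glove.6B.50d.txt"))
```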
P.S. File names are self-explanatory:
- Tagger: POS tagger
- Splitter: split the question into combinations
- Embedder: glove wrapper to convert question into vectors
- Reader: PATTY data reader
- Backend: the complete process of reading the PATTY data and creating embeddings, including the cosine-similarity code
- Frontend: the complete process of reading a question and processing it
- Textrazor_Api: the API wrapper for the textrazor service
- main: where the magic happens
- api: for the web UI interface
- webService: for calling the system as a web service locally
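The cosine-similarity step mentioned for Backend boils down to the standard formula cos(θ) = a·b / (|a||b|); a self-contained sketch (not the repository's exact code):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors:
    dot product divided by the product of their norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Identical directions score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity(np.array([1.0, 2.0]), np.array([2.0, 4.0])))
```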
Run the service via ./webService.py [port] and call it with a URL-encoded question, e.g. http://localhost:[port]/question_url_encoded
- The response will look like this:

{
    "results": [
        "result 1",
        "result 2", ...
    ],
    "parts": [
        "part 1",
        "part 2", ...
    ],
    "pos": [
        [
            "word",
            "pos tag"
        ], ...
    ],
    "relation 1": "dbpedia relation label",
    "relation 2": "dbpedia relation label",
    ...
    "relation N": "dbpedia relation label",
    "gen_question": "generalized question here",
    "question": "the input question"
}
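Calling the service from Python might look like the sketch below. The port, the question, the helper name, and the one-field sample JSON are illustrative; the real body has the shape shown above:

```python
import json
try:
    from urllib import quote  # Python 2.7, as the project requires
except ImportError:
    from urllib.parse import quote  # Python 3 fallback

def question_url(question, port=8080):
    """Build the request URL; the port is whatever ./webService.py was started with."""
    return "http://localhost:%d/%s" % (port, quote(question))

print(question_url("Who is the mayor of Berlin?"))
# -> http://localhost:8080/Who%20is%20the%20mayor%20of%20Berlin%3F

# Fetch the URL (e.g. with urllib2.urlopen(url).read()), then decode the
# JSON body and pick out the fields of interest:
sample = '{"results": ["result 1"], "question": "the input question"}'
body = json.loads(sample)
print(body["results"][0])
```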
Please run the code once using the main file to create the *.dat files; subsequent runs simply load them, which reduces processing time because no extra processing is done.
Running the main file is straightforward: ./main.py
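Since main.py caches its intermediate results in *.dat files via cPickle, the usual load-or-compute pattern applies; a sketch under that assumption (the function name and demo file are illustrative, not the repository's code):

```python
import os
import tempfile
try:
    import cPickle as pickle  # Python 2.7, per the requirements
except ImportError:
    import pickle  # Python 3 fallback

def load_or_compute(path, compute):
    """Load a cached object from a *.dat file, or compute and cache it on
    the first run -- the pattern that makes later runs faster."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    value = compute()
    with open(path, "wb") as f:
        pickle.dump(value, f, pickle.HIGHEST_PROTOCOL)
    return value

# Demo with a throwaway file standing in for one of the *.dat files:
path = os.path.join(tempfile.mkdtemp(), "demo.dat")
first = load_or_compute(path, lambda: {"embedding": [0.1, 0.2]})  # computed
second = load_or_compute(path, lambda: {"never": "called"})       # loaded from cache
```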
Please email Yaser ([email protected]) or Kuldeep ([email protected]) if you face any problems.