Skip to content

pdib/search-engine

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

search-engine

Information Retrieval Course project. Search engine implementing several models on a small collection of documents.

###Python version The project uses Python 3. It was tested with version 3.4.2.

##Dependencies The project uses several external packages:

  • nltk
  • pyparsing
  • matplotlib
  • pympler

You can retrieve them using pip install --user package-name. (Make sure you're running pip for Python 3)

Matplotlib is likely to cause troubles installing via pip, you may want to refer to the Matplotlib installing FAQ.

##Tests The tests can be run using nosetests3. Simply type nosetests3 at the root of the repository.

##Running ###Evaluation summary To run an evaluation summary on the index (creating the index and running different types of queries), use the stats.py file. You can change evaluation parameters in the stats.py script before running it with:

python stats.py

###REPL client To run an interactive console that lets you run queries or export the index, use the repl.py script. Simply run:

python repl.py

###Indexing INEX This operation WILL take a long time (more than one hour, depending on the number of processes) and will use a huge amount of RAM.

You must download the INEX 2007 collection and extract the archived files into a single folder (this folder must contain all the .xml documents). Edit the inex.py script to add the correct path to the corpus folder and run:

python inex.py

About

Information Retrieval course project.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published