search-engine

Information Retrieval Course project. Search engine implementing several models on a small collection of documents.

###Python version

The project uses Python 3. It was tested with version 3.4.2.

##Dependencies

The project uses several external packages:

- nltk
- pyparsing
- matplotlib
- pympler

You can retrieve them using `pip install --user package-name`. (Make sure you're running pip for Python 3.)

Matplotlib is likely to cause trouble when installed via pip; you may want to refer to the Matplotlib installation FAQ.
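
Before running the scripts, it can help to confirm that the four packages are importable by the Python 3 interpreter you intend to use. The following is a minimal sketch, not part of the repository; the helper name `missing_packages` is an assumption introduced here for illustration, and the import names are assumed to match the package names listed above.

```python
import importlib

# Package names taken from the dependency list above.
REQUIRED = ["nltk", "pyparsing", "matplotlib", "pympler"]


def missing_packages(names):
    """Return the names from `names` that cannot be imported.

    Hypothetical helper, not part of the project.
    """
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

Run it with the same interpreter you use for the project; any package it reports as missing can then be installed with pip as described above.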

##Tests

The tests can be run using `nosetests3`. Simply type `nosetests3` at the root of the repository.

##Running

###Evaluation summary

To run an evaluation summary on the index (creating the index and running different types of queries), use the `stats.py` script. You can change the evaluation parameters in `stats.py` before running it with:

python stats.py

###REPL client

To run an interactive console that lets you run queries or export the index, use the `repl.py` script. Simply run:

python repl.py

###Indexing INEX

This operation WILL take a long time (more than one hour, depending on the number of processes) and will use a huge amount of RAM.

You must download the INEX 2007 collection and extract the archived files into a single folder (this folder must contain all the `.xml` documents). Edit the `inex.py` script to set the correct path to the corpus folder, then run:

python inex.py
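
Since indexing takes a long time, it may be worth checking first that the corpus folder really contains the extracted `.xml` documents at its top level. The sketch below is an illustration only and is not part of the repository; the function name `count_xml_documents` and the placeholder path in the usage comment are assumptions.

```python
from pathlib import Path


def count_xml_documents(corpus_dir):
    """Count the .xml files located directly inside `corpus_dir`.

    Hypothetical helper, not part of the project. Subfolders are not
    searched, because the documents are expected in a single folder.
    """
    folder = Path(corpus_dir)
    if not folder.is_dir():
        raise ValueError("Not a directory: {}".format(corpus_dir))
    return sum(
        1
        for entry in folder.iterdir()
        if entry.is_file() and entry.suffix.lower() == ".xml"
    )


# Example usage (replace the placeholder with your own corpus folder):
# print(count_xml_documents("/path/to/inex-corpus"))
```

A count of zero suggests the archives were extracted into subfolders or to a different location than the path set in `inex.py`.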
