This is a single-node toy demonstration of a search engine built in the style of a distributed system.
Components:
- Web Server
  - Serves a simple HTML page with a search input text box
  - On submit, the query is logged to an analytics log and the top 10 search results, ranked by TF-IDF, are returned (see the ranking sketch below)
- Analytics cron job
  - Reads the analytics log and constructs a trie, with caching, to serve autocomplete suggestions (see the trie sketch below)
- Web Crawler cron job
  - Builds an inverted index from scraped web pages, starting with Hacker News as the seed URL (see the crawler sketch below)
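The sketch below shows the general shape of the TF-IDF ranking step. It is only an illustration: the names (`rank`, `index`, `doc_lengths`) and the exact scoring formula are assumptions, not the web server's actual code.

```python
# Minimal TF-IDF ranking sketch (hypothetical names, not the project's code).
import math
from collections import defaultdict

def rank(query, index, doc_lengths, top_k=10):
    """Score documents for `query` with TF-IDF and return the top_k doc ids.

    index:       {term: {doc_id: term_count}}  -- an inverted index
    doc_lengths: {doc_id: total number of terms in that document}
    """
    n_docs = len(doc_lengths)
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        # Inverse document frequency: rarer terms contribute more.
        idf = math.log(n_docs / len(postings))
        for doc_id, count in postings.items():
            tf = count / doc_lengths[doc_id]   # length-normalised term frequency
            scores[doc_id] += tf * idf
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Example:
# index = {"python": {"a.html": 3, "b.html": 1}, "search": {"a.html": 1}}
# doc_lengths = {"a.html": 120, "b.html": 80, "c.html": 50}
# rank("python search", index, doc_lengths)  -> ["a.html", "b.html"]
```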
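For autocomplete, the core data structure is a prefix trie. The minimal sketch below uses hypothetical class names; the project's version also layers caching on top and is rebuilt from the analytics log (see the 30-second refresh note in the run instructions).

```python
# Minimal prefix-trie sketch for autocomplete (hypothetical names).

class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def suggest(self, prefix, limit=10):
        """Return up to `limit` previously seen queries that start with `prefix`."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Depth-first walk of the subtree under the prefix.
        results, stack = [], [(node, prefix)]
        while stack and len(results) < limit:
            current, word = stack.pop()
            if current.is_word:
                results.append(word)
            for ch, child in current.children.items():
                stack.append((child, word + ch))
        return results

# Example: feed it queries from the analytics log, then complete a prefix.
# trie = Trie()
# for q in ["python", "pytest", "search engine"]:
#     trie.insert(q)
# trie.suggest("py")  -> e.g. ["pytest", "python"]
```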
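The crawl-and-index step can be pictured as a bounded breadth-first walk from the seed URL that tokenises each page and counts terms. The toy version below uses only the standard library so it stays self-contained (the real crawler drives a Selenium web driver, as noted in the TODO at the end of this README); all names are hypothetical, and its output matches the shapes the `rank` sketch above expects.

```python
# Toy crawler + inverted-index sketch (hypothetical; illustration only).
import re
import urllib.request
from collections import defaultdict, deque

SEED = "https://news.ycombinator.com"

def crawl(seed=SEED, max_pages=20):
    """Breadth-first crawl from `seed`; returns ({term: {url: count}}, {url: length})."""
    index = defaultdict(lambda: defaultdict(int))
    doc_lengths = {}
    queue, seen = deque([seed]), {seed}
    while queue and len(doc_lengths) < max_pages:
        url = queue.popleft()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="ignore")
        except Exception:
            continue  # skip pages that fail to download
        # Strip tags crudely and tokenise into lowercase words.
        text = re.sub(r"<[^>]+>", " ", html)
        words = re.findall(r"[a-z0-9]+", text.lower())
        doc_lengths[url] = len(words)
        for word in words:
            index[word][url] += 1
        # Enqueue absolute links we have not seen yet.
        for link in re.findall(r'href="(https?://[^"]+)"', html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index, doc_lengths
```

Capping the crawl with `max_pages` keeps a demo run short; the seed defaults to Hacker News as in the real job.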
Prerequisites for running:
- make
- Docker (with docker-compose)
- A web browser
To run the application:
- Build and start the containers:
    make build
    docker-compose up
- Open a browser to localhost:3000
- Start submitting queries
- If you want to refresh the search index, run:
    make inverted_index
Note: the autosuggest trie is refreshed every 30 seconds.
Prerequisites for developing:
- Python/Pip
To set up a development environment:
- Create a virtual environment:
    python -m venv .venv
- Install requirements:
    make install
- Run tests:
    make test
TODO:
- Move the web crawler cron job to docker-compose: unfortunately the Selenium web driver is currently not supported in the Docker environment, so for now you'll have to refresh the index yourself (make inverted_index).
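For context, the crawler's page-fetching step presumably looks something like the headless Selenium call below. This is a hedged sketch with hypothetical names; the actual crawler code may differ, but it illustrates the browser-driver dependency behind the TODO above.

```python
# Hypothetical sketch of a headless Selenium fetch. It needs a local
# Chrome/Chromium and a matching driver, presumably the piece that is
# not yet available inside the Docker environment.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def fetch(url):
    options = Options()
    options.add_argument("--headless")   # run without a visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source        # HTML after JavaScript has run
    finally:
        driver.quit()
```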