Learn and predict book authors from words using supervised learning.
For now, clone the repository and run python setup.py install. It requires Python 3, so be sure to set up a virtual environment if needed.
Clone the repository and run the following.
# install packages, change accordingly if not Ubuntu
sudo apt-get install python3 python3-dev libblas-dev libatlas-dev liblapack-dev gfortran
# set up virtual environment
pyvenv-3.3 ~/my-python3-env
source ~/my-python3-env/bin/activate
wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py -O - | python
easy_install pip
# install dependencies
pip install -r requirements.txt
To use the notebooks:
# install IPython
pip install ipython pyzmq jinja2 tornado
cd notebooks
ln -s ../books_classification .
# this should open a window in the browser
ipython3 notebook --cache-size=0 --pylab inline
- try ggplot2 and ggobi
- plot accuracy against X (absolute number of training samples) and Y (number of authors), as a color map or surface
- add more tests
- use PyTables for persistence, cache and integration with Pandas
- web and DVD interface for Project Gutenberg releases
- integrate extractor parameters with sklearn's grid search
- write documentation/code examples with Sphinx
- bring back support for word associations from the "window_optimizations" branch, then integrate and try it
- try codifying 10-grams or similar by hashing
- try 1-S instead of S for features (especially for the vector version, so that zeros aren't discontinuities?)
- use double dispatch and inversion of control instead of decorators to deal with extraction and encoding
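
The n-gram hashing item above could be explored along these lines. This is only a minimal sketch, not code from this repository: the function name hashed_ngram_counts, the bucket count, and the choice of zlib.crc32 as the hash are all assumptions made for illustration. Each n-gram is mapped to one of a fixed number of buckets, so the feature space stays bounded however many distinct n-grams appear.

```python
import zlib
from collections import Counter


def hashed_ngram_counts(tokens, n=10, n_buckets=2**18):
    """Count word n-grams by hashed bucket index (hypothetical helper)."""
    counts = Counter()
    # Slide a window of n tokens over the text; short inputs yield no n-grams.
    for i in range(len(tokens) - n + 1):
        ngram = " ".join(tokens[i:i + n])
        # crc32 is stable across runs, unlike the built-in hash() on strings.
        bucket = zlib.crc32(ngram.encode("utf-8")) % n_buckets
        counts[bucket] += 1
    return counts


# Small demonstration with 3-grams instead of 10-grams.
tokens = "it was the best of times it was the worst of times".split()
features = hashed_ngram_counts(tokens, n=3, n_buckets=1024)
```

Distinct n-grams can collide in the same bucket, so the counts are approximate; a larger n_buckets trades memory for fewer collisions.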