Make a directory and clone the repository into it:
$ git clone https://github.com/EECS486/Bezos.git
virtualenv creates an isolated virtual environment so that we can install Python packages for this project without overwriting the ones on our system:
$ pip install virtualenv
Go to the directory where you cloned the repository, then create and activate the virtual environment:
$ cd Bezos
$ virtualenv -p python3 env
$ source env/bin/activate
- Make sure you are in the repository's root directory (where requirements.txt lives), then install the dependencies:
$ pip install -r requirements.txt
To add new packages later, reactivate the environment, install the package, update requirements.txt, and deactivate when you are done:
$ source env/bin/activate
$ pip install <package>
$ pip freeze > requirements.txt
$ deactivate
- data_analytics_erneh.py - displays analytics for the review and metadata data
- jsonReviewRead.py - review parser for the classification models
- jsonReviewRead_vBlackfyre.py - review parser for the classification models
- metadata.py - metadata parser for the classification models
- naivebayes.py - Naive Bayes model classifier
- porter.py - Porter stemmer
- reviewdata.py - reviewer parser for the Naive Bayes model
- reviewdataNB.py - review parser for the Naive Bayes model
- linking_and_metrics.py - links the metadata and review data and calculates analytics for each review
- modelGeneration.py - generates each classification model and predicts the helpfulness of Amazon reviews
- output/ - output for the Naive Bayes and classification models
- plots/ - plots of feature importance
- Download and extract the contents of this folder into the repository: https://drive.google.com/file/d/1QCZXLE9F9BqI2k2y3APi4tPItdREnhMS/view?usp=sharing
- It contains the JSON review and metadata files used to generate the review data, along with pickle files that make model generation faster (a hedged sketch of inspecting these JSON files follows below).
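A minimal sketch (not part of the repo) of how the downloaded review files can be inspected, assuming they follow the common Amazon-review format of one JSON object per line with fields such as "reviewText" and "helpful"; the file name used here is only an illustration.

```python
import json

def peek_reviews(path, limit=3):
    """Print a few parsed review objects from a line-delimited JSON file."""
    with open(path) as f:
        for i, line in enumerate(f):
            if i >= limit:
                break
            review = json.loads(line)
            # "helpful" is typically a [helpful_votes, total_votes] pair.
            print(review.get("helpful"), review.get("reviewText", "")[:80])

# peek_reviews("reviews_Grocery_and_Gourmet_Food.json")  # hypothetical file name
```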
- Open naivebayes.py
- Set the line params = {"stem": False, "stop": True, "condProb": True, "bigram": True} to the parameters you want to run Naive Bayes with (see the illustrative sketch after this list):
  - stem: stems words
  - stop: removes stop words
  - condProb: leave set to True
  - bigram: builds a bigram model instead of a unigram model
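A rough, self-contained illustration of what these flags are expected to toggle during feature extraction; this is not the repo's actual code, and the tiny stopword list and crude suffix-stripping "stem" step are stand-ins (the project itself uses porter.py and its own preprocessing).

```python
params = {"stem": False, "stop": True, "condProb": True, "bigram": True}

STOPWORDS = {"the", "a", "an", "is", "of"}  # illustrative subset only

def extract_tokens(text, params):
    tokens = text.lower().split()
    if params["stop"]:
        # stop=True: drop common stop words before counting features
        tokens = [t for t in tokens if t not in STOPWORDS]
    if params["stem"]:
        # stem=True: reduce words to stems; a crude suffix strip stands in
        # for a real Porter stemmer in this sketch
        tokens = [t[:-1] if t.endswith("s") else t for t in tokens]
    if params["bigram"]:
        # bigram=True: features are adjacent word pairs instead of single words
        return list(zip(tokens, tokens[1:]))
    return tokens

print(extract_tokens("the blender is a great value of money", params))
```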
- Install Stanford CoreNLP: https://stanfordnlp.github.io/CoreNLP/index.html
- Go to the directory where you installed Stanford CoreNLP
- Follow the instructions to run the Stanford CoreNLP server on port 9000: https://stanfordnlp.github.io/CoreNLP/corenlp-server.html#getting-started (a sketch of querying the running server from Python follows below)
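A small sketch of hitting the CoreNLP server's documented HTTP interface to confirm it is up on port 9000. It assumes the `requests` package is installed; it is not code from this repository, and the annotator list is just an example.

```python
import json
import requests

def annotate(text, annotators="tokenize,ssplit,pos"):
    """Send text to a local CoreNLP server and return the JSON annotation."""
    properties = {"annotators": annotators, "outputFormat": "json"}
    resp = requests.post(
        "http://localhost:9000/",
        params={"properties": json.dumps(properties)},
        data=text.encode("utf-8"),
    )
    resp.raise_for_status()
    return resp.json()

# print(annotate("This blender works great and was worth the price."))
```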
- In linking_and_metrics.py:
  - Set category to the category you want to process, e.g. 'Grocery_and_Gourmet_Food'
- Run data_analytics_erneh.py to generate statistics on the whole review set
- In modelGeneration.py:
  - Set category to the category you want to process, e.g. 'Grocery_and_Gourmet_Food'
  - Note: the .pkl files must already have been generated for that category (a hedged way to check is sketched below)
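A quick, hypothetical sanity check (not part of the repo) that pickle files exist for the chosen category before running modelGeneration.py. The exact .pkl naming scheme is an assumption here; adjust the pattern to match the files extracted from the Google Drive folder.

```python
import glob

category = "Grocery_and_Gourmet_Food"
# Assumes the category name appears somewhere in the pickle file names.
pkl_files = glob.glob(f"*{category}*.pkl")
if pkl_files:
    print("Found pickle files:", pkl_files)
else:
    print("No .pkl files found for", category, "- generate them first.")
```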