domainadaption

0. Available data

Amazon product reviews in four categories: books, dvd, electronics, and kitchen & housewares.
1000 positive, 1000 negative and a varying number of unlabeled reviews per category.
Data is available here.

sorted_data_acl/
├── books/
│   ├── negative.review
│   ├── positive.review
│   └── unlabeled.review
├── dvd/
│   ├── negative.review
│   ├── positive.review
│   └── unlabeled.review
├── electronics/
│   ├── negative.review
│   ├── positive.review
│   └── unlabeled.review
└── kitchen_&_housewares/
    ├── negative.review
    ├── positive.review
    └── unlabeled.review

1. Create Embeddings

1.1. Run preprocess_dataset_for_embeddings.py

This will create a reviews_forEmbedding.txt file in each category folder. The file contains all reviews (positive, negative and unlabeled) of that category, with one sentence per line. Special characters and punctuation are removed from the sentences.
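As an illustration of this step, here is a minimal sketch of such preprocessing. It is not the code of preprocess_dataset_for_embeddings.py. It assumes that each .review file wraps the review body in <review_text> ... </review_text> tags, splits sentences with a simple end-punctuation heuristic, and treats anything other than letters, digits and whitespace as a special character. All function names are hypothetical.

```python
import re

# Assumption: review bodies sit inside <review_text> ... </review_text> tags.
# The files are pseudo-XML, so a regex is used instead of an XML parser.
REVIEW_TEXT = re.compile(r"<review_text>(.*?)</review_text>", re.DOTALL)
# Heuristic sentence boundary: ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Anything that is not a letter, digit or whitespace counts as a special character.
NON_WORD = re.compile(r"[^A-Za-z0-9\s]")


def extract_review_texts(raw):
    """Return the stripped text of every <review_text> block in a .review file."""
    return [m.strip() for m in REVIEW_TEXT.findall(raw)]


def clean_sentence(sentence):
    """Replace punctuation/special characters with spaces and collapse whitespace."""
    return " ".join(NON_WORD.sub(" ", sentence).split())


def review_to_sentences(text):
    """Split a review into cleaned, non-empty sentences."""
    sentences = (clean_sentence(s) for s in SENTENCE_END.split(text))
    return [s for s in sentences if s]


def preprocess_for_embeddings(paths, out_path):
    """Write one cleaned sentence per line for all reviews in the given .review files."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8", errors="ignore") as f:
                for text in extract_review_texts(f.read()):
                    for sentence in review_to_sentences(text):
                        out.write(sentence + "\n")
```

For a category folder this would be called as, for example, `preprocess_for_embeddings(["books/positive.review", "books/negative.review", "books/unlabeled.review"], "books/reviews_forEmbedding.txt")`.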

1.2. Run sorted_data_acl/merge_reviews.sh

This will merge the reviews_forEmbedding.txt files of all four categories into one file and store it in the sorted_data_acl/all/ folder.

1.3. Run create_word_embeddings.sh

This will use GloVe to create word embeddings for each category of reviews, including all/. In particular, it creates the following 4 files in each category folder:

reviews.vocab: word count per category in the format word -> count
reviews.cooccur: co-occurrence matrix of words
reviews.cooccur.shuf: shuffled co-occurrence matrix
reviews.vectors.txt: word embeddings per category in the format word -> vector
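To make the intermediate files less opaque, the following pure-Python sketch computes the kind of statistics behind reviews.vocab (word counts) and reviews.cooccur (symmetric co-occurrence counts within a context window, with each pair weighted by 1/distance, which mirrors GloVe's default weighting). It is an illustration only: the real files are produced by the GloVe tools called from create_word_embeddings.sh, and the window size used there is not stated in this README, so window_size=2 below is an arbitrary choice.

```python
from collections import Counter


def vocab_counts(sentences):
    """Count how often each word occurs (the information stored in reviews.vocab)."""
    return Counter(word for sentence in sentences for word in sentence.split())


def weighted_cooccurrences(sentences, window_size=2):
    """Count symmetric co-occurrences, weighting each word pair by 1/distance."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i, center in enumerate(words):
            # Look back up to window_size words and add both directions,
            # so the resulting matrix is symmetric.
            for d in range(1, window_size + 1):
                j = i - d
                if j < 0:
                    break
                counts[(center, words[j])] += 1.0 / d
                counts[(words[j], center)] += 1.0 / d
    return counts
```

With `["a b c"]` and a window of 2, adjacent words get a weight of 1.0 and the pair ("a", "c"), two positions apart, gets 0.5.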

1.4. Run build_embedding_dictionary.py

This will create Python dictionaries in the format word -> vector from the reviews.vectors.txt files and store them in reviews.vectors.pkl files.
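A minimal sketch of this conversion, assuming the GloVe text format of one word followed by its space-separated vector components per line. The function names are hypothetical and the actual build_embedding_dictionary.py may differ (for example, in whether vectors are stored as NumPy arrays or lists).

```python
import pickle

import numpy as np


def parse_glove_vectors(lines):
    """Parse GloVe text output: each line is a word followed by its vector components."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = np.array([float(x) for x in parts[1:]], dtype=np.float32)
    return embeddings


def build_embedding_dictionary(txt_path, pkl_path):
    """Read reviews.vectors.txt and pickle the word -> vector dictionary."""
    with open(txt_path, encoding="utf-8") as f:
        embeddings = parse_glove_vectors(f)
    with open(pkl_path, "wb") as f:
        pickle.dump(embeddings, f)
    return embeddings
```

The pickled dictionary can later be restored with `pickle.load(open("reviews.vectors.pkl", "rb"))`.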

2. Transform Text Reviews into Embedded Reviews

2.1. Run preprocess_dataset.py

This will create reviews_positive.txt, ratings_positive.txt, reviews_negative.txt and ratings_negative.txt files in each category folder. The files contain the respective reviews and ratings, with one review or rating per line.
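The key requirement of this step is that line i of a reviews file belongs to line i of the matching ratings file. The sketch below shows one way to keep the two aligned. It assumes that each <review> entry in a .review file contains a <rating> field (such as 5.0) and a <review_text> field, and it skips incomplete entries. It is a hypothetical illustration, not the code of preprocess_dataset.py.

```python
import re

# Assumption: entries look like <review> ... <rating>5.0</rating> ... <review_text>...</review_text> ... </review>
REVIEW = re.compile(r"<review>(.*?)</review>", re.DOTALL)
RATING = re.compile(r"<rating>\s*(.*?)\s*</rating>", re.DOTALL)
TEXT = re.compile(r"<review_text>\s*(.*?)\s*</review_text>", re.DOTALL)


def extract_reviews_and_ratings(raw):
    """Return two aligned lists: review texts (one line each) and their ratings."""
    reviews, ratings = [], []
    for block in REVIEW.findall(raw):
        rating, text = RATING.search(block), TEXT.search(block)
        if rating is None or text is None:
            continue  # skip incomplete entries so both lists stay aligned
        # Collapse internal newlines so that each review fits on one line.
        reviews.append(" ".join(text.group(1).split()))
        ratings.append(rating.group(1))
    return reviews, ratings


def write_lines(path, lines):
    """Write one item per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)
```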

2.2. Run merge_preprocessed_reviews.py

This will merge the preprocessed reviews from all four categories into the all/ folder.
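A sketch of such a merge, under the assumption that each of the four per-category files is simply concatenated into a file of the same name in all/. Using the same category order for every file keeps the review and rating lines aligned. The category folder names and function names here are assumptions, not taken from merge_preprocessed_reviews.py.

```python
from pathlib import Path

# Assumed category folder names.
CATEGORIES = ["books", "dvd", "electronics", "kitchen_&_housewares"]
FILES = ["reviews_positive.txt", "ratings_positive.txt",
         "reviews_negative.txt", "ratings_negative.txt"]


def concat_texts(texts):
    """Concatenate file contents, making sure each non-empty one ends with a newline."""
    return "".join(t if t.endswith("\n") or not t else t + "\n" for t in texts)


def merge_preprocessed(root):
    """Write root/all/<file> as the concatenation of root/<category>/<file>."""
    root = Path(root)
    out_dir = root / "all"
    out_dir.mkdir(exist_ok=True)
    for name in FILES:
        # Same category order for every file, so line i of all/reviews_positive.txt
        # still matches line i of all/ratings_positive.txt.
        texts = [(root / c / name).read_text(encoding="utf-8") for c in CATEGORIES]
        (out_dir / name).write_text(concat_texts(texts), encoding="utf-8")
```

The newline check in `concat_texts` guards against a file without a trailing newline, whose last line would otherwise be glued to the first line of the next category.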

2.3. Run embed_reviews.py

This will transform the text reviews into embedded reviews by converting each word into a vector using the dictionaries from the previous steps. The resulting matrices will be stored in reviews_positive.npy and reviews_negative.npy.
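The sketch below shows how a list of reviews can be turned into a single 3-D matrix of shape (number of reviews, max_len, embedding dimension). It is a minimal illustration with these assumptions, none of which come from embed_reviews.py: words missing from the dictionary are skipped, reviews longer than max_len are truncated, and shorter reviews are zero-padded at the end.

```python
import numpy as np


def embed_reviews(reviews, embeddings, max_len, dim):
    """Map each review to a (max_len, dim) matrix of word vectors.

    Assumptions: out-of-vocabulary words are skipped, long reviews are
    truncated to max_len words, and short ones are zero-padded at the end.
    """
    out = np.zeros((len(reviews), max_len, dim), dtype=np.float32)
    for i, review in enumerate(reviews):
        vectors = [embeddings[w] for w in review.split() if w in embeddings][:max_len]
        if vectors:
            out[i, :len(vectors)] = np.stack(vectors)
    return out
```

The result could then be saved with, for example, `np.save("reviews_positive.npy", embed_reviews(positive_reviews, embeddings, max_len, dim))`, where `positive_reviews`, `max_len` and `dim` are placeholders.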

3. Classify Sentiments

3.1. Run sentiment_classification.py

This will train a neural network to classify the sentiment of the reviews in each category.
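As a simple baseline for the embedded reviews, the following sketch averages each review's word vectors and trains a logistic regression classifier with plain gradient descent. This is a deliberately simpler stand-in, not the neural network in sentiment_classification.py. It assumes the zero-padded (n, max_len, dim) input format and labels of 1 for positive and 0 for negative reviews.

```python
import numpy as np


def mean_pool(X):
    """Average the non-padding word vectors of each review: (n, max_len, dim) -> (n, dim)."""
    mask = np.any(X != 0, axis=2, keepdims=True)  # True where a real word vector is present
    counts = np.maximum(mask.sum(axis=1), 1)       # avoid division by zero for empty reviews
    return (X * mask).sum(axis=1) / counts


def train_logistic_regression(features, labels, lr=0.5, epochs=500):
    """Batch gradient descent on the binary cross-entropy loss."""
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))  # predicted P(positive)
        grad = p - labels                               # dLoss/dlogit per example
        w -= lr * features.T @ grad / n
        b -= lr * grad.mean()
    return w, b


def predict(features, w, b):
    """Return 1 for positive and 0 for negative predictions."""
    return (features @ w + b > 0).astype(int)
```

With the files from step 2.3, the inputs would be built roughly as `X = np.concatenate([np.load("reviews_positive.npy"), np.load("reviews_negative.npy")])`, with labels of 1 for the positive rows followed by 0 for the negative rows.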

lorenzoritter/domainadaption