Sentiment-Analysis-of-Text-Data-Tweets-

This project addresses the problem of sentiment analysis on Twitter. The goal of this project was to predict sentiment for the given Twitter post using Python. Sentiment analysis can predict many different emotions attached to the text, but in this report, only 3 major were considered: positive, negative and neutral. The training dataset was small (just over 5900 examples) and the data within it was highly skewed, which greatly impacted on the difficulty of building a good classifier. After creating a lot of custom features, utilizing bag-of-words representations and applying the Extreme Gradient Boosting algorithm, the classification accuracy at the level of 58% was achieved. Analysing the public sentiment as firms trying to find out the response of their products in the market, predicting political elections and predicting socioeconomic phenomena like the stock exchange.

Dataset Information

We use and compare various different methods for sentiment analysis on tweets . The training dataset is expected to be a csv file of type tweet_id,sentiment,tweet where the tweet_id is a unique integer identifying the tweet, sentiment is either 1 (positive) or 0 (negative), and tweet is the tweet enclosed in "". Similarly, the test dataset is a csv file of type tweet_id,tweet. Please note that csv headers are not expected and should be removed from the training and test datasets. The input data consisted of two CSV files:

train.csv (5971 tweets)
test.csv (4000 tweets)

Requirements

There are some general library requirements for the project and some which are specific to individual methods. The general requirements are as follows.

python 3x
pandas
numpy
scikit-learn
scipy
nltk
regex
matplotlib
seaborn

The library requirements specific to some methods are:

xgboost for XGBoost.

Note: It is recommended to use Anaconda distribution of Python. It already have lot of prebuild libraries installed.

Information about files

data/text.csv: Training Dataset.
data/train.csv: Testing Dataset.
data/emoticons.txt: Emoticons Dataset.
Sentiment Analysis of Text Data.ipynb: Heart of the beast, contains all the functions, models and results from the project.
report.pdf: report of the project.
proposal.pdf: proposal of the project.
plots: consists of plots produced in project.
Sentiment Analysis of Text Data.html: Jupyter notebook in html format.
result.csv: result of the test data.

How to Run the project

For the sake of simplicity I have added everything in a single Python Notebook file. Follow the Notebook, each cell in the notebook is well commented which will help understand the project steps.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sentiment-Analysis-of-Text-Data-Tweets-

Dataset Information

Requirements

Information about files

How to Run the project

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
data		data
functions		functions
plots		plots
README.md		README.md
Sentiment Analysis of Text Data.html		Sentiment Analysis of Text Data.html
Sentiment Analysis of Text Data.ipynb		Sentiment Analysis of Text Data.ipynb
Sentiment Analysis of Text Data.py		Sentiment Analysis of Text Data.py
proposal.pdf		proposal.pdf
report.pdf		report.pdf

Aashu-Adhikari/Sentiment-Analysis-of-Text-Data-Tweets-

Folders and files

Latest commit

History

Repository files navigation

Sentiment-Analysis-of-Text-Data-Tweets-

Dataset Information

Requirements

Information about files

How to Run the project

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages