This repository contains a sentiment analysis project that classifies text into three categories: positive, negative, and neutral. It trains and compares several machine learning models, using NLTK for text preprocessing, scikit-learn for modeling, NumPy and Pandas for data manipulation, and Matplotlib, Seaborn, and WordCloud for visualization.
- Data Preprocessing: Cleans and prepares text data, including tokenization, stopword removal, and lemmatization.
- Model Training: Implements multiple machine learning algorithms so their results can be compared. Models included:
  - Logistic Regression
  - Naïve Bayes Classifier
  - Support Vector Classifier (SVC)
  - Decision Trees
  - Random Forest Classifier
- Model Evaluation: Assesses performance using key metrics such as accuracy, precision, recall, F1-score, and ROC-AUC.
- Visualization: Generates visualizations, including word clouds for each sentiment category, to provide insights into the most common words associated with each sentiment.
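The preprocessing step listed above (tokenization, stopword removal, lemmatization) could look roughly like the sketch below. It is not taken from the project's notebook: the function names `clean_text` and `nltk_preprocessor`, the regex tokenizer, and the choice of NLTK's English stopword list and `WordNetLemmatizer` are all assumptions made for illustration.

```python
import re
from typing import Callable, Iterable, List

# Assumed tokenizer: runs of lowercase letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def clean_text(
    text: str,
    stop_words: Iterable[str],
    lemmatize: Callable[[str], str],
) -> List[str]:
    """Lowercase, tokenize, drop stopwords, then lemmatize each token."""
    stop = set(stop_words)
    tokens = TOKEN_RE.findall(text.lower())
    return [lemmatize(tok) for tok in tokens if tok not in stop]


def nltk_preprocessor() -> Callable[[str], List[str]]:
    """Build a preprocessor backed by NLTK resources (downloaded on first use)."""
    # Imported lazily so clean_text stays usable without NLTK installed.
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    nltk.download("stopwords", quiet=True)
    nltk.download("wordnet", quiet=True)
    nltk.download("omw-1.4", quiet=True)  # some NLTK versions need this for WordNet

    stop = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()
    return lambda text: clean_text(text, stop, lemmatizer.lemmatize)
```

Keeping `clean_text` independent of NLTK makes it easy to swap in a different stopword list or lemmatizer; `nltk_preprocessor()` returns a one-argument function that can be applied to a text column, for example with `df['text'].apply(...)`.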
To get started with this project, follow these steps:
- Clone the Repository:
    git clone https://github.com/yourusername/sentiment-analysis.git
    cd sentiment-analysis
- Install Dependencies: Ensure you have Python 3.x installed, then install the required libraries:
    pip install -r requirements.txt
- Python 3.x
- Jupyter Notebook
- Required Libraries:
  - numpy
  - pandas
  - matplotlib
  - seaborn
  - nltk
  - scikit-learn
  - wordcloud
- Data Loading: Load your dataset into the Jupyter Notebook. Ensure the dataset has a column for text data and one for sentiment labels.
- Preprocessing: Run preprocessing steps to clean, tokenize, and lemmatize the text data.
- Model Training: Choose and train a model from the available options. Adjust hyperparameters as needed.
- Evaluation: Evaluate model performance using metrics such as accuracy, precision, recall, F1-score, and ROC-AUC.
- Visualization: Generate word clouds and other visualizations to analyze sentiment distribution within the data.
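The training and evaluation steps above might be sketched as follows with scikit-learn. This is an illustrative outline, not the notebook's actual code: the helper name `evaluate_models`, the TF-IDF features, the two candidate models shown, and the split settings are assumptions. Because the task has three classes, ROC-AUC is computed one-vs-rest from predicted probabilities.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def evaluate_models(texts, labels, test_size=0.25, seed=0):
    """Train each candidate on a stratified split and collect test metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, stratify=labels, random_state=seed
    )
    # Assumed subset of the models listed in this README.
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "naive_bayes": MultinomialNB(),
    }
    results = {}
    for name, clf in candidates.items():
        # TF-IDF features are an assumption; the vectorizer is fit on training data only.
        model = make_pipeline(TfidfVectorizer(), clf)
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        proba = model.predict_proba(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "f1_macro": f1_score(y_test, pred, average="macro"),
            # One-vs-rest ROC-AUC; columns of proba follow model.classes_.
            "roc_auc_ovr": roc_auc_score(
                y_test, proba, multi_class="ovr", labels=model.classes_
            ),
        }
    return results
```

Calling `evaluate_models(df['text'], df['sentiment'])` returns one metrics dictionary per model. To add SVC to the comparison, it would need `probability=True` so that `predict_proba` is available for the ROC-AUC computation.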
The following example demonstrates how to load a dataset and generate a word cloud for negative sentiment words:
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

# Load your dataset (expects 'text' and 'sentiment' columns)
df = pd.read_csv('data.csv')

# Keep only the rows labeled as negative
negative_df = df[df['sentiment'] == 'negative']

def generate_wordcloud(data, title):
    """Render a word cloud from the 'text' column of `data`."""
    # Join all reviews into one string; skip missing values so join() does not fail
    text = " ".join(data['text'].dropna().astype(str))
    wordcloud = WordCloud(stopwords=STOPWORDS, background_color="white").generate(text)
    plt.figure(figsize=(10, 6))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.title(title)
    plt.axis("off")
    plt.show()

generate_wordcloud(negative_df, 'Negative Sentiment WordCloud')
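A word cloud shows relative frequency only visually. As a complement, the sketch below counts the most common words per sentiment category so the numbers can be inspected directly. The helper name `top_words_by_sentiment` is hypothetical, the whitespace tokenization is a simplification, and the `text` and `sentiment` column names are carried over from the example above.

```python
from collections import Counter

import pandas as pd


def top_words_by_sentiment(df, text_col="text", label_col="sentiment", n=10):
    """Return the n most frequent words for each sentiment label."""
    top = {}
    for label, group in df.groupby(label_col):
        counts = Counter()
        # Skip missing texts; split on whitespace after lowercasing.
        for doc in group[text_col].dropna().astype(str):
            counts.update(doc.lower().split())
        top[label] = counts.most_common(n)
    return top
```

Stopwords are not filtered here, so in practice the counts are more informative when the function is run on text that has already been preprocessed.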
Contributions are welcome! If you'd like to contribute, please follow these steps:
- Fork the Repository.
- Create a New Branch:
    git checkout -b feature-branch
- Make Your Changes and commit them:
    git commit -m 'Add new feature'
- Push to the Branch:
    git push origin feature-branch
- Create a New Pull Request.
- The Jupyter Development Team for JupyterLab.
- Contributors of the libraries used in this project, such as NLTK, scikit-learn, and Seaborn.
For questions or suggestions, feel free to reach out:
- Your Name - [email protected]
- GitHub: G.M.Ravindu Dulshan