Speech Emotion Recognition Website

This project is a real-time Speech Emotion Recognition (SER) web application that predicts emotions from audio using an LSTM (Long Short-Term Memory) model. The model is trained on four widely used datasets: CREMA-D, RAVDESS, SAVEE, and TESS.

Introduction

The Speech Emotion Recognition (SER) website identifies human emotions from speech input in real time. It uses an LSTM-based deep learning model trained on multiple datasets, so that it sees a variety of speakers and recording conditions. Users can upload audio files or record their voices directly through the platform.

Datasets

The model is trained on the following datasets:

CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset.
RAVDESS: Ryerson Audio-Visual Database of Emotional Speech and Song.
SAVEE: Surrey Audio-Visual Expressed Emotion.
TESS: Toronto Emotional Speech Set.

These datasets encompass a wide range of emotions, including neutral, calm, happy, sad, angry, fearful, surprise, and disgust.
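Each dataset labels its recordings differently, so the labels have to be mapped to one shared set before training. The sketch below shows one way to do that for RAVDESS and CREMA-D, assuming the files keep the names they are distributed with. The function name `label_from_filename` and the unified label strings are illustrative assumptions, not taken from this repository; SAVEE and TESS use their own naming schemes and would need similar handling.

```python
from pathlib import Path
from typing import Optional

# RAVDESS encodes the emotion as the third of seven hyphen-separated fields,
# e.g. "03-01-06-01-02-01-12.wav" -> "06" (fearful).
RAVDESS_CODES = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprise",
}

# CREMA-D encodes the emotion as the third of four underscore-separated fields,
# e.g. "1001_DFA_ANG_XX.wav" -> "ANG" (angry).
CREMA_D_CODES = {
    "ANG": "angry", "DIS": "disgust", "FEA": "fearful",
    "HAP": "happy", "NEU": "neutral", "SAD": "sad",
}


def label_from_filename(filename: str) -> Optional[str]:
    """Return a unified emotion label, or None if the name is not recognized."""
    stem = Path(filename).stem
    hyphen_parts = stem.split("-")
    if len(hyphen_parts) == 7:  # RAVDESS-style name
        return RAVDESS_CODES.get(hyphen_parts[2])
    underscore_parts = stem.split("_")
    if len(underscore_parts) == 4:  # CREMA-D-style name
        return CREMA_D_CODES.get(underscore_parts[2])
    return None
```

Returning `None` for unrecognized names lets a data-loading script skip or log stray files instead of assigning them a wrong label.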

Model Architecture

The LSTM model was chosen due to its effectiveness in sequence prediction tasks, particularly in processing and analyzing audio signals over time. The model's architecture includes:

Preprocessing layers to normalize and extract features from audio signals.
LSTM layers for temporal sequence learning.
Fully connected dense layers for emotion classification.
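The three stages above can be traced in a small, framework-free sketch. This is not the project's trained model: the weights are random, the two features per frame (RMS energy and zero-crossing rate), the frame length, and the hidden size are all assumptions chosen for illustration, and a real implementation would normally use a deep learning library. It only shows how data flows from a normalized signal, through an LSTM layer, to a dense softmax output.

```python
import math
import random

EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "surprise", "disgust"]


def frame_features(samples, frame_len=400):
    """Peak-normalize the signal, then return [RMS, zero-crossing rate] per frame."""
    peak = max((abs(s) for s in samples), default=0.0) or 1.0
    norm = [s / peak for s in samples]
    feats = []
    for start in range(0, len(norm) - frame_len + 1, frame_len):
        frame = norm[start:start + frame_len]
        rms = math.sqrt(sum(v * v for v in frame) / frame_len)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        feats.append([rms, crossings / (frame_len - 1)])
    return feats


def sigmoid(x):
    return 0.5 * (1.0 + math.tanh(0.5 * x))  # numerically stable logistic


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lstm_forward(seq, W, U, b, hidden):
    """Run one LSTM layer over seq and return the final hidden state.

    Rows of W, U and b are stacked in gate order: input, forget, candidate, output.
    """
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in seq:
        z = [wx + uh + bias
             for wx, uh, bias in zip(matvec(W, x), matvec(U, h), b)]
        i = [sigmoid(v) for v in z[:hidden]]
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]
        g = [math.tanh(v) for v in z[2 * hidden:3 * hidden]]
        o = [sigmoid(v) for v in z[3 * hidden:]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h


def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
hidden, n_feats = 16, 2  # assumed sizes
W = random_matrix(4 * hidden, n_feats, rng)
U = random_matrix(4 * hidden, hidden, rng)
b = [0.0] * (4 * hidden)
dense = random_matrix(len(EMOTIONS), hidden, rng)  # dense classification layer

# One second of a synthetic 220 Hz tone at 16 kHz stands in for real speech.
signal = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
features = frame_features(signal)
probs = softmax(matvec(dense, lstm_forward(features, W, U, b, hidden)))
```

With untrained weights the resulting probabilities are close to uniform; training adjusts `W`, `U`, `b`, and `dense` so that the final hidden state separates the emotion classes.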

Website Features

Real-Time Emotion Prediction: Users can record or upload their voice, and the website predicts the emotion in real time.
Interactive Dashboard: A user-friendly interface that displays the detected emotion and confidence scores.
Multiple Language Support: The website can process speech in different languages.
Visualization Tools: Graphical representation of the emotion probabilities and audio waveform.
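As a rough illustration of how confidence scores might be presented, the sketch below turns a mapping of emotion probabilities into sorted text bars. The function name `format_scores` and the example probabilities are hypothetical; the website's actual dashboard and charting code are not shown in this README.

```python
def format_scores(probabilities, width=20):
    """Return one text line per emotion, sorted by descending probability."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    lines = []
    for label, p in ranked:
        bar = "#" * round(p * width)  # bar length proportional to probability
        lines.append(f"{label:<9} {p:6.1%} {bar}")
    return lines


# Hypothetical model output for one recording.
example = {"neutral": 0.20, "happy": 0.62, "sad": 0.10, "angry": 0.08}
lines = format_scores(example)
print("\n".join(lines))
```

Sorting by probability puts the predicted emotion on the first line, which is usually what a user looks for first.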

Installation

To run this project locally, follow these steps:

Clone the repository:
```
git clone https://github.com/yourusername/ser-website.git
cd ser-website
```

Install dependencies:
```
pip install -r requirements.txt
```
Download and prepare the datasets: Ensure the datasets (CREMA-D, RAVDESS, SAVEE, TESS) are available in the data/ directory.
Train the model (optional): If you wish to train the model from scratch:
```
python train_model.py
```
Run the website:
```
python app.py
```
Access the website: Open a browser and go to http://127.0.0.1:5000/.

Usage

Record Your Voice: Click on the 'Record' button to start capturing your voice through the microphone.
Upload Audio File: Use the upload feature to analyze pre-recorded audio files.
View Results: The detected emotion will be displayed with an emoji for better visualization.
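The result display can be thought of as a lookup from the most probable emotion to an emoji. The mapping and the `describe` helper below are assumptions made for illustration; the emojis the website actually uses may differ.

```python
# Hypothetical emoji mapping; the real site may use different symbols.
EMOJI = {
    "neutral": "😐", "calm": "😌", "happy": "😄", "sad": "😢",
    "angry": "😠", "fearful": "😨", "surprise": "😲", "disgust": "🤢",
}


def describe(probabilities):
    """Return the top emotion with its emoji and confidence as a short string."""
    label = max(probabilities, key=probabilities.get)
    return f"{EMOJI.get(label, '❓')} {label} ({probabilities[label]:.0%})"


result = describe({"sad": 0.7, "happy": 0.3})
```

The fallback `❓` covers labels that are missing from the mapping, so an unexpected class name does not raise an error in the interface.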

Contributing

Contributions are welcome! Please feel free to submit a Pull Request or report any issues.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Contact

For any questions or suggestions, please contact:

Nikson Nadar - [[email protected]]
GitHub: @Nikson2003
