Interapp

Interapp is a program that suggests actions to the user during a video call. It is built on machine learning and recognises emotion from both the image and the audio of the call.

The purpose of this project is to notify the user of the current emotion of the person on the other end of a video call, and to suggest actions that could improve the experience for both sides. It is also useful for students taking a video-call interview: knowing the interviewer's current emotion, which might otherwise go unnoticed, can help them avoid common interview pitfalls and get a better result.

Overview

As a prototype, the Python script captures a specific window on the screen and uses the computer's own audio output as the audio feed. This gives the script the ability to integrate with any major conference or video-call software, including Skype, Facebook Messenger calls, Google Duo, etc.
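
For illustration, here is a minimal sketch of the screen-capture step, assuming the third-party mss library; the actual script may use a different capture API, and the region coordinates below are placeholders:

    # Grab a fixed screen region as the image feed for the analysis pipeline.
    # `mss` is an assumption here, and REGION is a hypothetical window position.
    import numpy as np
    import mss

    REGION = {"left": 100, "top": 100, "width": 640, "height": 480}

    with mss.mss() as sct:
        shot = sct.grab(REGION)             # raw BGRA screenshot of the region
        frame = np.asarray(shot)[:, :, :3]  # drop alpha -> BGR frame for OpenCV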

The structure of this project can roughly be separated into three parts.

Image (Face Sentiment Analysis)

Although people often claim to be good at hiding their feelings, almost everyone can't help but show some emotion on their face. Using the OpenCV library, we can extract faces from an image efficiently thanks to its well-implemented face detection function. The image is then cropped around each detected face and fed to the face-emotion classifier built with EmoPy, one face at a time.
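
To make the face-extraction step concrete, here is a minimal OpenCV sketch using its bundled Haar cascade. The EmoPy call at the end is only indicated as a comment, since its exact API is not shown here, and `frame` is assumed to be a BGR image such as the one grabbed in the capture sketch above:

    import cv2

    # OpenCV ships with pretrained Haar cascades for frontal-face detection
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # cascade works on grayscale
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    for (x, y, w, h) in faces:
        crop = frame[y:y + h, x:x + w]               # one face at a time
        # scores = emopy_model.predict(crop)         # hypothetical EmoPy call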

The result of the classifier, which consists of a similarity percentage for each emotion, is passed to the action suggestion model for further training.

(Figure: labeled emotion-analysis image output)

Voice (Real-time Speech Emotion Recognizer)

Speech emotion, or voice sentiment, is also a hot topic with a lot of prior research behind it. People tend to speak at a higher frequency when they are happy or interested, and at a flatter, more even pitch when they are bored. Even across different languages and words, the base frequency of speech correlates strongly with the speaker's mood and emotion.

By applying the short-time Fourier transform (STFT), we can obtain a chart of frequency intensity over short time windows. This is particularly useful because, after signal processing, it lets researchers treat the problem as 2D data classification.

In our project, this is handled by the OpenVokaturi library.
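
Although the classification itself is delegated to OpenVokaturi, here is a minimal SciPy sketch of the STFT step described above; the sample rate and buffer are placeholder assumptions:

    import numpy as np
    from scipy.signal import stft

    RATE = 16000                    # assumed sample rate of the audio feed
    audio = np.random.randn(RATE)   # placeholder for one second of captured samples

    f, t, Z = stft(audio, fs=RATE, nperseg=512)
    spectrogram = np.abs(Z)         # frequency intensity over short time windows
    # `spectrogram` is a 2D array, so it can be treated as image-like input
    # for a classifier, as described above.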

Action Suggestion (Reinforcement Learning using TensorFlow)

Using the output from the image and voice analysis, a neural-network model with two hidden layers of 128 nodes each is trained. During the prototyping stage, the user receives suggested actions during the video-call conversation, e.g. "Change topics", "Pay attention!", etc. The user can then rate how useful each suggestion was in the actual situation. At the end of the conversation, this feedback is fed back into the program to further train the model, so the program keeps getting smarter.
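
A minimal Keras sketch of the network described above; the input and output sizes are assumptions (face and voice emotion scores concatenated, and the eight actions listed below):

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense

    N_EMOTIONS = 14  # assumed: face + voice emotion percentages, concatenated
    N_ACTIONS = 8    # the eight suggestions listed below

    model = Sequential([
        Dense(128, activation="relu", input_shape=(N_EMOTIONS,)),
        Dense(128, activation="relu"),
        Dense(N_ACTIONS, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")

    # After each call, the user's usefulness ratings become training targets:
    # model.fit(emotion_batch, rating_batch, epochs=...)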

Some of the suggested actions include (a sketch of how these phrases are vectorised follows the list):

  • change topic
  • engage in discussion
  • slow down
  • speed up
  • stop and listen
  • open conversation
  • ask questions
  • pay attention
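
These phrases are vectorised by doc2vec.py (see File details below). Here is a minimal gensim sketch of how that might look; the vector size and training settings are assumptions, while d2v.model is the filename the run instructions mention:

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    from nltk.tokenize import word_tokenize  # needs the 'punkt' data, see run instructions

    ACTIONS = ["change topic", "engage in discussion", "slow down", "speed up",
               "stop and listen", "open conversation", "ask questions", "pay attention"]

    # one tagged document per predefined action phrase
    docs = [TaggedDocument(words=word_tokenize(action), tags=[str(i)])
            for i, action in enumerate(ACTIONS)]

    model = Doc2Vec(vector_size=20, min_count=1, epochs=100)  # sizes are assumptions
    model.build_vocab(docs)
    model.train(docs, total_examples=model.corpus_count, epochs=model.epochs)
    model.save("d2v.model")  # the model file the run instructions refer to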

(Figure: combined result of the emotion analysis along with the feedback function)

Lastly, since it can capture the image and sound of any program on the computer, Interapp is not restricted to video calls alone; there is a vast range of possibilities for what it can do.

File details

  1. constant.py: All necessary constants
  2. doc2vec.py: Vectorise predefined actions
  3. voice2emo.py: Analyse the audio output and estimate the voice sentiment
  4. face2emo.py: Analyse the image output and estimate the face sentiment
  5. emo2act.py: Train the "Emotions to Actions" model
  6. main.py: Analyse voice and face sentiment and suggest suitable actions

The prediction flow is: main.py -> face2emo.py & voice2emo.py -> emo2act.py -> doc2vec.py -> output actions
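
A hedged sketch of how main.py might wire that flow together; every function name below is a hypothetical stand-in, not the repo's actual API:

    import face2emo, voice2emo, emo2act, doc2vec  # the repo's own modules

    face_scores = face2emo.estimate(frame)              # hypothetical signature
    voice_scores = voice2emo.estimate(audio)            # hypothetical signature
    action_vector = emo2act.predict(face_scores, voice_scores)  # hypothetical
    print(doc2vec.closest_action(action_vector))        # hypothetical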

Environment

  • Windows 10
  • Python 3.6.6

Run instructions

  1. Clone the project
  2. Run pip install -r requirements.txt
  3. If you are installing nltk for the first time, run the following once in Python:
    import nltk
    nltk.download('punkt')

  4. If you change the ACTIONS in constant.py, run python doc2vec.py to update d2v.model
  5. Run python main.py

Wrapping things up

Interapp would like to express its gratitude to these awesome packages that made this project possible:

  • EmoPy -- a deep neural net toolkit for emotion analysis via Facial Expression Recognition (FER), by thoughtworksarts
  • Vokaturi -- understand the emotion in a speaker's voice
  • TensorFlow -- an open-source machine learning framework for everyone
  • Keras -- an open-source library that provides a Python interface for artificial neural networks
