This repository contains the code used in the paper "A Bimodal Learning Approach to Assist Multi-sensory Effects Synchronization", accepted at IJCNN 2018.
The code is divided into two parts:
- AudioSet dataset download -- used to download AudioSet data from YouTube and split it into audio and video files
- Bimodal neural network architecture and experiments -- a neural network that uses both audio and video as inputs, plus experiments showing that the bimodal architecture performed better in our cases
get_dataset_from_youtube.py downloads the video and audio files of the YouTube AudioSet directly from YouTube. Inside the script you will notice the following hardcoded files: 2000max_subset_unbalanced_train_segments.csv, subset_eval_segments.csv, etc. These csv files are derived from the AudioSet found here.
usage:

```
python get_dataset_from_youtube.py --train --eval
```
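For reference, below is a minimal sketch of such a download-and-split step. It assumes the standard AudioSet segment CSV layout (YTID, start_seconds, end_seconds, positive_labels) and that the yt-dlp and ffmpeg command-line tools are installed; the actual script may use different tools, options, and output paths.

```python
import csv
import subprocess
from pathlib import Path

def download_segments(csv_path, out_dir):
    """Download each AudioSet segment and split it into a video-only and an
    audio-only file. Sketch only: the real get_dataset_from_youtube.py may
    work differently."""
    out = Path(out_dir)
    (out / "video").mkdir(parents=True, exist_ok=True)
    (out / "audio").mkdir(parents=True, exist_ok=True)

    with open(csv_path) as f:
        # AudioSet segment CSVs are "YTID, start_seconds, end_seconds, positive_labels"
        # with a few comment lines starting with '#'.
        rows = [r for r in csv.reader(f, skipinitialspace=True)
                if r and not r[0].startswith("#")]

    for ytid, start, end, *_ in rows:
        start, end = float(start), float(end)
        tmp = out / f"{ytid}.mp4"
        # Fetch the full clip with yt-dlp (assumed to be installed).
        subprocess.run(["yt-dlp", "-f", "mp4", "-o", str(tmp),
                        f"https://www.youtube.com/watch?v={ytid}"], check=False)
        if not tmp.exists():
            continue  # video removed or unavailable
        # Cut the labelled window and split modalities with ffmpeg.
        common = ["ffmpeg", "-y", "-ss", str(start), "-t", str(end - start),
                  "-i", str(tmp)]
        subprocess.run(common + ["-an", str(out / "video" / f"{ytid}.mp4")], check=False)
        subprocess.run(common + ["-vn", str(out / "audio" / f"{ytid}.wav")], check=False)
        tmp.unlink()
```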
data/extract.sh is used to extract subsets of the full AudioSet dataset, mainly to alleviate the imbalance between classes. The script reads how many examples you want to extract from AudioSet and the class names of the AudioSet labels you want to keep. In our project we tried to limit each label to a maximum of 2000 examples (hence the 2000max prefix in the csv files).
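Since extract.sh is a shell script, here is a hedged Python sketch of the same subsetting idea. The function name extract_subset, the column layout, and the label ids in the usage comment are illustrative and not taken from the repository.

```python
import csv
from collections import defaultdict

def extract_subset(segments_csv, wanted_labels, max_per_label, out_csv):
    """Keep at most `max_per_label` segments for each wanted label.
    Python sketch of what data/extract.sh does; the column layout is assumed
    to follow the AudioSet segment CSVs."""
    counts = defaultdict(int)
    kept = []
    with open(segments_csv) as f:
        for row in csv.reader(f, skipinitialspace=True):
            if not row or row[0].startswith("#"):
                continue
            labels = row[3].split(",")
            hits = [l for l in labels if l in wanted_labels]
            if hits and all(counts[l] < max_per_label for l in hits):
                kept.append(row)
                for l in hits:
                    counts[l] += 1
    with open(out_csv, "w", newline="") as f:
        csv.writer(f).writerows(kept)

# e.g. build a 2000-example-per-label training subset (label ids are placeholders)
# extract_subset("unbalanced_train_segments.csv", {"/m/09x0r", "/m/04rlf"},
#                2000, "2000max_subset_unbalanced_train_segments.csv")
```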
- Remove hardcoded values and files
We use Keras + TensorFlow to create a neural network that predicts the labels associated with the combined audio and video samples.

The bimodal architecture is presented in Multimodal.ipynb. We also ran experiments on audio-only and video-only networks to check whether the bimodal network was really an improvement; these experiments can be found in the other Jupyter notebooks (such as 2000video.ipynb). We also produced several visualizations to inspect the learning process of the bimodal network; they can be found in KERAS-VIS_activation_maximization.ipynb and in [viz.ipynb](viz.ipynb).
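For a rough idea of what a two-branch (bimodal) Keras model looks like, here is a minimal sketch. The input shapes, layer sizes, and the build_bimodal_model name are placeholders and do not reproduce the architecture in Multimodal.ipynb.

```python
from tensorflow.keras import layers, Model

def build_bimodal_model(n_classes, audio_shape=(128,), video_shape=(1024,)):
    """Two-branch network that fuses audio and video features.
    Shapes and layer sizes are illustrative only."""
    audio_in = layers.Input(shape=audio_shape, name="audio")
    video_in = layers.Input(shape=video_shape, name="video")

    a = layers.Dense(256, activation="relu")(audio_in)   # audio branch
    v = layers.Dense(256, activation="relu")(video_in)   # video branch

    x = layers.concatenate([a, v])                       # bimodal fusion
    x = layers.Dense(512, activation="relu")(x)
    out = layers.Dense(n_classes, activation="sigmoid")(x)  # multi-label output

    model = Model([audio_in, video_in], out)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```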
You can view the class model activations for the video network at https://www.youtube.com/watch?v=dTVbsootmiA.
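The notebooks rely on keras-vis for activation maximization; the sketch below only illustrates the underlying idea with plain TensorFlow gradient ascent on a single-input model. The function name, input handling, and hyper-parameters are illustrative, not the notebook code.

```python
import tensorflow as tf

def class_model_activation(model, class_idx, input_shape, steps=200, lr=0.1):
    """Gradient-ascent activation maximization: optimize an input so that it
    maximizes the score of one output class (simplified, single-input version
    of what the keras-vis notebooks do)."""
    x = tf.Variable(tf.random.normal((1,) + tuple(input_shape), stddev=0.1))
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            score = model(x, training=False)[0, class_idx]
            loss = -score                      # ascend on the class score
        grads = tape.gradient(loss, [x])
        opt.apply_gradients(zip(grads, [x]))
    return x.numpy()[0]
```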
- Cleanup legacy code
- Improve file structure