This repository provides the code to reproduce the paper:
@article{akesbi2023music,
  title={Music Augmentation and Denoising For Peak-Based Audio Fingerprinting},
  author={Akesbi, Kamil and Desblancs, Dorian and Martin, Benjamin},
  journal={arXiv preprint arXiv:2310.13388},
  year={2023}
}
which can be found on arXiv: https://arxiv.org/abs/2310.13388.
This work is going to be presented at the Late-Breaking Demo Session of ISMIR 2023.
In order to use the music augmentation pipeline, you will need to download the following datasets:
We use the MIT impulse response (IR) survey dataset. You can download the dataset here.
We build our own background noise dataset by mixing samples from past acoustic scene datasets from the DCASE challenge. We select:
- The TUT Acoustic Scenes 2017 dataset: development dataset here, evaluation dataset here.
- The TUT Urban Acoustic Scenes 2018 Mobile dataset: development dataset here, evaluation dataset here.
- The TAU Urban Acoustic Scenes 2020 Mobile dataset: development dataset here, evaluation dataset here.
We use the MTG-Jamendo dataset to train different music denoising models. Download the dataset here.
We use the Free Music Archive (FMA) Large dataset as our reference database to evaluate the robustness of audio fingerprinting (AFP) systems to noisy query snippets. Download the dataset here.
You can specify the paths to the dataset folders in docker/install/.env.
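For reference, a minimal sketch of what this file could contain is shown below; the variable names are hypothetical, so match them to the ones actually read by docker/install/docker-compose.yaml:

# Hypothetical variable names; adapt them to docker/install/docker-compose.yaml.
MIT_IR_SURVEY_PATH=/data/mit_ir_survey
DCASE_PATH=/data/dcase
MTG_JAMENDO_PATH=/data/mtg-jamendo-dataset
FMA_PATH=/data/fma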
From there, you can build and launch your Docker environment for experiments using the following commands:
docker-compose -f docker/install/docker-compose.yaml build
docker-compose -f docker/install/docker-compose.yaml up -d
docker-compose -f docker/install/docker-compose.yaml run python /bin/bash
Your workspace will then have the following structure:
workspace/
    src/
    noise_databases/
        mit_ir_survey/
        dcase/
            tut_2017_development/
            tut_2017_evaluation/
            tut_2018_development_mobile/
            tut_2018_evaluation_mobile/
            tut_2020_development_mobile/
            tut_2020_evaluation_mobile/
    fma/
    mtg-jamendo-dataset/
    queries/
You can then install the required dependencies using Poetry:
cd src/
poetry install
poetry shell
poetry run python ...
The augmentation pipeline is composed of several transformations applied to an audio input. It is designed to reproduce degradations caused by room responses, background noise, recording devices, and loudspeakers.
To generate an augmented music recording from a clean music snippet, you can use the following script:
import random

import torchaudio
from augmentation import AugmentFP
from training.parameters import WAVEFORM_SAMPLING_RATE, DURATION

# noise_paths is a list of paths to the background noise recordings
af = AugmentFP(noise_paths, WAVEFORM_SAMPLING_RATE)

# Load the clean snippet, downmix to mono, and resample to the target rate
waveform, sr = torchaudio.load("path_to_audio")
waveform = waveform.mean(dim=0)
waveform = torchaudio.transforms.Resample(sr, WAVEFORM_SAMPLING_RATE)(waveform)

# Pick a random DURATION-second segment and augment it
nb_samples_segment = WAVEFORM_SAMPLING_RATE * DURATION
start = random.randint(0, waveform.shape[0] - nb_samples_segment)
waveform = waveform[start : start + nb_samples_segment].unsqueeze(0)
aug = af(waveform)
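To check the result by ear, you can then write the augmented snippet to disk with torchaudio (this assumes af returns a (channels, samples) tensor, as suggested by the unsqueeze above):

torchaudio.save("augmented.wav", aug, WAVEFORM_SAMPLING_RATE)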
The pipeline and its parameters can also be tested in a user-friendly interface built with Streamlit. To launch it, run:
streamlit run streamlit_app/app.py --server.port=8501 --server.address=0.0.0.0 --server.fileWatcherType=None
which serves the interface on port 8501.
To train the models (UNet on magnitude spectrograms or Demucs on raw audio waveforms):
python -m training.train --model=unet
To visualize the TensorBoard logs:
tensorboard --logdir=monitoring/ --port=6006
The weights of two pretrained models can be found in the following Google Drive: https://drive.google.com/file/d/1wAV5EP3oh-V-Q3k-Qf6BEdJjQjZATZJ4/view?usp=sharing.
In particular, we provide the pretrained weights of:
- A UNet trained to denoise magnitude spectrograms of 8 kHz audio signals.
- A Demucs model trained to denoise 8 kHz raw audio waveforms.
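As a minimal sketch of how these weights could be loaded, assuming the checkpoint is a PyTorch state dict (the module path and class name below are hypothetical and should be adapted to this repository):

import torch
from training.unet import UNet  # hypothetical module path and class name

model = UNet()
state_dict = torch.load("path/to/unet_weights.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # switch to inference mode before denoising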
To generate audio files and spectrograms with the trained models:
python -m training.generate_audios --model=unet
We evaluate the robustness of AFP systems to the distortions our augmentation pipeline generates. We use two popular open-source systems: Audfprint and Dejavu.
We use the FMA Large dataset as our reference database. To preprocess it, run:
python testing/fma_preprocessing.py
We can then generate 10,000 eight-second audio queries using:
python -m testing.generate_queries --queries=cleans
python -m testing.generate_queries --queries=augmented
To index the FMA Large dataset on Audfprint, use:
python -m testing.audfpring_exps --action=index
To obtain results on Audfprint (specify demucs or unet model):
python -m testing.audfpring_exps --action=identification_rate --model=unet
python -m testing.audfpring_exps --action=peaks_metrics --model=unet
To index the FMA Large dataset on Dejavu, use:
python -m testing.dejavu_exps --action=index
To obtain results on Dejavu (specify demucs or unet model):
python -m testing.dejavu_exps --action=identification_rate --model=unet
python -m testing.dejavu_exps --action=peaks_metrics --model=unet