This project focuses on real-time recognition of instrumental playing techniques (IPTs) using machine learning models. It automatically identifies playing techniques from a solo instrument's audio stream. This repository includes tools for preparing datasets, training models, evaluating their performance, and running real-time inference.
Lead Developer: Nicolas Brochec, Tokyo University of the Arts, ERC Reach.
Contributor: Marco Fiorini, IRCAM-STMS, CNRS, Sorbonne Université, ERC Reach.
Clone this repository and navigate to the folder.
```
git clone https://github.com/nbrochec/realtimeIPTrecognition/
cd realtimeIPTrecognition
```
Create a conda environment with Python 3.11.7.
```
conda create --name IPT python=3.11.7
source activate base
conda activate IPT
```
Make sure that portaudio is installed on your computer.

On Linux:
```
sudo apt-get install portaudio19-dev
```
On macOS using Homebrew:
```
brew install portaudio
```
Install the dependencies.
```
pip install -r requirements.txt
```
Install PyAudio separately.
```
pip install pyaudio
```
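To confirm that PyAudio is correctly linked against PortAudio, a quick sanity check can help (a minimal sketch; any equivalent script works):

```python
# Sanity check: PyAudio imports and reports the linked PortAudio version.
import pyaudio

print(pyaudio.get_portaudio_version_text())
pa = pyaudio.PyAudio()
pa.terminate()
```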
```
├── 📂data
│   ├── 📂dataset
│   └── 📂raw_data
│       ├── 📂test
│       └── 📂train
├── 📂externals
│   └── 📂pytorch_balanced_sampler
│       ├── __init__.py
│       ├── sampler.py
│       └── utils.py
├── 📂models
│   ├── __init__.py
│   ├── layers.py
│   ├── models.py
│   └── utils.py
├── 📂utils
│   ├── __init__.py
│   ├── augmentation.py
│   ├── constants.py
│   ├── rt.py
│   └── utils.py
├── check_io.py
├── LICENCE
├── preprocess.py
├── README.md
├── requirements.txt
├── realtime.py
└── train.py
```
You can drag and drop the folder containing your training audio files into the `/data/raw_data/train/` folder and your test audio files into the `/data/raw_data/test/` folder.
For IPT classes, the test and train folders must share the same names. The class label is retrieved from the name of your IPT class folders, as sketched after the tree below.
```
├── 📂test
│   └── 📂myTestDataset
│       ├── 📂IPTclass_1
│       │   ├── audiofile1.wav
│       │   ├── audiofile2.wav
│       │   └── ...
│       ├── 📂IPTclass_2
│       │   ├── audiofile1.wav
│       │   ├── audiofile2.wav
│       │   └── ...
│       └── ...
└── 📂train
    └── 📂myTrainingDataset
        ├── 📂IPTclass_1
        │   ├── audiofile1.wav
        │   ├── audiofile2.wav
        │   └── ...
        ├── 📂IPTclass_2
        │   ├── audiofile1.wav
        │   ├── audiofile2.wav
        │   └── ...
        └── ...
```
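To illustrate how labels follow from folder names, here is a minimal sketch (an illustration of the convention only — the repository's actual label extraction lives in `preprocess.py`; the path is taken from the tree above):

```python
# Minimal sketch: derive class labels from IPT folder names.
from pathlib import Path

train_root = Path("data/raw_data/train/myTrainingDataset")
classes = sorted(p.name for p in train_root.iterdir() if p.is_dir())
label_of = {name: idx for idx, name in enumerate(classes)}
print(label_of)  # e.g. {'IPTclass_1': 0, 'IPTclass_2': 1}
```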
You can use multiple training datasets. They must also share the same IPT class names (a quick consistency check is sketched below the tree).
```
└── 📂train
    ├── 📂myTrainingDataset1
    │   ├── 📂IPTclass_1
    │   ├── 📂IPTclass_2
    │   └── ...
    ├── 📂myTrainingDataset2
    │   ├── 📂IPTclass_1
    │   ├── 📂IPTclass_2
    │   └── ...
    └── ...
```
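Before preprocessing, you can verify that all datasets expose the same class folders; a hedged sketch (the helper below is hypothetical, not part of the repository):

```python
# Hypothetical helper: check that every training dataset has the same IPT class folders.
from pathlib import Path

def class_names(dataset_dir):
    return {p.name for p in dataset_dir.iterdir() if p.is_dir()}

root = Path("data/raw_data/train")
datasets = [d for d in root.iterdir() if d.is_dir()]
reference = class_names(datasets[0])
for d in datasets[1:]:
    if class_names(d) != reference:
        raise ValueError(f"{d.name} does not match the classes of {datasets[0].name}")
print("All training datasets share the same IPT classes.")
```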
Use `screen` to run multiple separate login sessions inside a single terminal window.

Open a screen.
```
screen -S IPT
conda activate IPT
cd realtimeIPTrecognition
```
To preprocess your datasets, use the following command. The only required argument is `--name`.
```
python preprocess.py --name project_name
```
| Argument | Description | Possible Values | Default Value |
|---|---|---|---|
| `--name` | Name of the project. | String | None |
| `--train_dir` | Specify the train directory. | String | `train` |
| `--test_dir` | Specify the test directory. | String | `test` |
| `--val_dir` | Specify the val directory. | String | `val` |
| `--val_split` | Specify from which dataset the validation set will be generated. | `train`, `test` | `train` |
| `--val_ratio` | Fraction of samples used for validation. | Float, 0 <= value < 1 | 0.2 |
If `--val_dir` is not specified, the validation set will be generated from the folder specified with `--val_split`.

A CSV file will be saved in the `/data/dataset/` folder with the following naming scheme:
```
project_name_dataset_split.csv
```
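To double-check a split before training, you can peek at the generated file; a minimal sketch (the column layout is an assumption — inspect the CSV for the actual schema):

```python
# Minimal sketch: inspect the generated split file.
import pandas as pd

df = pd.read_csv("data/dataset/project_name_dataset_split.csv")
print(df.columns.tolist())
print(df.head())
```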
There are many different configurations for training your model. The only required argument is `--name`.

To train your model, use the following command.
```
python train.py --name project_name
```
You can use the following arguments if you want to test different configurations.
| Argument | Description | Possible Values | Default Value |
|---|---|---|---|
| `--name` | Name of the project. | String | |
| `--device` | Specify the hardware on which computation should be performed. | `cpu`, `cuda`, `mps` | `cpu` |
| `--gpu` | Specify which GPU to use. | Integer | 0 |
| `--config` | Name of the model's architecture. | `v1`, `v2`, `v3` | `v2` |
| `--sr` | Sampling rate for downsampling the audio files. | Integer (Hz) | 24000 |
| `--segment_overlap` | Overlap between audio segments. Doubles the number of data samples. | `True`, `False` | `False` |
| `--fmin` | Minimum frequency for Mel filters. | Integer (Hz) | 0 |
| `--lr` | Learning rate. | Float value > 0 | 0.001 |
| `--batch_size` | Specify the batch size. | Integer value > 0 | 128 |
| `--epochs` | Number of training epochs. | Integer value > 0 | 100 |
| `--offline_augment` | Use offline augmentations generated from the original audio files using detuning, Gaussian noise, and time stretching. Stored in a PyTorch Dataset. | `True`, `False` | `True` |
| `--online_augment` | Specify which online augmentations to use. Applied in the training loop; each augmentation has a 50% chance of being applied. | `pitchshift`, `timeshift`, `polarityinversion`, `hpf`, `lpf`, `clipping`, `bitcrush`, `airabso`, `aliasing`, `mp3comp`, `trim` | None |
| `--padding` | Pad the arrays of audio samples with zeros. `minimal` only pads when the audio file is shorter than the model input length. | `full`, `minimal`, `None` | `minimal` |
| `--early_stopping` | Number of epochs without improvement before early stopping. | Integer value > 0, or `None` | None |
| `--reduce_lr` | Reduce the learning rate if validation performance plateaus. | `True`, `False` | `False` |
| `--export_ts` | Export the model as a TorchScript file (`.ts` format). | `True`, `False` | `True` |
| `--save_logs` | Save logs and results to disk. | `True`, `False` | `True` |
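For example, a plausible invocation combining several of these options (the values are illustrative, not recommendations):

```
python train.py --name project_name --device cuda --gpu 0 --config v2 --epochs 200 --early_stopping 10 --reduce_lr True
```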
Training your model will create a `runs` folder with the name of your project.

Detach from the current screen with `Ctrl`+`A`, then `D`.
Open a new screen.
```
screen -S monitor
conda activate IPT
cd realtimeIPTrecognition
```
You can monitor the training using TensorBoard. The confusion matrix and results will be uploaded to TensorBoard after training.
```
tensorboard --logdir . --bind_all
```
If you are working on a remote SSH server, use the following command to connect to the server and monitor with TensorBoard from your web browser.
```
ssh -L 6006:localhost:6006 user@server
```
A project folder with the date and time appended will be created, such as `project_name_date_time`.

After training, the script automatically saves the model checkpoints in the `/runs/project_name_date_time/` folder.

If you use `--export_ts True`, the `.ts` file will be saved in the same folder.
```
└── 📂runs
    └── 📂project_name_date_time
```
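The exported TorchScript model can be loaded for standalone inference; a minimal sketch (the file name and input shape below are assumptions — check your `runs` folder and model configuration):

```python
# Minimal sketch: load the exported TorchScript model and run a dummy forward pass.
import torch

model = torch.jit.load("runs/project_name_date_time/model.ts")  # file name is an assumption
model.eval()
with torch.no_grad():
    dummy = torch.randn(1, 1, 7680)  # placeholder shape; use your model's expected input
    output = model(dummy)
print(output.shape)
```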
The results and the confusion matrix will be saved to disk as CSV files in the `logs` directory.
```
└── 📂logs
    └── 📂project_name_date_time
        ├── cm_project_name_date_time.csv
        └── results_project_name_date_time.csv
```
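To inspect the confusion matrix programmatically, a sketch along these lines can help (the CSV layout — an n_class × n_class count table with true classes as rows — is an assumption):

```python
# Hypothetical: per-class accuracy from the saved confusion matrix.
import pandas as pd

cm = pd.read_csv("logs/project_name_date_time/cm_project_name_date_time.csv", index_col=0)
counts = cm.to_numpy()
per_class_accuracy = counts.diagonal() / counts.sum(axis=1)
print(dict(zip(cm.index, per_class_accuracy)))
```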
To run your model in real time, first check your computer's available audio input devices with the `check_io.py` script.
```
python check_io.py
```
This will display a list of the devices and their respective IDs. Using BlackHole to route the audio stream from Max to PyAudio is recommended.
```
Input Device ID 0 - MacBook Pro Microphone
Input Device ID 1 - BlackHole 2ch
Input Device ID 2 - BlackHole 16ch
```
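For reference, device enumeration with PyAudio looks roughly like this (a sketch of the general approach, not the contents of `check_io.py`):

```python
# Sketch: list input-capable audio devices via PyAudio.
import pyaudio

pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if info.get("maxInputChannels", 0) > 0:
        print(f"Input Device ID {i} - {info['name']}")
pa.terminate()
```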
Once you have found your device ID, use `python realtime.py` to run your model in real time. The arguments `--name`, `--input`, and `--channel` are required.

The script will automatically run the most recent model saved in the `runs` folder.
```
python realtime.py --name your_project --input 0 --channel 1
```
| Argument | Description | Possible Values | Default Value |
|---|---|---|---|
| `--name` | Name of the project. | String | None |
| `--input` | Specify the audio device ID. | Integer | None |
| `--channel` | Specify the channel of the audio device. | Integer | None |
| `--device` | Specify the hardware on which computation should be performed. | `cpu`, `cuda`, `mps` | `cpu` |
| `--gpu` | Specify which GPU to use. | Integer | 0 |
| `--buffer_size` | Specify the audio buffer size. | Integer | 256 |
| `--moving_average` | Window size for smoothing predictions with a moving average. | Integer | 5 |
| `--port` | Specify the UDP port. | Integer | 5005 |
Predictions in [0, n_class-1] are sent via UDP through the selected port (default 5005) with a `/class` address. Use a UDP receiver to retrieve the predictions as integers, as sketched below.
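Since the messages carry a `/class` address, an OSC-style receiver works; a minimal sketch using the `python-osc` package (the OSC framing and localhost binding are assumptions — adapt to your receiver of choice, e.g. `udpreceive` in Max):

```python
# Minimal sketch: receive /class predictions over UDP with python-osc.
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

def on_class(address, *args):
    predicted_class = args[0]  # integer in [0, n_class-1]
    print(f"{address}: {predicted_class}")

dispatcher = Dispatcher()
dispatcher.map("/class", on_class)
server = BlockingOSCUDPServer(("127.0.0.1", 5005), dispatcher)
server.serve_forever()
```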
If you use this code in your research, please cite the following papers.
```
@inproceedings{brochec:hal-04642673,
  TITLE = {{Microphone-based Data Augmentation for Automatic Recognition of Instrumental Playing Techniques}},
  AUTHOR = {Brochec, Nicolas and Tanaka, Tsubasa and Howie, Will},
  URL = {https://hal.science/hal-04642673},
  BOOKTITLE = {{International Computer Music Conference (ICMC 2024)}},
  ADDRESS = {Seoul, South Korea},
  YEAR = {2024},
  MONTH = Jul,
  PDF = {https://hal.science/hal-04642673/file/Brochec_Microphone_based_Data_Augmentation_for_Automatic_Recognition_of_Instrument_Playing_Techniques_.pdf},
  HAL_ID = {hal-04642673},
  HAL_VERSION = {v1},
}
```
```
@inproceedings{fiorini:hal-04635907,
  TITLE = {{Guiding Co-Creative Musical Agents through Real-Time Flute Instrumental Playing Technique Recognition}},
  AUTHOR = {Fiorini, Marco and Brochec, Nicolas},
  URL = {https://hal.science/hal-04635907},
  BOOKTITLE = {{Sound and Music Computing Conference (SMC 2024)}},
  ADDRESS = {Porto, Portugal},
  YEAR = {2024},
  MONTH = Jul,
  KEYWORDS = {AI ; Co-creativity ; Instrumental playing techniques ; Multi-agent system ; Somax2},
  PDF = {https://hal.science/hal-04635907/file/SMC2024_GUIDING_CO_CREATIVE_MUSICAL_AGENTS_THROUGH_REAL_TIME_FLUTE_INSTRUMENTAL_PLAYING_TECHNIQUE_RECOGNITION_CAMERA_READY.pdf},
  HAL_ID = {hal-04635907},
  HAL_VERSION = {v1},
}
```
- Nicolas Brochec and Tsubasa Tanaka. Toward Real-Time Recognition of Instrumental Playing Techniques for Mixed Music: A Preliminary Analysis. International Computer Music Conference (ICMC 2023), Oct 2023, Shenzhen, China.
- Nicolas Brochec and Will Howie. GFDatabase: A Database of Flute Playing Techniques (version 1.1). Zenodo, 2024.
This project uses code from the pytorch_balanced_sampler repository created by Karl Hornlund.
This work is supported by the ERC Reach (Raising Co-creativity in Cyber-Human Musicianship), hosted at IRCAM, directed by Gérard Assayag.