A project to detect speaker characteristics by machine learning experiments with a high-level interface.
The idea is to have a framework (based on e.g. sklearn and torch) that can be used to rapidly and automatically analyse audio data and explore machine learning models based on that data.
- NEW with nkululeko: Ensemble learning
- NEW: Finetune transformer-models
- The latest features can be seen in the ini-file options that are used to control Nkululeko
- Below is a Hello World example that should get you set up quickly, also on Google Colab and Kaggle
- Here's a blog post on how to set up nkululeko on your computer.
- Here is a slack channel to discuss issues related to nkululeko. Please click the link if interested in contributing.
- Here's a slide presentation about nkululeko
- Here's a video presentation about nkululeko
- Here's the 2022 LREC article on nkululeko
Here are some examples of typical output:
By default, Nkululeko displays results as a confusion matrix; for regression tasks, the continuous values are binned first.
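To illustrate the binning idea, here is a minimal sketch (not Nkululeko's actual implementation) that maps continuous values to categories so that they can be counted as confusion matrix cells. The bin edges, labels and example values are hypothetical, and to_bin and confusion_counts are made-up helper names.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges and labels, e.g. for an age regression task.
BIN_EDGES = [30.0, 60.0]  # boundaries between neighbouring bins
BIN_LABELS = ["young", "middle", "old"]


def to_bin(value, edges=BIN_EDGES, labels=BIN_LABELS):
    """Map a continuous value to the label of the bin it falls into."""
    return labels[bisect_right(edges, value)]


def confusion_counts(truths, preds):
    """Count (true bin, predicted bin) pairs, i.e. confusion matrix cells."""
    return Counter((to_bin(t), to_bin(p)) for t, p in zip(truths, preds))


# Made-up ground truth and regression outputs.
truths = [22.0, 35.0, 71.0, 58.0]
preds = [27.5, 62.0, 66.0, 41.0]
cells = confusion_counts(truths, preds)
for (true_bin, pred_bin), count in sorted(cells.items()):
    print(f"true={true_bin:<6} predicted={pred_bin:<6} count={count}")
```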
The point when overfitting starts can sometimes be seen by looking at the results per epoch:
Using the explore interface, Nkululeko analyses the importance of acoustic features:
And can show the distribution of specific features per category:
A t-SNE plot can give you an estimate of whether your acoustic features are useful at all:
Sometimes, you only want to take a look at your data:
In some cases, you might wonder if there's bias in your data. You can try to detect this with automatically estimated speech properties by visualizing the correlation of target labels and predicted labels.
Nkululeko estimates the uncertainty of model decisions (only for classifiers) with entropy over the class probabilities or logits per sample.
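As an illustration of entropy-based uncertainty, the following sketch computes the Shannon entropy of the class probabilities for one sample. It is not taken from the Nkululeko code base: the function names are hypothetical, and dividing by the logarithm of the class count (so that the value lies between 0 and 1) is an assumption made for readability.

```python
import math


def softmax(logits):
    """Turn raw logits into class probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy of a distribution, scaled to the range [0, 1]."""
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))


# Made-up model outputs for three samples.
confident = normalized_entropy([0.97, 0.01, 0.01, 0.01])
unsure = normalized_entropy([0.25, 0.25, 0.25, 0.25])
from_logits = normalized_entropy(softmax([2.0, 1.0, 0.1]))
print(f"confident={confident:.3f} unsure={unsure:.3f} from_logits={from_logits:.3f}")
```

A peaked distribution yields a value near 0 (the model is certain), while a uniform distribution yields 1 (the model is maximally uncertain).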
The documentation, with extended information on installation, usage, the INI file format, and examples, can be found at nkululeko.readthedocs.io.
Create and activate a virtual Python environment and simply run
pip install nkululeko
We excluded some packages from the automatic installation because they might depend on your computer and some of them are only needed in special cases. So if the error
module x not found
appears, please try
pip install x
For many packages, you will need the missing torch package. If you don't have a GPU (which is probably true if you don't know what that is), please use
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
Otherwise, you can use the default:
pip install torch torchvision torchaudio
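If you are unsure which of the two variants you ended up with, a small check like the following might help. It is only a sketch: torch_status is a hypothetical helper name, and the check only looks for CUDA devices, not for other accelerators.

```python
import importlib.util


def torch_status():
    """Describe whether torch is importable and whether it sees a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check also works without torch

    if torch.cuda.is_available():
        return f"torch {torch.__version__} with a CUDA GPU available"
    return f"torch {torch.__version__}, no CUDA GPU detected"


print(torch_status())
```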
Some functionalities require extra packages to be installed, which we didn't include automatically:
- the SQUIM model needs a special torch version:
pip uninstall -y torch torchvision torchaudio
pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
- the spotlight adapter needs spotlight:
pip install renumics-spotlight sliceguard
Some examples for ini-files (which you use to control nkululeko) are in the tests folder.
You specify your experiment in an "ini" file (e.g. experiment.ini) and then call one of the Nkululeko interfaces to run it, like this:
python -m nkululeko.nkululeko --config experiment.ini
A basic configuration looks like this:
[EXP]
root = ./
name = exp_emodb
[DATA]
databases = ['emodb']
emodb = ./emodb/
emodb.split_strategy = speaker_split
target = emotion
labels = ['anger', 'boredom', 'disgust', 'fear']
[FEATS]
type = ['praat']
[MODEL]
type = svm
[EXPL]
model = tree
plot_tree = True
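To get a feeling for the format, the configuration above can be read with Python's standard configparser module. This is a sketch for inspecting such a file and makes no claim about how Nkululeko parses it internally; in particular, using ast.literal_eval for the list-like values is an assumption.

```python
import ast
import configparser

CONFIG_TEXT = """
[EXP]
root = ./
name = exp_emodb
[DATA]
databases = ['emodb']
emodb = ./emodb/
emodb.split_strategy = speaker_split
target = emotion
labels = ['anger', 'boredom', 'disgust', 'fear']
[FEATS]
type = ['praat']
[MODEL]
type = svm
[EXPL]
model = tree
plot_tree = True
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

# List-like values are stored as strings; literal_eval turns them into lists.
databases = ast.literal_eval(config["DATA"]["databases"])
labels = ast.literal_eval(config["DATA"]["labels"])

for name in databases:
    # Per-database options follow the "<database>.<option>" key pattern.
    strategy = config["DATA"].get(f"{name}.split_strategy", "not set")
    print(f"{name}: path={config['DATA'][name]} split={strategy}")
print(f"{len(labels)} labels for target '{config['DATA']['target']}': {labels}")
print(f"model type: {config['MODEL']['type']}")
```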
Read the Hello World example for initial usage with the Emo-DB dataset.
Here is an overview of the interfaces/modules:
All of them take --config <my_config.ini> as an argument.
- nkululeko.nkululeko: do machine learning experiments combining features and learners
- nkululeko.ensemble: combine several nkululeko experiments and report on late fusion results
  - --config: which experiments (INI files) to combine
  - --method (optional): majority_voting, mean (default), max, sum, uncertainty, uncertainty_weighted, confidence_weighted, performance_weighted
  - --threshold: uncertainty threshold (1.0 means no threshold)
  - --weights: weights for performance_weighted method (could be from previous UAR, ACC)
  - --outfile (optional): name of CSV file for output (default: ensemble_result.csv)
  - --no_labels (optional): indicate that no ground truth is given
- nkululeko.multidb: do multiple experiments, comparing several databases with each other (cross-database) and with themselves
- nkululeko.demo: demo the current best model on the command line
  - --list (optional): list of input files
  - --file (optional): name of input file
  - --folder (optional): parent folder for input files
  - --outfile (optional): name of CSV file for output
- nkululeko.test: predict a given data set with the current best model
- nkululeko.explore: perform data exploration
- nkululeko.augment: augment the current training data
- nkululeko.aug_train: augment the current training data and do a training including this data
- nkululeko.predict: predict features like SNR, MOS, arousal/valence, age/gender, with DNN models
- nkululeko.segment: segment a database based on VAD (voice activity detection)
- nkululeko.resample: check on all sampling rates and change to 16kHz
- nkululeko.nkuluflag: a convenient module to specify configuration parameters on the command line. Usage:
python -m nkululeko.nkuluflag [-h] [--config CONFIG] [--data [DATA ...]] [--label [LABEL ...]] [--tuning_params [TUNING_PARAMS ...]] [--layers [LAYERS ...]] [--model MODEL] [--feat FEAT] [--set SET] [--with_os WITH_OS] [--target TARGET] [--epochs EPOCHS] [--runs RUNS] [--learning_rate LEARNING_RATE] [--drop DROP]
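To give an idea of what late fusion means for the nkululeko.ensemble module listed above, here is a minimal sketch of three of the fusion ideas (mean, majority voting, and a weighted mean). It is not the module's implementation: the function names only mirror the option names for readability, and the probabilities and weights are made up.

```python
from collections import Counter

LABELS = ["anger", "boredom", "disgust", "fear"]

# Hypothetical class probabilities for ONE sample from three experiments.
predictions = [
    [0.50, 0.30, 0.10, 0.10],
    [0.20, 0.60, 0.10, 0.10],
    [0.45, 0.35, 0.10, 0.10],
]


def argmax_label(probs):
    """Return the label with the highest probability."""
    return LABELS[max(range(len(probs)), key=probs.__getitem__)]


def mean_fusion(all_probs, weights=None):
    """Average the probabilities per class, optionally weighted per model."""
    weights = weights or [1.0] * len(all_probs)
    total = sum(weights)
    fused = [
        sum(w * probs[i] for w, probs in zip(weights, all_probs)) / total
        for i in range(len(LABELS))
    ]
    return argmax_label(fused), fused


def majority_voting(all_probs):
    """Let each model vote for its top class and pick the most common vote."""
    votes = Counter(argmax_label(probs) for probs in all_probs)
    return votes.most_common(1)[0][0]


print("mean:", mean_fusion(predictions)[0])
print("majority_voting:", majority_voting(predictions))
# Weights could come e.g. from each experiment's previous UAR (made-up values).
print("weighted mean:", mean_fusion(predictions, weights=[0.6, 0.9, 0.6])[0])
```

In this toy example the methods disagree: two of the three models vote for anger, but the averaged probabilities favour boredom, which shows why the choice of fusion method can matter.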
My blog has a number of tutorials:
- Introduction
- Nkululeko FAQ
- How to set up your first nkululeko project
- Setting up a base nkululeko experiment
- How to import a database
- Comparing classifiers and features
- Use Praat features
- Combine feature sets
- Classifying continuous variables
- Try out / demo a trained model
- Perform cross-database experiments
- Meta parameter optimization
- How to set up wav2vec embedding
- How to soft-label a database
- Re-generate the progressing confusion matrix animation with a different framerate
- How to limit/filter a dataset
- Specifying database disk location
- Add dropout with MLP models
- Do cross-validation
- Combine predictions per speaker
- Run multiple experiments in one go
- Compare several MLP layer layouts with each other
- Import features from outside the software
- Export acoustic features
- Explore feature importance
- Plot distributions for feature values
- Show feature importance
- Augment the training set
- Visualize clusters of acoustic features
- Visualize your data distribution
- Check your dataset
- Segmenting a database
- Predict new labels for your data from public models and check bias
- Resample
- Get some statistics on correlation and effect-size
- Automatic generation of a latex/pdf report
- Inspect your data with Spotlight
- Automatically stratify your split sets
- Rename data column names
- Oversample the training set
- Compare several databases
- Tweak the target variable for database comparison
- How to run multiple experiments in one go
- How to finetune a transformer-model
- Ensemble (combine) classifiers with late-fusion
- NEW: Here's a Google colab that runs this example out-of-the-box, and here is the same with Kaggle
- I made a video to show you how to do this on Windows
- Set up Python on your computer, version >= 3.8
- Open a terminal/command line/console window
- Test Python by typing
  python
  Python should start and report version 3 (NOT 2!). You can leave the Python interpreter by typing exit()
- Create a folder on your computer for this example, let's call it nkulu_work
- Get a copy of the Berlin emodb in audformat and unpack it inside the folder you just created (nkulu_work)
- Make sure the folder is called "emodb" and contains the database files directly (not box-in-a-box)
- Also, in the nkulu_work folder:
  - Create a Python environment
    python -m venv venv
  - Then, activate it:
    - under Linux / mac
      source venv/bin/activate
    - under Windows
      venv\Scripts\activate.bat
    - if that worked, you should see a (venv) in front of your prompt
  - Install the required packages in your environment
    pip install nkululeko
    - Repeat until all error messages vanish (or fix them, or try to ignore them)...
- Now you should have two folders in your nkulu_work folder:
- emodb and venv
- Download a copy of the file exp_emodb.ini to the current working directory (nkulu_work)
- Run the demo
python -m nkululeko.nkululeko --config exp_emodb.ini
- Find the results in the newly created folder exp_emodb
  - Inspect exp_emodb/images/run_0/emodb_xgb_os_0_000_cnf.png
  - This is the main result of your experiment: a confusion matrix for the emodb emotional categories
- Inspect and play around with the demo configuration file that defined your experiment, then re-run.
- There are many ways to experiment with different classifiers and acoustic feature sets, all described here
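As a quick sanity check for the steps above, a small script like the following could verify the folder layout before the run and list the confusion matrix images afterwards. It is only a sketch: check_workspace and find_confusion_matrices are hypothetical helper names, and the run_*/*_cnf.png pattern is an assumption generalized from the single file path shown above.

```python
from pathlib import Path


def check_workspace(work_dir):
    """Return a list of problems found in the Hello World folder layout."""
    work_dir = Path(work_dir)
    problems = []
    for folder in ("emodb", "venv"):
        if not (work_dir / folder).is_dir():
            problems.append(f"missing folder: {folder}")
    if not (work_dir / "exp_emodb.ini").is_file():
        problems.append("missing file: exp_emodb.ini")
    return problems


def find_confusion_matrices(work_dir):
    """List confusion matrix images written below exp_emodb/images."""
    images = Path(work_dir) / "exp_emodb" / "images"
    if not images.is_dir():
        return []
    return sorted(images.glob("run_*/*_cnf.png"))


if __name__ == "__main__":
    # Run this from inside the nkulu_work folder.
    issues = check_workspace(".")
    print("\n".join(issues) if issues else "workspace looks complete")
    for image in find_confusion_matrices("."):
        print("result:", image)
```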
The framework is targeted at the speech domain and supports experiments where different classifiers are combined with different feature extractors.
- Classifiers: Naive Bayes, KNN, Tree, XGBoost, SVM, MLP
- Feature extractors: Praat, Opensmile, openXBOW BoAW, TRILL embeddings, Wav2vec2 embeddings, audModel embeddings, ...
- Feature scaling
- Label encoding
- Binning (continuous to categorical)
- Online demo interface for trained models
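For readers new to these preprocessing steps, feature scaling and label encoding can be sketched in a few lines of plain Python. This illustrates the general idea only and is not how the framework implements it; the helper names and the example values are made up.

```python
from statistics import mean, pstdev


def fit_scaler(train_column):
    """Learn mean and standard deviation from the training data only."""
    mu = mean(train_column)
    sigma = pstdev(train_column) or 1.0  # avoid division by zero
    return mu, sigma


def scale(column, mu, sigma):
    """Standardize values with previously fitted statistics."""
    return [(x - mu) / sigma for x in column]


def fit_label_encoder(labels):
    """Map each distinct label to an integer, in sorted order."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}


# Made-up pitch values (Hz) and emotion labels.
train_pitch, test_pitch = [110.0, 130.0, 150.0], [140.0]
mu, sigma = fit_scaler(train_pitch)
encoder = fit_label_encoder(["fear", "anger", "fear", "boredom"])

print(scale(train_pitch, mu, sigma), scale(test_pitch, mu, sigma))
print(encoder, [encoder[label] for label in ["fear", "anger"]])
```

Fitting the scaler on the training set and reusing its statistics for the test set keeps information about the test data out of the training step.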
Here's a rough UML-like sketch of the framework (and here's the real one done with pyreverse).
Currently, the following classifiers and regressors are implemented (integrated from sklearn):
- SVM, SVR, XGB, XGR, Tree, Tree_regressor, KNN, KNN_regressor, NaiveBayes, GMM
and the following ANNs (artificial neural networks):
- MLP (multi-layer perceptron), CNN (convolutional neural network)
Here's an animation that shows the progress of classification done with nkululeko
Nkululeko can be used under the MIT license.
Contributions are welcome and encouraged. To learn more about how to contribute to nkululeko, please refer to the Contributing guidelines.
If you use it, please mention the Nkululeko paper:
Felix Burkhardt, Johannes Wagner, Hagen Wierstorf, Florian Eyben and Björn Schuller: Nkululeko: A Tool For Rapid Speaker Characteristics Detection, Proc. LREC, 2022
@inproceedings{Burkhardt:lrec2022,
title = {Nkululeko: A Tool For Rapid Speaker Characteristics Detection},
author = {Felix Burkhardt and Johannes Wagner and Hagen Wierstorf and Florian Eyben and Björn Schuller},
isbn = {9791095546726},
booktitle = {2022 Language Resources and Evaluation Conference, LREC 2022},
keywords = {machine learning,speaker characteristics,tools},
pages = {1925-1932},
publisher = {European Language Resources Association (ELRA)},
year = {2022},
}