Paper | Website | Hugging Face Dataset
Run the following from the root of the repository:
conda env create -f environment.yaml
This will create a conda environment named `syntheory`.
The environment may be activated with:
conda activate syntheory
If there are problems installing from `environment.yaml`, the main dependencies we need are:

- `pytorch` (see the official install instructions)
- `transformers` (`pip install transformers`)
- `jukemirlib` (`pip install git+https://github.com/rodrigo-castellon/jukemirlib.git`)
- `ffmpeg-python` (`pip install ffmpeg-python`)
- `mido` (`pip install mido`)
- `zarr` (`pip install zarr`)
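As an optional sanity check after installing manually, the core libraries should be importable. A minimal sketch (the `jukemirlib` import name is assumed to match its package name):

```python
# Optional sanity check that the main dependencies are importable.
import torch
import transformers
import ffmpeg  # provided by the ffmpeg-python package
import mido
import zarr
import jukemirlib  # assumed import name for the jukemirlib package

print("torch", torch.__version__)
print("transformers", transformers.__version__)
```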
You may need to install the following dependencies, if you don't already have them.
On Linux, `ffmpeg` is required and can be installed through a package manager like `apt`:
apt install ffmpeg
On macOS, `ffmpeg` is also required and can be installed through Homebrew:
brew install ffmpeg
If running on an M1 Mac, you must also install `libsndfile`:
brew install libsndfile
Note: even after installing `libsndfile`, `OSError: sndfile library not found` might still be raised. To fix this, run:
conda install -c conda-forge libsndfile
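To verify the fix, a minimal check is to import `soundfile` (the package that typically raises this error) and print the libsndfile version it found:

```python
# If libsndfile is installed correctly, this import succeeds instead of raising
# "OSError: sndfile library not found".
import soundfile as sf

print(sf.__libsndfile_version__)
```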
First, install `cargo`. Then run:
bash repo/compile_synth.sh
We need this to turn MIDI into `.wav` files. A bit more information on this step can be found here: README.md
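As background for this step, the concept datasets below are rendered from MIDI. Here is a minimal, illustrative `mido` sketch (not the repo's actual generation code) that writes a one-note MIDI file, which the compiled synthesizer could then render to `.wav`:

```python
import mido

# Illustrative only: write a single middle-C note to a MIDI file.
mid = mido.MidiFile()
track = mido.MidiTrack()
mid.tracks.append(track)

track.append(mido.Message("program_change", program=0, time=0))        # piano
track.append(mido.Message("note_on", note=60, velocity=64, time=0))    # middle C
track.append(mido.Message("note_off", note=60, velocity=64, time=480)) # one beat later

mid.save("middle_c.mid")
```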
The following datasets exist in the `dataset/synthetic` folder:

- `chords`: ~18.7 GB (13,248 samples)
- `chord_progressions`: ~29.61 GB (20,976 samples)
- `intervals`: ~56.1 GB (39,744 samples)
- `notes`: ~14.02 GB (9,936 samples)
- `scales`: ~21.82 GB (15,456 samples)
- `tempos`: ~5.68 GB (4,025 samples)
- `time_signatures`: ~1.48 GB (1,200 samples)
Any of these can be generated by running the below command from the root of the repository:
python dataset/synthetic/<NAME_OF_DATASET>.py
where `<NAME_OF_DATASET>` is one of: `chords`, `chord_progressions`, `intervals`, `notes`, `scales`, `tempos`, or `time_signatures`. For example, `python dataset/synthetic/notes.py` generates the notes dataset. This will create a folder in `data/<NAME_OF_DATASET>`.
Each folder will contain:

- `info.csv`
- `.wav` files
- `.mid` files

Some of these datasets are quite large.
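To inspect the metadata for a generated dataset, something like the following should work (a sketch assuming the `notes` dataset has already been generated; the exact column schema comes from `info.csv` itself):

```python
import pandas as pd

# Inspect the metadata written alongside the generated audio files.
info = pd.read_csv("data/notes/info.csv")
print(info.columns.tolist())  # the exact schema depends on the concept
print(info.head())
```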
You can also download our dataset through Hugging Face here.
To download a particular concept (e.g. notes), run the following script:
from datasets import load_dataset
notes = load_dataset("meganwei/syntheory", "notes")
You can also access our dataset in streaming mode instead of downloading the entire dataset to disk by running the following (for each desired concept):
from datasets import load_dataset
notes = load_dataset("meganwei/syntheory", "notes", streaming=True)
print(next(iter(notes)))
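For example, to look at a few streamed examples without pulling the whole concept to disk (assuming the config exposes a `train` split):

```python
from itertools import islice

from datasets import load_dataset

# Stream a handful of examples from the "notes" concept.
notes = load_dataset("meganwei/syntheory", "notes", streaming=True)
for example in islice(notes["train"], 3):
    print(example.keys())
```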
We have a short guide that explains how to use this codebase to create custom datasets. We encourage the community to create more complex and diverse concept definitions.
Note: We hope for SynTheory to be more than a static dataset; it is a framework and procedure for creating music-theoretic concept understanding datasets.
Custom Dataset Instruction Guide
After creating a synthetic dataset, run
python embeddings/extract_embeddings.py --config embeddings/emb.yaml
This will use the given configuration to extract embeddings for each `.wav` file and save them to a zarr file. You can specify the models and concepts to extract embeddings from by editing `models` and `concepts` in the configuration file.
For each dataset and model combination, this will produce a CSV file named `data/<NAME_OF_DATASET>/<NAME_OF_DATASET>_<MODEL_HASH>_embeddings_info.csv`, which contains information about each embedding for that particular model and dataset.
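To inspect what was extracted, one option is to read the per-model info CSV and open the zarr store with the `zarr` API (the paths below are hypothetical; substitute the actual model hash and store location produced by your run):

```python
import pandas as pd
import zarr

# Hypothetical paths: replace <MODEL_HASH> and the zarr store location with the
# ones produced by embeddings/extract_embeddings.py for your configuration.
info = pd.read_csv("data/notes/notes_<MODEL_HASH>_embeddings_info.csv")
print(info.head())

store = zarr.open("data/notes/embeddings.zarr", mode="r")
print(store)
```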
Due to the time it may take to extract these embeddings, the dataset is partitioned into shards, each responsible for extracting up to some constant number of embeddings. This will start a SLURM job for each shard.
To launch probing experiments, run
python probe/run_probes.py
You can specify which models and concepts you're probing by modifying the `sweep_config` argument. The sweep configurations are defined in `probe/probe_config.py`. Currently, we include configs for probing the handcrafted features, the MusicGen Audio Encoder, Jukebox, and the MusicGen decoder language models across all concepts.
In our implementation, we performed a hyperparameter search for the handcrafted features (`CHROMA`, `MELSPEC`, `MFCC`, `HANDCRAFT`) and the MusicGen Audio Encoder. For the Jukebox and MusicGen Decoder models, we used a fixed set of hyperparameters and employed a layer selection process. You're welcome to adapt the models, concepts, and hyperparameters to your own needs by modifying `SWEEP_CONFIGS` in `probe/probe_config.py`.
Our implementation logs probing results to a Weights & Biases project. Before you run the script, make sure to log into your wandb account by running `wandb login` and providing your wandb API key.
When analyzing your results on Weights & Biases, the desired probing metric is under `primary_eval_metric`.
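Here is a small sketch for pulling that metric programmatically with the wandb API (the entity and project names below are placeholders for your own):

```python
import wandb

# Fetch probe runs from your own W&B project and read the probing metric.
api = wandb.Api()
runs = api.runs("<your-entity>/<your-project>")
for run in runs:
    print(run.name, run.summary.get("primary_eval_metric"))
```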
From the root of the repository, one may run tests with:
bash repo/test.sh
This will call pytest and pytest-cov to produce a coverage report.
If you find this useful, please cite us in your work.
@inproceedings{Wei2024-music,
title={Do Music Generation Models Encode Music Theory?},
author={Wei, Megan and Freeman, Michael and Donahue, Chris and Sun, Chen},
booktitle={International Society for Music Information Retrieval},
year={2024}
}