
Do Music Generation Models Encode Music Theory? (ISMIR 2024)

Paper | Website | Hugging Face Dataset

Installation Instructions

Virtual Environment

Run the following from the root of the repository:

conda env create -f environment.yaml

This will create a conda environment named syntheory.

The environment may be activated with:

conda activate syntheory

If there are problems installing from environment.yaml, the main dependencies we need are listed below; a quick import check follows the list.

  • pytorch (see the official PyTorch installation instructions)
  • transformers (pip install transformers)
  • jukemirlib (pip install git+https://github.com/rodrigo-castellon/jukemirlib.git)
  • ffmpeg-python (pip install ffmpeg-python)
  • mido (pip install mido)
  • zarr (pip install zarr)
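
To confirm these resolve in the active environment, here is a minimal import check (nothing project-specific is assumed; note that ffmpeg-python is imported as ffmpeg):

import importlib

# Verify that the core dependencies import in the active conda environment.
# ffmpeg-python is installed as "ffmpeg-python" but imported as "ffmpeg".
for module in ("torch", "transformers", "jukemirlib", "ffmpeg", "mido", "zarr"):
    try:
        importlib.import_module(module)
        print(f"ok: {module}")
    except ImportError as exc:
        print(f"missing: {module} ({exc})")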

Dependencies

You may need to install the following dependencies if you don't already have them.

Linux

Requires ffmpeg.

Install it through a package manager such as apt:

apt install ffmpeg

MacOS

Requires ffmpeg.

It can be installed through Homebrew with:

brew install ffmpeg

If you are running on an M1 Mac, you must also install libsndfile:

brew install libsndfile

Note that even after installing libsndfile, OSError: sndfile library not found might still be raised. To fix this, run:

conda install -c conda-forge libsndfile
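
The error message quoted above is the one raised by the soundfile Python package; assuming that is the package in play, a quick way to confirm the fix is to check which libsndfile version it now finds:

import soundfile

# Prints the libsndfile version that the soundfile package linked against;
# if this runs without an OSError, libsndfile is being found.
print(soundfile.__libsndfile_version__)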

Dataset Generation

Compiling the Synthesizer

First, install cargo (the Rust package manager). Then run:

bash repo/compile_synth.sh

We need this to render MIDI into .wav files. A bit more information on this step can be found here: README.md

SynTheory Datasets

The following datasets exist in the dataset/synthetic folder:

  • chords: ~18.7 GB (13,248 samples)
  • chord_progressions: ~29.61 GB (20,976 samples)
  • intervals: ~56.1 GB (39,744 samples)
  • notes: ~14.02 GB (9,936 samples)
  • scales: ~21.82 GB (15,456 samples)
  • tempos: ~5.68 GB (4,025 samples)
  • time_signatures: ~1.48 GB (1,200 samples)

Any of these can be generated by running the following command from the root of the repository:

python dataset/synthetic/<NAME_OF_DATASET>.py

where <NAME_OF_DATASET> is one of: chords, chord_progressions, intervals, notes, scales, tempos, or time_signatures.

This will create a folder at data/<NAME_OF_DATASET>.

Each folder will contain:

  • info.csv
  • .wav files
  • .mid files

Note that some of these folders are quite large; see the approximate sizes listed above.
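
Once a dataset has been generated, a quick way to sanity-check its contents is to load info.csv and count the rendered audio files. This is a minimal sketch, assuming pandas is available and using notes as the example; the exact columns of info.csv and the folder layout depend on the concept.

import pandas as pd
from pathlib import Path

# Folder created by: python dataset/synthetic/notes.py
dataset_dir = Path("data/notes")

# info.csv holds the metadata for the generated samples.
info = pd.read_csv(dataset_dir / "info.csv")
print(info.shape)
print(info.columns.tolist())

# Count the rendered audio files (searched recursively in case of subfolders).
wav_files = list(dataset_dir.rglob("*.wav"))
print(f"{len(wav_files)} .wav files")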

SynTheory Dataset via Hugging Face

You can also download our dataset through Hugging Face here.

To download a particular concept (e.g. notes), run the following script:

from datasets import load_dataset

notes = load_dataset("meganwei/syntheory", "notes")

You can also access our dataset in streaming mode instead of downloading the entire dataset to disk by running the following (for each desired concept):

from datasets import load_dataset

notes = load_dataset("meganwei/syntheory", "notes", streaming=True)

# load_dataset returns a dict keyed by split name, so take the first split
# and print its first example rather than printing a split name.
first_split = next(iter(notes.values()))
print(next(iter(first_split)))
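
To check which splits and fields a concept exposes before downloading anything large, you can inspect the dataset metadata with load_dataset_builder. This is a minimal sketch; it assumes the dataset card on Hugging Face publishes split and feature metadata.

from datasets import load_dataset_builder

# Fetch only the dataset metadata, not the audio itself.
builder = load_dataset_builder("meganwei/syntheory", "notes")
print(builder.info.splits)    # available splits and their sizes
print(builder.info.features)  # feature (column) names and types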

Making ✨ Custom 🔥 Datasets

We have a short guide that explains how to use this codebase to create custom datasets. We encourage the community to create more complex and diverse concept definitions.

Note

We hope for SynTheory to be more than a static dataset: it is a framework and procedure for creating datasets that test understanding of music-theoretic concepts.

Custom Dataset Instruction Guide

Embedding Extraction

After creating a synthetic dataset, run

python embeddings/extract_embeddings.py --config embeddings/emb.yaml

This will use a specific configuration to extract embeddings for each .wav file and save them to a zarr file. You can specify the models and concepts to extract embeddings from by editing models and concepts in the configuration file.

For each dataset and model combination, this will produce a csv file named data/<NAME_OF_DATASET>/<NAME_OF_DATASET>_<MODEL_HASH>_embeddings_info.csv. Each file contains information about each embedding for that particular model and dataset.
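
A minimal sketch for inspecting the extraction outputs: the csv path follows the naming pattern above, while <MODEL_HASH> and <PATH_TO_ZARR_STORE> are placeholders you would fill in from your own run.

import pandas as pd
import zarr

# Per-embedding metadata written by the extraction script (fill in the real hash).
info = pd.read_csv("data/notes/notes_<MODEL_HASH>_embeddings_info.csv")
print(info.head())

# Open the zarr store written by the extraction script (fill in the real path)
# and print a summary of its contents.
store = zarr.open("<PATH_TO_ZARR_STORE>", mode="r")
print(store.info)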

Because embedding extraction can take a long time, the dataset is partitioned into shards, each responsible for extracting up to a fixed number of embeddings, and a SLURM job is started for each shard.

Probing Experiments

To launch probing experiments, run

python probe/run_probes.py

You can specify which models and concepts you're probing by modifying the sweep_config argument. The sweep configurations are defined in probe/probe_config.py. Currently, we include configs on probing the handcrafted features, MusicGen Audio Encoder, Jukebox, and MusicGen decoder language models across all concepts.

In our implementation, we performed hyperparameter search for the handcrafted features (CHROMA, MELSPEC, MFCC, HANDCRAFT) and the MusicGen Audio Encoder. For the Jukebox and MusicGen Decoder models, we used a fixed set of hyperparameters and employed a layer selection process. You're welcome to adapt the models, concepts, and hyperparameters to your own needs by modifying SWEEP_CONFIGS in probe/probe_config.py.

Our implementation logs probing results to a Weights & Biases project. Before you run the script, make sure to log into your wandb account by running wandb login and providing your wandb API key.

When analyzing your results on Weights & Biases, the desired probing metric is under primary_eval_metric.
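
If you prefer to pull results programmatically instead of reading them off the web UI, the wandb public API can fetch primary_eval_metric from your runs. This is a minimal sketch; the entity and project names are placeholders for your own Weights & Biases project.

import wandb

# Query the runs in your W&B project and print the probing metric for each.
api = wandb.Api()
runs = api.runs("<YOUR_ENTITY>/<YOUR_PROJECT>")  # placeholder entity/project
for run in runs:
    print(run.name, run.summary.get("primary_eval_metric"))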

Running Tests

From the root of the repository, you can run the tests with:

bash repo/test.sh

This will call pytest and pytest-cov to produce a coverage report.

BibTeX

If you find this useful, please cite us in your work.

@inproceedings{Wei2024-music,
        title={Do Music Generation Models Encode Music Theory?},
        author={Wei, Megan and Freeman, Michael and Donahue, Chris and Sun, Chen},
        booktitle={International Society for Music Information Retrieval},
        year={2024}
}