MusicGen: Add Stereo Model #27084
Conversation
@@ -75,6 +75,9 @@ class MusicgenDecoderConfig(PretrainedConfig):
            The number of parallel codebooks forwarded to the model.
        tie_word_embeddings(`bool`, *optional*, defaults to `False`):
            Whether input and output word embeddings should be tied.
        audio_channels (`int`, *optional*, defaults to 1
Following the EnCodec naming here:
transformers/src/transformers/models/encodec/configuration_encodec.py
Lines 50 to 51 in d7cb5e1
audio_channels (`int`, *optional*, defaults to 1):
    Number of channels in the audio data. Either 1 for mono or 2 for stereo.
Note that the EnCodec model used is still 1-channel (mono) - it's just the MusicGen model that works in a 2-channel fashion.
Thanks, looks very clean 😉
Is the interleaved format forced by backwards compatibility? Otherwise, storing the codebooks in two lists or tuples would be better IMO (kind of like how audio is stored, no?)
The original model is designed to predict them in an interleaved way:
We could change this to predict left first, then right:
This would require re-shaping the LM head weights and duplicating the pattern mask along the row dimension. Overall, I think the complexity would be similar to the interleaved approach we have now. But predicting the two sets of codebooks as two separate tuples would break compatibility with the existing mono MusicGen, or otherwise complicate the code, since we'd need different inputs / sampling logic depending on whether we're mono or stereo.
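For intuition, here is a small NumPy sketch (illustrative only, not the transformers implementation) showing that the interleaved layout and a left-then-right layout carry the same information, differing only by a row permutation:

```python
import numpy as np

# Illustrative sketch: 4 codebooks per channel, short toy sequences.
num_codebooks, seq_len = 4, 3
left = np.arange(num_codebooks * seq_len).reshape(num_codebooks, seq_len)
right = left + 100

# Interleaved layout: rows ordered [L0, R0, L1, R1, L2, R2, L3, R3]
interleaved = np.empty((2 * num_codebooks, seq_len), dtype=left.dtype)
interleaved[0::2] = left
interleaved[1::2] = right

# Left-then-right layout: rows ordered [L0, L1, L2, L3, R0, R1, R2, R3]
stacked = np.concatenate([left, right], axis=0)

# Either layout recovers the per-channel codebooks with simple slicing.
assert np.array_equal(interleaved[0::2], stacked[:num_codebooks])  # left channel
assert np.array_equal(interleaved[1::2], stacked[num_codebooks:])  # right channel
```

Either way the model still emits 8 codebooks per step; the choice only changes how the rows are ordered and masked.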
Awesome, thanks for explaining!
* [MusicGen] Add stereo model
* safe serialization
* Update src/transformers/models/musicgen/modeling_musicgen.py
* split over 2 lines
* fix slow tests on cuda
What does this PR do?
The original MusicGen model generates mono (1-channel) outputs. It does this by predicting a set of 4 codebooks at each generation step:
After generating, the sequence of predicted codebooks is passed through the EnCodec model to get the final waveform.
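As a rough sketch of how all 4 codebooks can be predicted at each step (the function name and pad value below are hypothetical, not the actual transformers implementation), MusicGen uses a delay pattern in which codebook k is shifted right by k steps:

```python
import numpy as np

PAD = -1  # hypothetical placeholder for the pad/mask token id

def build_delay_pattern(codes: np.ndarray) -> np.ndarray:
    """Shift codebook k right by k steps, padding the gaps.

    codes: (num_codebooks, seq_len)
    returns: (num_codebooks, seq_len + num_codebooks - 1)
    """
    num_codebooks, seq_len = codes.shape
    out = np.full((num_codebooks, seq_len + num_codebooks - 1), PAD, dtype=codes.dtype)
    for k in range(num_codebooks):
        out[k, k : k + seq_len] = codes[k]
    return out

codes = np.arange(8).reshape(4, 2)  # 4 codebooks, 2 timesteps
delayed = build_delay_pattern(codes)
```

With the delay applied, position t of the delayed sequence contains codebook 0 at step t, codebook 1 at step t-1, and so on, which lets one forward pass emit a token for every codebook.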
This PR adds the MusicGen stereo model. It works by predicting two sets of codebooks at each step. One set of codebooks corresponds to the left channel, the other set corresponds to the right channel. The sets of codebooks are interleaved as follows:
After generating, the sequence of generated codebooks is partitioned into its left/right parts, and each sequence is then passed through EnCodec to get the left/right waveform respectively.
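A minimal sketch of this post-processing step (the function names are hypothetical stand-ins, not the actual transformers API):

```python
import numpy as np

def split_stereo_codebooks(codes: np.ndarray):
    """Partition interleaved rows [L0, R0, L1, R1, ...] into left/right halves.

    codes: (2 * num_codebooks, seq_len)
    """
    return codes[0::2], codes[1::2]

def decode_stereo(codes: np.ndarray, decode_with_encodec) -> np.ndarray:
    """Decode each channel separately and stack into a stereo waveform.

    `decode_with_encodec` is a hypothetical stand-in for the mono EnCodec
    decode step; it maps (num_codebooks, seq_len) -> (num_samples,).
    """
    left, right = split_stereo_codebooks(codes)
    return np.stack([decode_with_encodec(left), decode_with_encodec(right)])  # (2, num_samples)
```

Note that the EnCodec model itself stays mono; it is simply invoked once per channel.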