MusicGen: Add Stereo Model #27084

Merged
merged 6 commits into from
Nov 8, 2023

Conversation

sanchit-gandhi
Contributor

What does this PR do?

The original MusicGen model generates mono (1-channel) outputs. It does this by predicting a set of 4 codebooks at each generation step:

[codebook_1, codebook_2, codebook_3, codebook_4]

After generating, the sequence of predicted codebooks is passed through the EnCodec model to get the final waveform.

This PR adds the MusicGen stereo model. It works by predicting two sets of codebooks at each step. One set of codebooks corresponds to the left channel, the other set corresponds to the right channel. The sets of codebooks are interleaved as follows:

[left_codebook_1, right_codebook_1, left_codebook_2, right_codebook_2, ..., left_codebook_4, right_codebook_4]

After generating, the sequence of generated codebooks is partitioned into its left and right parts, and each part is then passed through EnCodec to get the left and right waveforms respectively.
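
A minimal sketch of the interleaving and partitioning described above, using plain Python lists in place of tensors. The function names `interleave_stereo` and `split_stereo` are hypothetical and are not taken from this PR; each row stands for one codebook's token sequence, and the token values are made up for illustration.

```python
def interleave_stereo(left, right):
    """Interleave per-channel codebook rows as [L1, R1, L2, R2, ...].

    `left` and `right` are lists of rows, one row per codebook.
    (Hypothetical helper, not the PR's implementation.)
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same number of codebooks")
    interleaved = []
    for left_row, right_row in zip(left, right):
        interleaved.append(left_row)
        interleaved.append(right_row)
    return interleaved


def split_stereo(interleaved):
    """Undo the interleaving: even rows are left, odd rows are right."""
    if len(interleaved) % 2 != 0:
        raise ValueError("expected an even number of codebook rows")
    return interleaved[0::2], interleaved[1::2]


# 4 codebooks per channel, 3 generation steps each (illustrative values only).
left = [[100 + 10 * k + t for t in range(3)] for k in range(4)]
right = [[200 + 10 * k + t for t in range(3)] for k in range(4)]

stereo = interleave_stereo(left, right)  # 8 rows in total
left_out, right_out = split_stereo(stereo)
```

In this sketch, each of `left_out` and `right_out` would then be decoded separately by the mono audio codec to produce one channel of the stereo waveform.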

@@ -75,6 +75,9 @@ class MusicgenDecoderConfig(PretrainedConfig):
The number of parallel codebooks forwarded to the model.
tie_word_embeddings(`bool`, *optional*, defaults to `False`):
Whether input and output word embeddings should be tied.
audio_channels (`int`, *optional*, defaults to 1
Contributor Author

Following the EnCodec naming here:

audio_channels (`int`, *optional*, defaults to 1):
Number of channels in the audio data. Either 1 for mono or 2 for stereo.

Note that the EnCodec model used is still 1-channel (mono) - it's just the MusicGen model that works in a 2-channel fashion.
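
As a rough illustration of how such a field might be validated, here is a sketch using a stand-in dataclass. `ToyDecoderConfig` and `codebooks_per_channel` are hypothetical names, not the real `MusicgenDecoderConfig` API, and the sketch assumes `num_codebooks` counts every codebook predicted per step (8 for the interleaved stereo layout above).

```python
from dataclasses import dataclass


@dataclass
class ToyDecoderConfig:
    """Hypothetical stand-in for a decoder config (not the real class)."""

    num_codebooks: int = 4   # assumption: total codebooks predicted per step
    audio_channels: int = 1  # 1 for mono, 2 for stereo

    def __post_init__(self):
        # Only mono and stereo are meaningful for this layout.
        if self.audio_channels not in (1, 2):
            raise ValueError(
                f"Expected 1 (mono) or 2 (stereo) audio channels, got {self.audio_channels}"
            )
        if self.num_codebooks % self.audio_channels != 0:
            raise ValueError("num_codebooks must be divisible by audio_channels")

    @property
    def codebooks_per_channel(self) -> int:
        # Each channel is decoded by the mono codec with this many codebooks.
        return self.num_codebooks // self.audio_channels


mono_cfg = ToyDecoderConfig()
stereo_cfg = ToyDecoderConfig(num_codebooks=8, audio_channels=2)
```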

@sanchit-gandhi sanchit-gandhi changed the title [MusicGen] Add stereo model MusicGen Update Oct 26, 2023
@sanchit-gandhi sanchit-gandhi requested review from ydshieh and ArthurZucker and removed request for ydshieh October 26, 2023 14:44
Collaborator

@ArthurZucker ArthurZucker left a comment

Thanks looks very clean 😉

@ArthurZucker
Collaborator

Is the format we gave (interleaving) forced by backward compatibility (BC)? Otherwise, storing the codebooks in two lists or tuples would be better IMO (kind of like how audio is stored, no?).
That's my only "complaint" here.

@sanchit-gandhi
Contributor Author

The original model is designed to predict them in an interleaved way:

[left_codebook_1, right_codebook_1, left_codebook_2, right_codebook_2, ..., left_codebook_4, right_codebook_4]

We could change this to predict left first, then right:

[left_codebook_1, left_codebook_2, ..., left_codebook_4, right_codebook_1, right_codebook_2, ..., right_codebook_4]

This would require reshaping the LM head weights and duplicating the pattern mask along the row dimension. Overall, I think the complexity would be similar to the interleaved approach we have now.

But predicting two sets of codebooks as two separate tuples would break compatibility with the existing mono MusicGen, or otherwise complicate the code, since we'd have different inputs and sampling logic depending on whether we're mono or stereo.
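
To make the two orderings concrete, here is a small sketch of the index permutation that maps the interleaved layout to the left-first layout. The helper names `grouped_order_indices` and `reorder` are hypothetical; a real change would apply the same permutation to the LM heads rather than to a list of labels.

```python
def grouped_order_indices(codebooks_per_channel):
    """Indices that reorder [L1, R1, L2, R2, ...] into [L1..Lk, R1..Rk]."""
    total = 2 * codebooks_per_channel
    left = list(range(0, total, 2))   # even positions hold left codebooks
    right = list(range(1, total, 2))  # odd positions hold right codebooks
    return left + right


def reorder(rows, indices):
    """Pick rows in the order given by `indices`."""
    return [rows[i] for i in indices]


# Build the interleaved labels used in the comment above.
interleaved_names = []
for k in range(1, 5):
    interleaved_names.append(f"left_codebook_{k}")
    interleaved_names.append(f"right_codebook_{k}")

indices = grouped_order_indices(4)
grouped_names = reorder(interleaved_names, indices)
```

Because the two layouts differ only by a fixed permutation, this sketch is consistent with the point above that neither ordering is inherently simpler than the other.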

@ArthurZucker
Collaborator

Awesome, thanks for explaining!

@sanchit-gandhi sanchit-gandhi merged commit f16ff0f into huggingface:main Nov 8, 2023
18 checks passed
@sanchit-gandhi sanchit-gandhi deleted the musicgen-stereo branch November 8, 2023 13:26
@sanchit-gandhi sanchit-gandhi changed the title MusicGen Update MusicGen: Add Stereo Model Nov 8, 2023
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023
* [MusicGen] Add stereo model

* safe serialization

* Update src/transformers/models/musicgen/modeling_musicgen.py

* split over 2 lines

* fix slow tests on cuda