Allow deterministic generations #175

Open: wants to merge 2 commits into `main`


jn-jairo (Contributor) commented Apr 27, 2023

Edit: some changes were implemented in another commit, so I updated the description to represent only the current changes.


This PR adds `set_seed(seed)` to allow deterministic generations:

  • Use `seed = set_seed()` or `seed = set_seed(0)` to generate and set a random seed; the seed is returned.
  • Use `set_seed(seed)` to set a specific seed number.
  • Use `set_seed(-1)` to disable the deterministic process and go back to fully non-deterministic generation.

BE AWARE: the seed affects the torch, NumPy and Python random number generators, so if you are running other software that requires non-deterministic random values, remember to call `set_seed(-1)` after you generate the audio.
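For readers who are not on this branch, here is a minimal sketch of a helper with the semantics listed above. It is a hypothetical reimplementation, not the code in this PR: the `0` / `-1` conventions come from the description, while the seed range and the cuDNN flags are assumptions.

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 0) -> int:
    """Hypothetical sketch of the set_seed semantics described in this PR (not its actual code).

    seed == 0  -> pick a random seed, apply it, and return it
    seed == -1 -> go back to non-deterministic behavior
    otherwise  -> apply the given seed and return it
    """
    if seed == -1:
        # Re-seed each generator from fresh OS entropy.
        random.seed()
        np.random.seed()
        torch.seed()
        torch.backends.cudnn.deterministic = False
        return -1
    if seed == 0:
        # Assumption: draw from [1, 2**32 - 1], the range np.random.seed accepts,
        # so the returned seed never collides with the 0 / -1 sentinels.
        seed = random.SystemRandom().randint(1, 2**32 - 1)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU and all CUDA devices
    # Assumption: also request deterministic cuDNN kernels (no effect on CPU-only runs).
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    return seed
```

Calling `seed = set_seed()` returns the seed it picked, so the same run can be reproduced later with `set_seed(seed)`.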

Example:

```python
from bark import SAMPLE_RATE, generate_audio, preload_models, set_seed
from scipy.io.wavfile import write as write_wav
import numpy as np

preload_models()

prompt = "I have a silky smooth voice, and today I will tell you about the exercise regimen of the common sloth."

# Setting the same seed before each call produces identical audio.
set_seed(123)
audio_array_1 = generate_audio(prompt)
write_wav("/path/to/audio_1.wav", SAMPLE_RATE, audio_array_1)

set_seed(123)
audio_array_2 = generate_audio(prompt)
write_wav("/path/to/audio_2.wav", SAMPLE_RATE, audio_array_2)

# BE AWARE: the seed affects the torch, NumPy and Python random number generators,
# so if you are running other software that requires non-deterministic random values,
# remember to call `set_seed(-1)` after you generate the audio.
set_seed(-1)

assert np.array_equal(audio_array_1, audio_array_2)
```


santiarias left a comment

Very cool. I am doing this ad hoc, so hopefully this idea gets implemented.

mcamac (Contributor) commented May 2, 2023

Thanks for this; keeping it open so it's on our radar.

jn-jairo changed the title from "Allow consistency and deterministic generations" to "Allow deterministic generations" on May 3, 2023
jn-jairo (Contributor, Author) commented May 3, 2023

I think this PR isn't necessary anymore: I found a package that does the same thing, and I am now using it instead.

UM-ARM-Lab/pytorch_seed

```python
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav
import numpy as np
import pytorch_seed

preload_models()

prompt = "I have a silky smooth voice, and today I will tell you about the exercise regimen of the common sloth."

with pytorch_seed.SavedRNG(123):
    audio_array_1 = generate_audio(prompt)
write_wav("/path/to/audio_1.wav", SAMPLE_RATE, audio_array_1)

with pytorch_seed.SavedRNG(123):
    audio_array_2 = generate_audio(prompt)
write_wav("/path/to/audio_2.wav", SAMPLE_RATE, audio_array_2)

assert np.array_equal(audio_array_1, audio_array_2)
```

gkucsko (Contributor) commented May 4, 2023

Neat, thanks! Out of curiosity, what do you need the seed for? It shouldn't really help with consistency, right? Like, if you change the text prompt, the results will be completely different regardless of the seed, no?

jn-jairo (Contributor, Author) commented May 4, 2023

> Neat, thanks! Out of curiosity, what do you need the seed for? It shouldn't really help with consistency, right? Like, if you change the text prompt, the results will be completely different regardless of the seed, no?

It helps reproduce the same voice and intonation when you use a `history_prompt` together with the seed that was used to create that `history_prompt`. I use the seed and the `history_prompt` as a pair because, even with the same `history_prompt`, a different seed gives a voice that is not exactly the same (sometimes it sounds quite different); the seed + `history_prompt` pair fixes that.

The seed also improves consistency across a list of prompts (a long text): with many prompts in the list, the voice otherwise starts to sound noticeably different after a few of them.

A given seed + prompt pair always produces the same voice; changing either the seed or the prompt changes the voice.

So, to find a voice, I choose a prompt that fits the voice I want, generate several audios with different seeds (saving each seed together with its `history_prompt`), and pick the best one.

To generate several prompts in sequence (a long text), I use the saved seed + `history_prompt` for the first prompt in the list. For the remaining prompts I use the same saved seed together with the `history_prompt` returned by the first prompt's generation with `output_full=True`, because that helps keep the voice consistent. With this process the voice sounds the same and keeps the same intonation throughout the whole audio.
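The two-step workflow above (audition seeds, then reuse the chosen pair for a long text) might be sketched like this. It is a minimal sketch under assumptions: `seed_everything`, `audition_seeds` and `generate_long` are hypothetical names; `generate_audio` is passed in (in practice `bark.generate_audio`) so the sketch has no hard dependency on Bark; and it relies on `generate_audio(..., output_full=True)` returning a `(full_generation, audio_array)` pair whose first element can be passed back as `history_prompt`.

```python
import random
from typing import Callable, Dict, List, Tuple

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    # Seed the three generators mentioned in this thread: Python, NumPy and torch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


def audition_seeds(
    prompt: str, seeds: List[int], generate_audio: Callable
) -> Dict[int, Tuple[dict, np.ndarray]]:
    """Step 1: render the same prompt under several seeds, keeping each (history_prompt, audio)."""
    candidates = {}
    for seed in seeds:
        seed_everything(seed)
        # Assumed: output_full=True returns (full_generation, audio_array).
        full_generation, audio = generate_audio(prompt, output_full=True)
        candidates[seed] = (full_generation, audio)
    return candidates


def generate_long(
    prompts: List[str], seed: int, history_prompt, generate_audio: Callable
) -> np.ndarray:
    """Step 2: reuse the chosen seed + history_prompt pair for every chunk of a long text."""
    seed_everything(seed)
    first_generation, first_audio = generate_audio(
        prompts[0], history_prompt=history_prompt, output_full=True
    )
    pieces = [first_audio]
    for text in prompts[1:]:
        seed_everything(seed)  # the same seed before every chunk
        # Later chunks are conditioned on the first chunk's full generation.
        pieces.append(generate_audio(text, history_prompt=first_generation))
    return np.concatenate(pieces)
```

With Bark this would be called as, for example, `generate_long(sentences, seed=chosen_seed, history_prompt=candidates[chosen_seed][0], generate_audio=generate_audio)`. Persisting a chosen voice is left out; assuming `full_generation` is a dict of arrays, `np.savez(path, **full_generation)` is one option.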

apollner commented Jul 6, 2023

Sorry, I'm a bit confused: in the end, how do you get deterministic generation here? Do you need `pytorch_seed`? Is there a way to do it with just PyTorch?

jn-jairo (Contributor, Author) commented Jul 6, 2023

> Sorry, I'm a bit confused: in the end, how do you get deterministic generation here? Do you need `pytorch_seed`? Is there a way to do it with just PyTorch?

Yes, you can do it with just PyTorch; `pytorch_seed` is only a helper to set and manage the seed.
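For example, a scoped seeding helper can be built from plain PyTorch, NumPy and the standard library. This is a minimal sketch: `seeded_rng` is a hypothetical name, and it simply restarts from `seed` on every entry rather than trying to replicate `pytorch_seed.SavedRNG`. `torch.random.fork_rng` saves and restores torch's CPU (and CUDA) generator state; Python's and NumPy's states are saved and restored by hand.

```python
import contextlib
import random

import numpy as np
import torch


@contextlib.contextmanager
def seeded_rng(seed: int):
    """Seed Python, NumPy and torch inside the block, then restore their previous state."""
    py_state = random.getstate()
    np_state = np.random.get_state()
    # fork_rng snapshots torch's CPU RNG (and CUDA RNGs, if any) and restores them on exit.
    with torch.random.fork_rng():
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        try:
            yield
        finally:
            random.setstate(py_state)
            np.random.set_state(np_state)
```

It is used the same way as the example above, e.g. `with seeded_rng(123): audio_array_1 = generate_audio(prompt)`; code outside the `with` block keeps its own random sequence, so there is no need for a `set_seed(-1)` call afterwards.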

The reason for using the same seed is simple: the random numbers dictate the generations, so if you use the same random numbers (the same seed) for every generation in a long text, the results will be more similar.
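To illustrate the point: Bark samples its tokens (with a temperature) instead of picking them greedily, so the random numbers drawn decide which tokens come out. The toy sketch below is not Bark code; `sample_tokens` and the logits are made up, and it only shows that seeding torch before `torch.multinomial` fixes the sampled sequence.

```python
import torch


def sample_tokens(logits: torch.Tensor, seed: int, num_tokens: int) -> list:
    """Toy stand-in for repeated sampling steps (not Bark's code)."""
    torch.manual_seed(seed)  # fixes the random numbers multinomial will consume
    probs = torch.softmax(logits, dim=-1)
    # Each draw picks token index i with probability probs[i].
    return torch.multinomial(probs, num_tokens, replacement=True).tolist()


logits = torch.tensor([2.0, 1.0, 0.5, 0.1])
run_1 = sample_tokens(logits, seed=123, num_tokens=20)
run_2 = sample_tokens(logits, seed=123, num_tokens=20)
print(run_1 == run_2)  # same seed, same token sequence
```

Changing `seed` changes the random numbers, and with them (in general) the sampled sequence, which is why a different seed yields a different voice.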

Just to make it clear: I think this PR isn't necessary anymore; it is still open as a reference while the Suno team researches the topic.

While this is one way to achieve better consistency, I now think it is better for Bark to stay as simple as possible and avoid specialized changes. Many projects use Bark, and they can adopt this approach if they want to, without it having to be a built-in feature.


5 participants