prosody of sample output. fmin, fmax. Glowtts #1091

michaellin99999 · 2022-01-10T10:35:40Z


Do fmin and fmax in the audio parameters have an effect on the prosody of the generated sample?
I have them set to 220 and 3200, and the sample output sounds monotone. I trained GlowTTS for 700k steps on 20 hours of data from a female speaker and used Griffin-Lim to produce the sample.

We are trying to use GlowTTS to synthesize a speaker's voice. Our dataset consists of 22 hours of speech from a female speaker. We made sure that the dataset has good-quality audio (I have attached some dataset sentences; they are the files with encoded names).

However, when we trained GlowTTS from scratch and ran inference with Griffin-Lim at 700k steps, the output sounded incredibly monotone and robotic (sample 1 and sample 2). We do not know why this happened.

When we trained GlowTTS, we made the following adjustment: mel_fmin set to 220 and mel_fmax set to 3200. We found that these values produced the best audio quality in the Coqui audio parameter testing environment. I listed our config below.

I am really stuck on what could have caused this problem. Is there a known reason for this? Or is it because we set mel_fmin and mel_fmax too low?

dataset audio clips.zip

GlowTTS Output samples with GriffinLim.zip

michaellin99999 · 2022-01-10T12:11:42Z


What I mean by monotone is that in the dataset the speaker's intonation rises and falls, but the GlowTTS + Griffin-Lim output completely lacks those rises and falls.

3 replies

erogol Jan 11, 2022
Maintainer

fmin and fmax have no effect on the prosody of the model. Prosody is determined by the text encoder. It is really hard to give feedback without training the model myself.

michaellin99999 Jan 12, 2022
Author

> fmin and fmax have no effect on the prosody of the model. Prosody is determined by the text encoder. It is really hard to give feedback without training the model myself.

Is there a recommended text encoder that we should use?
Are you referring to this section of the config? This is what we have:

"hidden_channels_encoder": 192,
"hidden_channels_decoder": 192,
"hidden_channels_duration_predictor": 256,
"use_encoder_prenet": true,
"encoder_type": "rel_pos_transformer",
"encoder_params": {
    "kernel_size": 3,
    "dropout_p": 0.1,
    "num_layers": 6,
    "num_heads": 2,
    "hidden_channels_ffn": 768,
    "input_length": null
},

We see that we can choose between: 1. gated_conv 2. residual_conv_bn 3. time_depth_separable 4. rel_pos_transformer (default). Is there any way to decide which one is better for us? I can't seem to find any documentation or reference information that explains what these are.

Or do we need to adjust the encoder parameters?

michaellin99999 Jan 12, 2022
Author

> fmin and fmax have no effect on the prosody of the model. Prosody is determined by the text encoder. It is really hard to give feedback without training the model myself.

And what do fmin and fmax mean exactly? We set these values based on the provided environment, where we found that our speaker's voice sounded best with these two parameters. Do fmin and fmax mean that any part of the spectrogram outside this range is completely filtered out or ignored during training?
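
On what fmin and fmax usually mean: in a typical mel-spectrogram pipeline they are the lower and upper edges of the mel filterbank, so FFT bins outside that range get zero weight and never reach the model. The following is a minimal pure-Python sketch of a triangular filterbank to illustrate this. It is not the code the library runs; the HTK-style mel formula, the sample rate of 22050, n_fft of 1024, and 80 mel bands are all assumptions for illustration. Only 220 and 3200 come from this thread.

```python
import math

def hz_to_mel(hz):
    # HTK-style mel scale (assumption; libraries may use a different variant).
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(sample_rate, n_fft, n_mels, fmin, fmax):
    """Build n_mels triangular filters over the n_fft // 2 + 1 FFT bins."""
    n_bins = n_fft // 2 + 1
    bin_hz = [i * sample_rate / n_fft for i in range(n_bins)]
    mel_lo, mel_hi = hz_to_mel(fmin), hz_to_mel(fmax)
    # n_mels + 2 edge frequencies, evenly spaced on the mel scale in [fmin, fmax].
    edges = [
        mel_to_hz(mel_lo + (mel_hi - mel_lo) * k / (n_mels + 1))
        for k in range(n_mels + 2)
    ]
    bank = []
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - left) / (center - left)
            falling = (right - f) / (right - center)
            # Triangle: zero outside [left, right], peak of 1 at center.
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank, bin_hz

def covered_range(bank, bin_hz):
    """Lowest and highest bin frequency with a nonzero weight in any filter."""
    hit = [f for i, f in enumerate(bin_hz) if any(row[i] > 0.0 for row in bank)]
    return min(hit), max(hit)

# Hypothetical settings, except fmin/fmax which are the values from the thread.
bank, bin_hz = mel_filterbank(
    sample_rate=22050, n_fft=1024, n_mels=80, fmin=220.0, fmax=3200.0
)
lo, hi = covered_range(bank, bin_hz)
print(f"bins with nonzero weight span {lo:.1f} Hz to {hi:.1f} Hz")
```

In this sketch every bin below 220 Hz or above 3200 Hz has zero weight in all filters, so energy there is dropped before the mel spectrogram is formed. Whether that matters for a given voice depends on the data, and it is separate from how the model predicts intonation.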
