mel_fmax does not cover all frequency #7

seungwonpark · 2019-10-23T06:14:37Z

It looks like WaveGlow's default configuration does not let the mel-spectrogram cover the full frequency range (0 to 11025 Hz): https://github.com/NVIDIA/waveglow/blob/master/config.json

This is a plot of librosa.filters.mel(22050, 1024, 80, fmin=0.0, fmax=8000.0).

I think this is the reason why neither WaveGlow nor our implementation of MelGAN seems to generate high-frequency audio.
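As a rough check of the coverage gap, here is a minimal sketch that counts how many STFT bins lie above a given mel_fmax. It assumes the same sampling rate (22050) and FFT size (1024) as the librosa call above; the helper name `count_bins` is hypothetical, and it only compares bin center frequencies against fmax rather than building the actual filterbank.

```python
def count_bins(sr, n_fft, fmax):
    """Count one-sided FFT bins at or below fmax, and those above it.

    Bins above fmax lie outside every mel filter, so their energy
    never reaches the mel-spectrogram.
    """
    n_bins = n_fft // 2 + 1            # one-sided spectrum: 0 Hz .. Nyquist
    bin_hz = sr / n_fft                # spacing between adjacent bins
    freqs = [k * bin_hz for k in range(n_bins)]
    covered = sum(1 for f in freqs if f <= fmax)
    return covered, n_bins - covered


sr, n_fft = 22050, 1024
nyquist = sr / 2                       # 11025.0 Hz

covered, uncovered = count_bins(sr, n_fft, fmax=8000.0)
print(covered, uncovered)              # 372 141

# Share of the 0..Nyquist band that lies above fmax=8000 Hz
print(round((nyquist - 8000.0) / nyquist, 3))   # 0.274
```

With fmax set to the Nyquist frequency (11025.0), the same count reports no bins above fmax.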


seungwonpark · 2019-10-23T08:00:52Z

Running another experiment by setting mel_fmax=11025.0.
I had to run preprocess.py again since all mel-spectrograms need to be calculated again.

rishikksh20 · 2019-10-23T17:23:17Z

@seungwonpark Actually, this is how vocoders work efficiently. We always use the 0 to 8000 Hz range; from WaveNet to WaveRNN, all vocoder models work within this frequency range. This helps the model focus on vocal frequencies (the bandwidth allocated for a single voice-frequency transmission channel is usually 4 kHz) over other frequencies.
Per the Nyquist–Shannon sampling theorem, the sampling frequency (8 kHz) must be at least twice the highest frequency component of the voice signal (4 kHz), after appropriate filtering prior to sampling, for effective reconstruction of the voice signal.
So 8 kHz is enough to model any voice.
We do lose some environmental crispness by doing this, but you only notice a minute difference when you listen with good noise-cancelling headphones.
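For reference, the Nyquist arithmetic mentioned in this thread can be sketched as below. The function names are hypothetical, and the 22050 Hz sampling rate is taken from the librosa call in the first comment; this is only the textbook relation, not code from either repository.

```python
def min_sample_rate_hz(highest_component_hz):
    """Lowest sampling rate that can represent a given highest frequency."""
    return 2.0 * highest_component_hz


def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at a given sampling rate."""
    return sample_rate_hz / 2.0


print(min_sample_rate_hz(4000.0))   # 8000.0  (4 kHz voice channel)
print(nyquist_hz(22050))            # 11025.0 (audio sampled at 22050 Hz)
```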

seungwonpark · 2019-10-24T01:38:05Z

@rishikksh20 Thanks for sharing your insight!
I will be doing an ablation study on this, but I think we can close this issue for now since it's not really critical, as you've explained.

seungwonpark closed this as completed Oct 24, 2019

seungwonpark mentioned this issue Oct 28, 2019

Some notable differences with official implementation #17

Open
