Good ISTFT configuration for lower-resolution inputs #53

janvainer · 2024-06-10T07:45:56Z

Hi, I am trying to train Vocos on features with a lower time resolution.

The vanilla params are:

        n_fft: 1024
        hop_length: 256

My experiments use:

        n_fft: 1920
        hop_length: 480

The resulting model produces intelligible speech, but the voice quality is much worse than with the vanilla training.
Could you please recommend adjustments that might improve output quality for this STFT setup?
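
For reference, here is a minimal sketch that compares the two setups numerically. The sample rate of 24 kHz is an assumption (it is not stated above), as is a window length equal to `n_fft`; the helper name `stft_summary` is only illustrative and not part of Vocos.

```python
# Compare the two STFT setups from this issue.
# ASSUMPTION: 24 kHz audio; the sample rate is not stated in the issue.
SAMPLE_RATE = 24_000


def stft_summary(n_fft: int, hop_length: int, sample_rate: int = SAMPLE_RATE) -> dict:
    """Derive basic quantities from an STFT configuration.

    ASSUMPTION: the analysis window spans the full n_fft samples.
    """
    return {
        # Number of feature frames the model sees per second of audio.
        "frames_per_second": sample_rate / hop_length,
        # Duration covered by one analysis window, in milliseconds.
        "window_ms": 1000 * n_fft / sample_rate,
        # How many consecutive windows cover any given sample.
        "overlap_factor": n_fft / hop_length,
        # Size of the one-sided spectrum predicted for each frame.
        "freq_bins": n_fft // 2 + 1,
    }


vanilla = stft_summary(n_fft=1024, hop_length=256)
mine = stft_summary(n_fft=1920, hop_length=480)

for name, summary in (("vanilla", vanilla), ("mine", mine)):
    print(name, summary)
```

Both setups keep `n_fft / hop_length` at 4, so the overlap is unchanged. What differs is that each frame now has to cover 480 new samples instead of 256, and the one-sided spectrum has 961 bins instead of 513.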
