Multi-speaker FastSpeech 2 - PyTorch Implementation ⚡

This is a PyTorch implementation of Microsoft's FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.
Now supporting about 900 speakers in 🔥 LibriTTS for multi-speaker text-to-speech.

Datasets 🐘

This project supports one single-speaker dataset and two multi-speaker datasets:

🔥 Single-Speaker

LJSpeech

🔥 Multi-Speaker

LibriTTS
VCTK

Config

Configurations are in:

config/dataset.yaml
config/hparams.py

Please modify the dataset and mfa_path in hparams.

In this repo, we're using MFA v1. Migrating to MFA v2 is a TODO item.

Steps

preprocess.py
train.py
synthesize.py

1. Preprocess

File Structures:

[DATASET] / wavs / speaker / wav_files
[DATASET] / txts / speaker / txt_files
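Before running the preprocessing, it can help to confirm that every wav file has a matching transcript. The following is a minimal sketch, not part of this repo: the function name find_unpaired is hypothetical, and it assumes the flat layout shown above with .wav and .txt extensions and identical file stems.

```python
from pathlib import Path


def find_unpaired(wav_dir, txt_dir):
    """Return (speaker, stem) pairs for wav files that have no transcript.

    Assumes wav_dir/<speaker>/<name>.wav and txt_dir/<speaker>/<name>.txt;
    the extensions are an assumption, adjust them for your data.
    """
    wav_dir, txt_dir = Path(wav_dir), Path(txt_dir)
    missing = []
    # Every sub-directory of wav_dir is treated as one speaker.
    for speaker_dir in sorted(p for p in wav_dir.iterdir() if p.is_dir()):
        for wav in sorted(speaker_dir.glob("*.wav")):
            txt = txt_dir / speaker_dir.name / (wav.stem + ".txt")
            if not txt.is_file():
                missing.append((speaker_dir.name, wav.stem))
    return missing
```

An empty list means every wav file found has a transcript. Datasets that nest files more deeply or use other extensions would need a different check.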

wav_dir : the folder containing speaker dirs ( [DATASET] / wavs )
txt_dir : the folder containing speaker dirs ( [DATASET] / txts )
save_dir : the output directory (e.g. "./processed" )
--prepare_mfa : create mfa_data
--mfa : create textgrid files
--create_dataset : generate mel, phone, f0 ....., metadata.json
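To make the argument list above concrete, here is a sketch of how such a command line could be declared with argparse. It is only an illustration built from the names listed above; the actual preprocess.py may define its arguments differently, and build_parser is a hypothetical helper.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented arguments (an assumption)."""
    parser = argparse.ArgumentParser(description="Preprocess a TTS dataset (sketch)")
    # Three positional arguments, in the order used by the example commands.
    parser.add_argument("wav_dir", help="folder containing speaker dirs with wav files")
    parser.add_argument("txt_dir", help="folder containing speaker dirs with txt files")
    parser.add_argument("save_dir", help='output directory, e.g. "./processed"')
    # Each stage is an opt-in boolean flag, so stages can be run separately.
    parser.add_argument("--prepare_mfa", action="store_true", help="create mfa_data")
    parser.add_argument("--mfa", action="store_true", help="create textgrid files")
    parser.add_argument("--create_dataset", action="store_true",
                        help="generate features and metadata.json")
    return parser
```

With flags declared this way, passing all three runs the whole pipeline, while passing a single flag would rerun only that stage.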

Example commands:

LJSpeech:

# Run the script for organizing LJSpeech first
python ./script/organizeLJ.py

python preprocess.py /storage/tts2021/LJSpeech-organized/wavs /storage/tts2021/LJSpeech-organized/txts ./processed/LJSpeech --prepare_mfa --mfa --create_dataset
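For readers curious what "organizing" involves, the sketch below shows one way to turn the stock LJSpeech release (a wavs folder plus a pipe-delimited metadata.csv) into the wavs/speaker and txts/speaker layout described above. This is not the code of script/organizeLJ.py; the function name, the speaker folder name "LJSpeech", and the choice of the last metadata column as the transcript are all assumptions.

```python
import shutil
from pathlib import Path


def organize_ljspeech(src_root, dst_root, speaker="LJSpeech"):
    """Copy LJSpeech into dst_root/wavs/<speaker> and dst_root/txts/<speaker>.

    Returns the number of utterances written. The speaker name is a
    hypothetical placeholder for the single LJSpeech speaker.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    wav_out = dst_root / "wavs" / speaker
    txt_out = dst_root / "txts" / speaker
    wav_out.mkdir(parents=True, exist_ok=True)
    txt_out.mkdir(parents=True, exist_ok=True)

    count = 0
    with open(src_root / "metadata.csv", encoding="utf-8") as f:
        for line in f:
            # Each row is: id|raw transcript|normalized transcript
            fields = line.rstrip("\n").split("|")
            if len(fields) < 2:
                continue
            utt_id, text = fields[0], fields[-1]
            src_wav = src_root / "wavs" / f"{utt_id}.wav"
            if not src_wav.is_file():
                continue  # skip rows whose audio is missing
            shutil.copy2(src_wav, wav_out / src_wav.name)
            (txt_out / f"{utt_id}.txt").write_text(text, encoding="utf-8")
            count += 1
    return count
```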

LibriTTS:

python preprocess.py /storage/tts2021/LibriTTS/train-clean-360 /storage/tts2021/LibriTTS/train-clean-360 ./processed/LibriTTS --prepare_mfa --mfa --create_dataset

VCTK:

python preprocess.py /storage/tts2021/VCTK-Corpus/wav48/ /storage/tts2021/VCTK-Corpus/txt ./processed/VCTK --prepare_mfa --mfa --create_dataset

metadata.json includes:

speaker table
training data
validation data
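A quick way to sanity-check the generated file is to count what each top-level entry holds. The sketch below assumes only that metadata.json is a JSON object at the top level; it does not assume any particular key names, since those are not documented here, and summarize_metadata is a hypothetical helper.

```python
import json


def summarize_metadata(path):
    """Map each top-level key of a JSON object to the number of items it holds.

    Lists and dicts are counted by length; any other value counts as 1.
    """
    with open(path, encoding="utf-8") as f:
        meta = json.load(f)
    return {
        key: len(value) if isinstance(value, (list, dict)) else 1
        for key, value in meta.items()
    }
```

Calling it on the metadata.json produced by --create_dataset should show one entry each for the speaker table, the training data, and the validation data, which makes an empty split easy to spot before training.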

2. Train

data_dir : the preprocessed data directory
--comment: some comments

Example commands:

LJSpeech:

python train.py ./processed/LJSpeech --comment "Hello LJSpeech"

LibriTTS:

python train.py ./processed/LibriTTS --comment "Hello LibriTTS"

VCTK:

python train.py ./processed/VCTK --comment "Hello VCTK"

3. Synthesize

--ckpt_path: the checkpoint path
--output_dir: the directory to put the synthesized audios

Example commands:

python synthesize.py --ckpt_path ./records/LJSpeech_2021-11-22-22:42/ckpt/checkpoint_125000.pth.tar --output_dir ./output
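When a run has saved many checkpoints, picking the newest one by hand is error-prone. The following sketch selects the checkpoint with the highest step number, assuming the files follow the checkpoint_<step>.pth.tar naming seen in the example command. The helper latest_checkpoint is hypothetical and not part of this repo.

```python
import re
from pathlib import Path

# Matches names such as checkpoint_125000.pth.tar and captures the step.
_STEP_PATTERN = re.compile(r"^checkpoint_(\d+)\.pth\.tar$")


def latest_checkpoint(ckpt_dir):
    """Return the checkpoint path with the largest step, or None if none match."""
    best_path, best_step = None, -1
    for path in Path(ckpt_dir).iterdir():
        match = _STEP_PATTERN.match(path.name)
        if match is None:
            continue  # ignore unrelated files
        step = int(match.group(1))
        if step > best_step:
            best_path, best_step = path, step
    return best_path
```

Steps are compared as integers rather than strings, so checkpoint_125000 correctly ranks above checkpoint_5000. The returned path can then be passed to --ckpt_path.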

References 📔

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, Y. Ren, et al.
FastSpeech: Fast, Robust and Controllable Text to Speech, Y. Ren, et al.
xcmyz's FastSpeech implementation
rishikksh20's FastSpeech2 implementation
TensorSpeech's FastSpeech2 implementation
NVIDIA's WaveGlow implementation
seungwonpark's MelGAN implementation
