Multi-speaker FastSpeech 2 - PyTorch Implementation ⚡



Datasets 🐘

This project supports one single-speaker dataset and two multi-speaker datasets:

🔥 Single-Speaker

  • LJSpeech

🔥 Multi-Speaker

  • LibriTTS

  • VCTK

Config

Configurations are in:

  • config/dataset.yaml
  • config/hparams.py

Please set dataset and mfa_path in config/hparams.py.

In this repo, we're using MFA v1. Migrating to MFA v2 is a TODO item.

Steps

  1. preprocess.py
  2. train.py
  3. synthesize.py

1. Preprocess

File Structures:

[DATASET] / wavs / speaker / wav_files
[DATASET] / txts / speaker / txt_files

  • wav_dir : the folder containing speaker dirs ( [DATASET] / wavs )
  • txt_dir : the folder containing speaker dirs ( [DATASET] / txts )
  • save_dir : the output directory (e.g. "./processed" )
  • --prepare_mfa : create mfa_data
  • --mfa : create textgrid files
  • --create_dataset : generate mel, phone, f0, ..., and metadata.json
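Before running preprocess.py, it can help to confirm that wav_dir and txt_dir contain the same speaker folders. The following is a minimal sketch, not part of this repo: the helper names `speaker_dirs` and `check_layout` are hypothetical, and it only assumes the one-folder-per-speaker layout described above.

```python
from pathlib import Path


def speaker_dirs(root):
    """Return the names of the immediate subdirectories of root (one per speaker)."""
    return {p.name for p in Path(root).iterdir() if p.is_dir()}


def check_layout(wav_dir, txt_dir):
    """Compare the speaker folders under wav_dir and txt_dir.

    Returns (missing_txt, missing_wav): speakers that have audio but no
    transcript folder, and speakers that have transcripts but no audio folder.
    This helper is hypothetical and is not provided by the repo.
    """
    wav_speakers = speaker_dirs(wav_dir)
    txt_speakers = speaker_dirs(txt_dir)
    missing_txt = sorted(wav_speakers - txt_speakers)
    missing_wav = sorted(txt_speakers - wav_speakers)
    return missing_txt, missing_wav
```

Calling `check_layout("[DATASET]/wavs", "[DATASET]/txts")` with your own paths returns two empty lists when every speaker appears on both sides. When wav_dir and txt_dir are the same folder, the check passes trivially.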

Example commands:

  • LJSpeech:
# Run the script that organizes LJSpeech first
python ./script/organizeLJ.py

python preprocess.py /storage/tts2021/LJSpeech-organized/wavs /storage/tts2021/LJSpeech-organized/txts ./processed/LJSpeech --prepare_mfa --mfa --create_dataset
  • LibriTTS:
python preprocess.py /storage/tts2021/LibriTTS/train-clean-360 /storage/tts2021/LibriTTS/train-clean-360 ./processed/LibriTTS --prepare_mfa --mfa --create_dataset
  • VCTK:
python preprocess.py /storage/tts2021/VCTK-Corpus/wav48/ /storage/tts2021/VCTK-Corpus/txt ./processed/VCTK --prepare_mfa --mfa --create_dataset

metadata.json includes:

  1. speaker table
  2. training data
  3. validation data
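To see what preprocessing produced, you can load metadata.json and look at its top-level entries. This is a minimal sketch under two assumptions: that metadata.json sits directly in save_dir, and that its top level is a JSON object. The key names are not documented here, so the code reports whatever keys it finds; `summarize_metadata` is a hypothetical helper, not part of the repo.

```python
import json
from pathlib import Path


def summarize_metadata(save_dir):
    """Load save_dir/metadata.json and report the size of each top-level entry.

    Assumes the file's top level is a JSON object (hypothetical helper).
    """
    with open(Path(save_dir) / "metadata.json", encoding="utf-8") as f:
        metadata = json.load(f)
    summary = {}
    for key, value in metadata.items():
        # len() is meaningful for dicts and lists; report other values by type name.
        if isinstance(value, (dict, list)):
            summary[key] = len(value)
        else:
            summary[key] = type(value).__name__
    return summary
```

For example, `summarize_metadata("./processed/LJSpeech")` returns a dict mapping each top-level key to its number of entries, which is a quick way to check the training and validation split sizes.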

2. Train

  • data_dir : the preprocessed data directory
  • --comment : a free-text comment describing the run

Example commands:

  • LJSpeech:
python train.py ./processed/LJSpeech --comment "Hello LJSpeech" 
  • LibriTTS:
python train.py ./processed/LibriTTS --comment "Hello LibriTTS" 
  • VCTK:
python train.py ./processed/VCTK --comment "Hello VCTK"

3. Synthesize

  • --ckpt_path: the checkpoint path
  • --output_dir: the directory to put the synthesized audios

Example commands:

python synthesize.py --ckpt_path ./records/LJSpeech_2021-11-22-22:42/ckpt/checkpoint_125000.pth.tar --output_dir ./output
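If a run has saved several checkpoints, a small helper can pick the one with the highest step number to pass as --ckpt_path. This is a sketch, assuming checkpoints follow the `ckpt/checkpoint_<step>.pth.tar` naming seen in the example command above; `latest_checkpoint` is a hypothetical helper, not part of the repo.

```python
import re
from pathlib import Path

# File name pattern taken from the example command: checkpoint_<step>.pth.tar
CKPT_PATTERN = re.compile(r"checkpoint_(\d+)\.pth\.tar$")


def latest_checkpoint(run_dir):
    """Return the checkpoint with the highest step under run_dir/ckpt, or None."""
    best_step, best_path = -1, None
    for path in (Path(run_dir) / "ckpt").glob("checkpoint_*.pth.tar"):
        match = CKPT_PATTERN.search(path.name)
        if match is None:
            continue
        step = int(match.group(1))
        # Compare steps numerically so that 125000 ranks above 90000.
        if step > best_step:
            best_step, best_path = step, path
    return best_path
```

The numeric comparison matters because sorting the file names as strings would rank `checkpoint_90000` above `checkpoint_125000`.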

References 📔