

# Text to Speech with FastSpeech2

An implementation of text-to-speech with FastSpeech2; see the FastSpeech2 paper and the FastSpeech paper.

## Example

The inference result is audio, but GitHub embeds only video formats, so the sample below is packaged as a video with the generated audio track.

https://github.com/tgritsaev/fastspeech2/assets/34184267/80b357d5-6a8f-492d-a550-d8c83645e2f2

You can also download a folder with TTS results from Google Drive; it includes 27 audio samples with varying length, pitch, and energy for the first three inputs from `test_model/input.txt`.

## Installation guide

1. Use Python 3.9:
   `conda create -n fastspeech2 python=3.9 && conda activate fastspeech2`
2. Install the required libraries:
   `pip3 install -r requirements.txt`
3. Download the data:
   `bash scripts/download_data.sh`
4. Preprocess the data (compute and save pitch and energy):
   `python3 scripts/preprocess_data.py`
5. Download the final FastSpeech2 checkpoint:
   `python3 scripts/download_checkpoint.py`
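For orientation on the preprocessing step: in FastSpeech2, per-frame energy is typically the L2 norm of the STFT magnitude (pitch extraction usually uses an external estimator such as pyworld, which is not sketched here). A minimal numpy-only sketch of the energy computation — the function name and parameters are illustrative, not taken from this repo's `scripts/preprocess_data.py`:

```python
import numpy as np

def frame_energy(waveform, n_fft=1024, hop_length=256):
    """Per-frame energy as the L2 norm of the STFT magnitude."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(waveform) - n_fft) // hop_length
    energies = np.empty(n_frames)
    for i in range(n_frames):
        # Window one frame and take the magnitude spectrum.
        frame = waveform[i * hop_length : i * hop_length + n_fft] * window
        spectrum = np.abs(np.fft.rfft(frame))
        energies[i] = np.linalg.norm(spectrum)
    return energies

# Example: one second of a 440 Hz sine at 22050 Hz
sr = 22050
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440 * t)
e = frame_energy(wave)
print(e.shape)
```

The per-utterance energy (and pitch) contours produced this way are what the variance predictors are trained against.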

## Train

1. Run training:
   `python3 train.py -c configs/train.json`

The final model was trained with the `train.json` config.
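As background, FastSpeech2 (per the paper, not necessarily this repo's exact code) trains with an L1 mel-reconstruction loss plus MSE losses on the predicted duration, pitch, and energy. A schematic numpy version, with all argument names illustrative:

```python
import numpy as np

def fastspeech2_loss(pred_mel, target_mel,
                     pred_dur, target_dur,
                     pred_pitch, target_pitch,
                     pred_energy, target_energy):
    """Total loss = L1(mel) + MSE(log-duration) + MSE(pitch) + MSE(energy)."""
    mel_loss = np.mean(np.abs(pred_mel - target_mel))
    # Durations are predicted in the log domain for numerical stability.
    dur_loss = np.mean((pred_dur - np.log1p(target_dur)) ** 2)
    pitch_loss = np.mean((pred_pitch - target_pitch) ** 2)
    energy_loss = np.mean((pred_energy - target_energy) ** 2)
    return mel_loss + dur_loss + pitch_loss + energy_loss

# A perfect prediction gives zero loss:
mel = np.zeros((80, 50))
dur = np.array([2.0, 4.0])
loss = fastspeech2_loss(mel, mel, np.log1p(dur), dur,
                        np.zeros(2), np.zeros(2), np.zeros(2), np.zeros(2))
print(loss)  # 0.0
```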

## Test

1. Run testing:
   `python3 test.py`

`test.py` accepts the following arguments:

- Config path: `-c, --config`, `default="configs/test.json"`
- Create multiple audio variants with different length, pitch, and energy: `-t, --test`, `default=False`
- Increase or decrease audio speed: `-l, --length-control`, `default=1`
- Increase or decrease audio pitch: `-p, --pitch-control`, `default=1`
- Increase or decrease audio energy: `-e, --energy-control`, `default=1`
- Checkpoint path: `-cp, --checkpoint`, `default="test_model/tts-checkpoint.pth"`
- Input texts path: `-i, --input`, `default="test_model/input.txt"`
- WaveGlow weights path: `-w, --waveglow`, `default="waveglow/pretrained_model/waveglow_256channels.pt"`

Results will be saved in `test_model/results`; an example can be found in that folder.
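The three control flags map onto FastSpeech2's variance adaptor at inference time: predicted durations are multiplied by the length factor before rounding, and the predicted pitch and energy contours are scaled before being embedded. A minimal sketch of that scaling, with illustrative names rather than this repo's actual API:

```python
import numpy as np

def apply_controls(log_dur_pred, pitch_pred, energy_pred,
                   length_control=1.0, pitch_control=1.0, energy_control=1.0):
    """Scale variance-adaptor predictions, FastSpeech2-style."""
    # The duration predictor outputs log durations; scale after exponentiating.
    dur = np.round(np.expm1(log_dur_pred) * length_control).astype(int)
    dur = np.clip(dur, a_min=1, a_max=None)  # every phoneme lasts >= 1 frame
    return dur, pitch_pred * pitch_control, energy_pred * energy_control

# length_control=2 roughly doubles each phoneme's duration in frames
log_d = np.log1p(np.array([3.0, 5.0, 2.0]))
d, p, e = apply_controls(log_d, np.ones(3), np.ones(3), length_control=2.0)
print(d.tolist())  # [6, 10, 4]
```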

## Wandb Report

https://api.wandb.ai/links/tgritsaev/rkir8sp9 (English only)

## Credits

This repository is based on a heavily modified fork of the pytorch-template repository. The FastSpeech2 implementation is based on code from the HSE "Deep Learning in Audio" course seminar and the official FastSpeech2 repository.