

# Text to Speech with FastSpeech2

An implementation of text-to-speech with FastSpeech2; see the FastSpeech2 paper and the FastSpeech paper.

## Example

The inference result is audio, but GitHub embeds only video formats, so the sample below is packaged as a video with the generated audio track.

https://github.com/tgritsaev/fastspeech2/assets/34184267/80b357d5-6a8f-492d-a550-d8c83645e2f2

You can also download a folder with TTS results from Google Drive; it includes 27 audio samples with varying length, pitch, and energy for the first three inputs from `test_model/input.txt`.

## Installation guide

1. Use Python 3.9:
   `conda create -n fastspeech2 python=3.9 && conda activate fastspeech2`
2. Install the required libraries:
   `pip3 install -r requirements.txt`
3. Download the data:
   `bash scripts/download_data.sh`
4. Preprocess the data (compute and save pitch and energy):
   `python3 scripts/preprocess_data.py`
5. Download the final FastSpeech2 checkpoint:
   `python3 scripts/download_checkpoint.py`
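For orientation on the preprocessing step: in FastSpeech2, per-frame energy is typically the L2 norm of the STFT magnitude (pitch extraction usually uses an external estimator such as pyworld, which is not sketched here). A minimal numpy-only sketch of the energy computation — the function name and parameters are illustrative, not taken from this repo's `scripts/preprocess_data.py`:

```python
import numpy as np

def frame_energy(waveform, n_fft=1024, hop_length=256):
    """Per-frame energy as the L2 norm of the STFT magnitude."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(waveform) - n_fft) // hop_length
    energies = np.empty(n_frames)
    for i in range(n_frames):
        # Window one frame and take the magnitude spectrum.
        frame = waveform[i * hop_length : i * hop_length + n_fft] * window
        spectrum = np.abs(np.fft.rfft(frame))
        energies[i] = np.linalg.norm(spectrum)
    return energies

# Example: one second of a 440 Hz sine at 22050 Hz
sr = 22050
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440 * t)
e = frame_energy(wave)
print(e.shape)
```

The per-utterance energy (and pitch) contours produced this way are what the variance predictors are trained against.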

## Train

1. Run training:
   `python3 train.py -c configs/train.json`

The final model was trained with the `train.json` config.
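As background, FastSpeech2 (per the paper, not necessarily this repo's exact code) trains with an L1 mel-reconstruction loss plus MSE losses on the predicted duration, pitch, and energy. A schematic numpy version, with all argument names illustrative:

```python
import numpy as np

def fastspeech2_loss(pred_mel, target_mel,
                     pred_dur, target_dur,
                     pred_pitch, target_pitch,
                     pred_energy, target_energy):
    """Total loss = L1(mel) + MSE(log-duration) + MSE(pitch) + MSE(energy)."""
    mel_loss = np.mean(np.abs(pred_mel - target_mel))
    # Durations are predicted in the log domain for numerical stability.
    dur_loss = np.mean((pred_dur - np.log1p(target_dur)) ** 2)
    pitch_loss = np.mean((pred_pitch - target_pitch) ** 2)
    energy_loss = np.mean((pred_energy - target_energy) ** 2)
    return mel_loss + dur_loss + pitch_loss + energy_loss

# A perfect prediction gives zero loss:
mel = np.zeros((80, 50))
dur = np.array([2.0, 4.0])
loss = fastspeech2_loss(mel, mel, np.log1p(dur), dur,
                        np.zeros(2), np.zeros(2), np.zeros(2), np.zeros(2))
print(loss)  # 0.0
```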

## Test

1. Run testing:
   `python3 test.py`

`test.py` accepts the following arguments:

- Config path: `-c, --config`, `default="configs/test.json"`
- Create multiple audio variants with different length, pitch, and energy: `-t, --test`, `default=False`
- Increase or decrease audio speed: `-l, --length-control`, `default=1`
- Increase or decrease audio pitch: `-p, --pitch-control`, `default=1`
- Increase or decrease audio energy: `-e, --energy-control`, `default=1`
- Checkpoint path: `-cp, --checkpoint`, `default="test_model/tts-checkpoint.pth"`
- Input texts path: `-i, --input`, `default="test_model/input.txt"`
- WaveGlow weights path: `-w, --waveglow`, `default="waveglow/pretrained_model/waveglow_256channels.pt"`

Results will be saved in `test_model/results`; an example can be found in that folder.
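The three control flags map onto FastSpeech2's variance adaptor at inference time: predicted durations are multiplied by the length factor before rounding, and the predicted pitch and energy contours are scaled before being embedded. A minimal sketch of that scaling, with illustrative names rather than this repo's actual API:

```python
import numpy as np

def apply_controls(log_dur_pred, pitch_pred, energy_pred,
                   length_control=1.0, pitch_control=1.0, energy_control=1.0):
    """Scale variance-adaptor predictions, FastSpeech2-style."""
    # The duration predictor outputs log durations; scale after exponentiating.
    dur = np.round(np.expm1(log_dur_pred) * length_control).astype(int)
    dur = np.clip(dur, a_min=1, a_max=None)  # every phoneme lasts >= 1 frame
    return dur, pitch_pred * pitch_control, energy_pred * energy_control

# length_control=2 roughly doubles each phoneme's duration in frames
log_d = np.log1p(np.array([3.0, 5.0, 2.0]))
d, p, e = apply_controls(log_d, np.ones(3), np.ones(3), length_control=2.0)
print(d.tolist())  # [6, 10, 4]
```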

## Wandb Report

https://api.wandb.ai/links/tgritsaev/rkir8sp9 (English only)

## Credits

This repository is based on a heavily modified fork of the pytorch-template repository. The FastSpeech2 implementation is based on code from the HSE "Deep Learning in Audio" course seminar and the official FastSpeech2 repository.