
Text to Speech with FastSpeech2

Based on the FastSpeech2 article and the FastSpeech article.

Example

The inference result is audio, but GitHub only supports embedding video+audio formats, so the example is uploaded as a video.

https://github.com/tgritsaev/fastspeech2/assets/34184267/80b357d5-6a8f-492d-a550-d8c83645e2f2

You can also download a folder with TTS results from Google Drive. It contains 27 audio files with different length, pitch, and energy settings for the first three inputs from test_model/input.txt.
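In FastSpeech2-style models, such variants are usually produced by scaling the model's predictions at inference time. In the FastSpeech papers, length control multiplies the predicted phoneme durations by a factor α before the length regulator repeats each phoneme's hidden state, and FastSpeech2 conditions on pitch and energy by quantizing them into bins that index an embedding table. The following is a minimal plain-Python sketch of those two mechanisms, not this repository's code: the function names, the bin boundaries, and the assumption that the control factor multiplies the prediction before quantization are all illustrative. Whether a length-control value above 1 speeds speech up or slows it down depends on the implementation; in this sketch it lengthens the output.

```python
import bisect
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def length_regulator(hidden: Sequence[T], durations: Sequence[float], alpha: float = 1.0) -> List[T]:
    """Hypothetical helper: repeat each phoneme-level item round(duration * alpha) times.

    In this sketch alpha > 1 yields more frames (longer, slower speech).
    """
    expanded: List[T] = []
    for item, duration in zip(hidden, durations):
        repeats = max(0, round(duration * alpha))  # Python's round() is half-to-even
        expanded.extend([item] * repeats)
    return expanded


def quantize(value: float, boundaries: Sequence[float]) -> int:
    """Hypothetical helper: bin index i with boundaries[i-1] < value <= boundaries[i]."""
    return bisect.bisect_left(boundaries, value)


def apply_variance_control(predicted: Sequence[float], control: float, boundaries: Sequence[float]) -> List[int]:
    """Assumed scheme: scale predicted pitch or energy by a control factor, then bin it."""
    return [quantize(value * control, boundaries) for value in predicted]
```

For example, `length_regulator(["a", "b"], [2, 3], alpha=0.5)` returns `["a", "b", "b"]`, and `apply_variance_control([100, 150], 2.0, [100, 200, 300])` returns `[1, 2]`. In a real model the items would be hidden-state vectors and the bin indices would select rows of a pitch or energy embedding.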

Installation guide

  1. Use Python 3.9:
conda create -n fastspeech2 python=3.9 && conda activate fastspeech2
  2. Install the libraries:
pip3 install -r requirements.txt
  3. Download the data:
bash scripts/download_data.sh
  4. Preprocess the data (compute and save pitch and energy):
python3 scripts/preprocess_data.py
  5. Download my final FastSpeech2 checkpoint:
python3 scripts/download_checkpoint.py
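The preprocessing step stores per-frame pitch and energy targets for the variance adaptor. The FastSpeech2 paper defines frame energy as the L2 norm of the STFT magnitude of each frame; pitch (F0) is usually extracted with a dedicated estimator such as WORLD (for example via pyworld). The dependency-free sketch below computes energy that way with a naive DFT and, purely for illustration, swaps in a simple autocorrelation pitch estimator. The function names, frame sizes, Hann window, 80-400 Hz search range, and the 0 Hz convention for unvoiced frames are assumptions; scripts/preprocess_data.py may do this differently (e.g. with library STFTs and frames aligned to the mel-spectrogram hop).

```python
import cmath
import math
from typing import List, Sequence


def hann(n_fft: int) -> List[float]:
    # Periodic Hann window, commonly used for STFT analysis.
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def frames(signal: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    # Overlapping frames without padding; trailing samples that do not fill a frame are dropped.
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def frame_energy(signal: Sequence[float], n_fft: int = 16, hop: int = 8) -> List[float]:
    """Energy per frame: L2 norm of the STFT magnitude over the non-negative frequency bins."""
    window = hann(n_fft)
    energies = []
    for frame in frames(signal, n_fft, hop):
        windowed = [x * w for x, w in zip(frame, window)]
        # Naive DFT; a real pipeline would use an FFT library instead.
        magnitudes = [
            abs(sum(windowed[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft)))
            for k in range(n_fft // 2 + 1)
        ]
        energies.append(math.sqrt(sum(m * m for m in magnitudes)))
    return energies


def frame_pitch(signal: Sequence[float], sample_rate: int, frame_len: int, hop: int,
                f_min: float = 80.0, f_max: float = 400.0) -> List[float]:
    """Naive F0 per frame: the lag with the highest autocorrelation within [f_min, f_max]."""
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    pitches = []
    for frame in frames(signal, frame_len, hop):
        best_lag, best_corr = 0, 0.0
        for lag in range(min_lag, min(max_lag, frame_len - 1) + 1):
            corr = sum(frame[n] * frame[n + lag] for n in range(frame_len - lag))
            if corr > best_corr:
                best_lag, best_corr = lag, corr
        # Frames without positive correlation (e.g. silence) are marked unvoiced with 0 Hz.
        pitches.append(sample_rate / best_lag if best_lag else 0.0)
    return pitches
```

For a 200 Hz sine sampled at 8000 Hz, `frame_pitch(signal, 8000, frame_len=400, hop=200)` picks a lag of 40 samples in every frame, i.e. 200 Hz.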

Train

  1. Run training:
python3 train.py -c configs/train.json

The final model was trained with the configs/train.json config.

Test

  1. Run testing:
python3 test.py

test.py accepts the following arguments:

  • Config path: -c, --config, default="configs/test.json"
  • Create multiple audio variants with different length, pitch, and energy: -t, --test, default=False
  • Increase or decrease audio speed: -l, --length-control, default=1
  • Increase or decrease audio pitch: -p, --pitch-control, default=1
  • Increase or decrease audio energy: -e, --energy-control, default=1
  • Checkpoint path: -cp, --checkpoint, default="test_model/tts-checkpoint.pth"
  • Input texts path: -i, --input, default="test_model/input.txt"
  • Waveglow weights path: -w, --waveglow, default="waveglow/pretrained_model/waveglow_256channels.pt"
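For reference, the options listed above map directly onto a standard argparse parser. This is a hypothetical reconstruction from the documented flags, not the actual contents of test.py; in particular, `build_parser` is an illustrative name, and treating the control factors as floats and -t as a store_true flag are assumptions based on the listed defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented test.py options.
    parser = argparse.ArgumentParser(description="FastSpeech2 inference (sketch)")
    parser.add_argument("-c", "--config", default="configs/test.json", help="config path")
    parser.add_argument("-t", "--test", action="store_true",
                        help="create multiple variants with different length, pitch and energy")
    # Control factors are assumed to be multiplicative floats (default 1 = unchanged).
    parser.add_argument("-l", "--length-control", type=float, default=1.0)
    parser.add_argument("-p", "--pitch-control", type=float, default=1.0)
    parser.add_argument("-e", "--energy-control", type=float, default=1.0)
    parser.add_argument("-cp", "--checkpoint", default="test_model/tts-checkpoint.pth")
    parser.add_argument("-i", "--input", default="test_model/input.txt")
    parser.add_argument("-w", "--waveglow",
                        default="waveglow/pretrained_model/waveglow_256channels.pt")
    return parser
```

With such a parser, `python3 test.py -l 1.2 -p 0.8` would yield `args.length_control == 1.2` and `args.pitch_control == 0.8`, with every other option at its default. argparse converts the dashes in long option names to underscores in the attribute names.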

Results are saved to test_model/results; this folder also contains example outputs.

Wandb Report

https://api.wandb.ai/links/tgritsaev/rkir8sp9 (English only)

Credits

This repository is based on a heavily modified fork of the pytorch-template repository. The FastSpeech2 implementation is based on code from the HSE "Deep Learning in Audio" course seminar and the official FastSpeech2 repository.