
tacotron-tts-cpp

public
76 stars
25 forks
1 issues

Commits

List of commits on branch master.
adb3d62913590aac3b8f1f1434c44c525dbbf8d3

Add note on --config=monolithic for building libtensorflow_cc.

ssyoyo committed 5 years ago
dc84445752211f5a0181a26fca177dbc933e81fc

Update README.

ssyoyo committed 5 years ago
9dc5612afe9c35fd48cbd66c729ae52ebac44b2c

number_to_words experiment(Very early stage).

ssyoyo committed 5 years ago
55c65b1ea5156c37c446bbc3dbd8a6a8405952e3

Cosmetics.

ssyoyo committed 6 years ago
45a52b9a3b7b592757f5c5286f3a40b848917baa

Audio postprocessing in C++.

ssyoyo committed 6 years ago
254824600afa641ed1f4f633d23d00726e11f9fe

Implement some audio postprocessing functions.

ssyoyo committed 6 years ago

README


Text-to-speech in (partially) C++ using Tacotron model + TensorFlow

Runs a Tacotron model with the TensorFlow C++ API.

It is intended for running TTS on mobile or embedded devices.

The code is based on keithito's tacotron implementation: https://github.com/keithito/tacotron

Status

Experimental.

Python preprocessing is still required to generate sequence data from text.

Requirements

  • TensorFlow r1.8+
  • Ubuntu 16.04 or later
  • C++ compiler and CMake

Dump graph

In keithito's tacotron repo, add a tf.train.write_graph call at the end of Synthesizer.load to save the TensorFlow graph.

class Synthesizer:
  def load(self, checkpoint_path, model_name='tacotron'):

    ...

    # write graph (tf.train.write_graph writes a text-format GraphDef by default, as_text=True)
    tf.train.write_graph(self.session.graph.as_graph_def(), "models/", "graph.pb")

Freeze graph

Freeze the graph, for example:

freeze_graph \
        --input_graph=models/graph.pb \
        --input_checkpoint=./tacotron-20180906/model.ckpt \
        --output_graph=models/tacotron_frozen.pb \
        --output_node_names=model/griffinlim/Squeeze

An example frozen graph file is included in this repo.

Build

Edit the libtensorflow_cc.so path in bootstrap.sh (this assumes you built TensorFlow from source), then run:

$ ./bootstrap.sh
$ cd build
$ make

Note on libtensorflow_cc

Make sure to build libtensorflow_cc with --config=monolithic. Otherwise you will get undefined symbol errors at the linking stage.

https://www.tensorflow.org/install/source#preconfigured_configurations

Run

Prepare a sequence JSON file. The sequence can be generated with the text_to_sequence() function in keithito's tacotron repo.

See sample/sequence01.json for a generated example.
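For illustration, the core of text_to_sequence() can be sketched as below. This is a simplified, character-only sketch modeled on keithito's text/symbols.py, not the actual function: it assumes the input text has already been cleaned, assumes the symbol table shown (padding, EOS, then the character set) matches the one your checkpoint was trained with, and omits the ARPAbet symbols and {...} handling of the original. The name text_to_sequence_sketch is hypothetical. Serialize the resulting list in whatever layout sample/sequence01.json uses.

```python
# Assumed symbol table, modeled on keithito's text/symbols.py.
# The real table also appends ARPAbet symbols after these entries.
_pad = "_"
_eos = "~"
_characters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!'(),-.:;? "

symbols = [_pad, _eos] + list(_characters)
_symbol_to_id = {s: i for i, s in enumerate(symbols)}


def text_to_sequence_sketch(text):
    """Map already-cleaned text to symbol IDs and append the EOS ID.

    Characters outside the table, as well as the pad and EOS symbols
    themselves, are dropped from the input.
    """
    sequence = [
        _symbol_to_id[c]
        for c in text
        if c in _symbol_to_id and c not in (_pad, _eos)
    ]
    sequence.append(_symbol_to_id[_eos])
    return sequence


if __name__ == "__main__":
    print(text_to_sequence_sketch("hi."))
```

Because the IDs are plain indices into the symbol list, a C++ port only needs the same ordered table to reproduce the sequences.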

Then run:

$ ./tts -i ../sample/sequence01.json -g ../tacotron_frozen.pb output.wav

Example outputs output01.wav and processed01.wav are included in sample/.

Optional parameter

You can specify hyperparameter settings (in JSON format) with the -h option. See sample/hparams.json for an example.

$ ./tts -i ../sample/sequence01.json -h ../sample/hparams.json -g ../tacotron_frozen.pb output.wav

Performance

Currently the TensorFlow C++ code path uses only a single CPU core, so it is slow. On a 2018-era CPU, synthesis takes roughly 10x the duration of the synthesized audio (e.g. 60 seconds to synthesize 6 seconds of audio).
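The 10x figure is a real-time factor (RTF): wall-clock synthesis time divided by the duration of the produced audio. A minimal helper for computing it is sketched below, assuming you time the tts run yourself; the function names are illustrative and not part of this repo.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """Wall-clock synthesis time divided by the duration of the generated audio.

    A value above 1.0 means synthesis is slower than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def audio_duration_seconds(num_samples, sample_rate):
    """Duration of a clip given its per-channel sample count and sample rate in Hz."""
    return num_samples / sample_rate


if __name__ == "__main__":
    # The figure quoted above: 60 s of synthesis for 6 s of audio.
    print(real_time_factor(60.0, 6.0))  # 10.0
```

An RTF below 1.0 would be needed for streaming or interactive use on a device.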

TODO

  • Write the entire TTS pipeline in C++
    • [ ] Text to sequence (Issue #1)
      • [ ] Convert to lower case
      • [ ] Expand abbreviations
      • [ ] Normalize numbers (number_to_words, equivalent to Python's inflect)
      • [ ] Remove extra whitespace
    • [ ] Use a CPU implementation of Griffin-Lim
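As a reference for the intended behavior of a C++ port, the text normalization steps in the TODO list can be sketched in Python, the language of the preprocessing they would replace. This is a minimal sketch, not the repo's code or keithito's cleaners: the abbreviation table is a hypothetical subset, number_to_words only covers integers with absolute value below one trillion, and its output style (no "and", no commas) is a choice of this sketch that may differ from inflect. The helper names clean_text and expand_numbers are hypothetical.

```python
import re

# Hypothetical abbreviation subset; patterns assume the text is already lowercased.
_ABBREVIATIONS = [
    (re.compile(r"\b%s\." % abbr), expansion)
    for abbr, expansion in [("mr", "mister"), ("dr", "doctor"), ("st", "saint"), ("jr", "junior")]
]

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]
_SCALES = [(10**9, "billion"), (10**6, "million"), (10**3, "thousand")]


def _below_thousand(n):
    """Return the words for 0 <= n < 1000 as a list (empty for 0)."""
    words = []
    if n >= 100:
        words += [_ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        tens, ones = divmod(n, 10)
        words.append(_TENS[tens] if ones == 0 else _TENS[tens] + "-" + _ONES[ones])
    elif n > 0:
        words.append(_ONES[n])
    return words


def number_to_words(n):
    """Spell out an integer with |n| < 10**12 (no 'and', no commas)."""
    if n < 0:
        return "minus " + number_to_words(-n)
    if n >= 10**12:
        raise ValueError("number too large for this sketch")
    if n == 0:
        return "zero"
    words = []
    for value, name in _SCALES:
        if n >= value:
            words += _below_thousand(n // value) + [name]
            n %= value
    words += _below_thousand(n)
    return " ".join(words)


def expand_numbers(text):
    """Replace each run of ASCII digits with its spelled-out form."""
    return re.sub(r"[0-9]+", lambda m: number_to_words(int(m.group(0))), text)


def clean_text(text):
    """Lowercase, expand abbreviations, normalize numbers, collapse whitespace."""
    text = text.lower()
    for pattern, expansion in _ABBREVIATIONS:
        text = pattern.sub(expansion, text)
    text = expand_numbers(text)
    return re.sub(r"\s+", " ", text)


if __name__ == "__main__":
    print(clean_text("Dr. Smith  has 21 cats."))  # doctor smith has twenty-one cats.
```

Each step is a pure string transformation, so the C++ version can be tested line by line against this output. Digit groups with separators such as "1,000" are not handled here.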

License

MIT license.

The pretrained model used for freezing the graph was obtained from keithito's repo.

Third party licenses

  • json.hpp : MIT license
  • cxxopts.hpp : MIT license
  • dr_wav : Public domain