TTS - Speech generation | TALP :: Language and Speech Technologies and Applications

TEXT TO SPEECH SYNTHESIS

Research on data-driven statistical methods grew strongly in the first decade of the new century, when a research line on speech synthesis shifted to statistical machine learning techniques.

After decades of small year-on-year gains in performance, the introduction of deep learning techniques is producing large steps towards human, or even super-human, performance in tasks such as text-to-speech.

Our recent work with DNNs has focused on multi-speaker speech synthesis, speaker adaptation and speaker interpolation using multi-output recurrent (RNN-LSTM) networks. More recently, we have produced more expressive speech by adding semantic features derived automatically from raw text to the input, or by applying transfer learning. Our goal for the coming years is to explore end-to-end architectures that, on the one hand, derive the linguistic features automatically from raw text and, on the other, generate the speech waveform directly, without the quality loss that a parametric representation imposes.
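
As a rough illustration of speaker interpolation in a multi-output network, the sketch below assumes shared layers that produce one hidden vector per frame and one linear output layer per speaker, and blends the speaker-specific outputs with normalised weights. The page does not say how interpolation is actually performed, so the layer shapes, the names `linear_head` and `interpolate_speakers`, and the toy values are all hypothetical.

```python
def linear_head(hidden, weight, bias):
    """Apply one speaker-specific linear output layer to a hidden frame.

    hidden: list of floats (output of the shared layers, assumed given)
    weight: one row of input weights per output parameter
    bias:   one bias per output parameter
    """
    return [
        sum(h * w for h, w in zip(hidden, row)) + b
        for row, b in zip(weight, bias)
    ]


def interpolate_speakers(hidden, heads, mix):
    """Blend the outputs of several speaker heads for one frame.

    heads: dict speaker name -> (weight, bias)   (hypothetical layout)
    mix:   dict speaker name -> non-negative weight, normalised here
    """
    total = sum(mix.values())
    if total <= 0:
        raise ValueError("mixing weights must sum to a positive value")
    blended = None
    for name, share in mix.items():
        weight, bias = heads[name]
        output = linear_head(hidden, weight, bias)
        if blended is None:
            blended = [0.0] * len(output)
        # Accumulate this speaker's contribution, scaled by its share.
        blended = [acc + (share / total) * y for acc, y in zip(blended, output)]
    return blended


# Toy example: two speakers, 2-dimensional hidden frame, 2 acoustic parameters.
heads = {
    "speaker_a": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    "speaker_b": ([[0.0, 1.0], [1.0, 0.0]], [1.0, 1.0]),
}
frame = [1.0, 2.0]
halfway = interpolate_speakers(frame, heads, {"speaker_a": 0.5, "speaker_b": 0.5})
print(halfway)  # [2.0, 2.0]
```

Setting one weight to 1 and the others to 0 recovers a single speaker's output, so the same function covers both plain multi-speaker synthesis and interpolation between voices in this simplified setting.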

A seminal work in speech enhancement uses generative adversarial DNNs. This work will be continued and extended to address not only noise reduction but also other distortions of the signal, such as non-linear distortion and chopped speech.

Main Researcher

Antonio Bonafonte Cávez

Copyright © 2017 - Designed by Madstudio
