TALP Talk: Generative adversarial networks (GAN) applied to Speech Enhancement

Generative adversarial networks (GANs) are a class of artificial intelligence algorithms used in unsupervised machine learning, implemented by a system of two neural networks competing against each other in a zero-sum game framework. They were introduced by Ian Goodfellow et al. in 2014. [Wikipedia]
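
For reference, the zero-sum game mentioned above is usually written as the minimax objective from the 2014 paper. The symbols follow the common convention and do not appear in this announcement: G is the generator, D the discriminator, p_data the data distribution and p_z the noise prior.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

D is trained to tell real samples from generated ones, while G is trained to make that task as hard as possible.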

Next Wednesday, Santi Pascual will present his work on GANs applied to Speech Enhancement.
Wed. May 17th, at 11.15.
Room D5-007 (UPC, Campus Nord)

SEGAN: Speech Enhancement Generative Adversarial Network

Current speech enhancement techniques operate on the spectral domain and/or exploit some higher-level feature. The majority of them tackle a limited number of noise conditions and rely on first-order statistics. To circumvent these issues, deep networks are being increasingly used, thanks to their ability to learn complex functions from large example sets. In this work, we propose the use of generative adversarial networks for speech enhancement. In contrast to current techniques, we operate at the waveform level, training the model end-to-end, and incorporate 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them. We evaluate the proposed model using an independent, unseen test set with two speakers and 20 alternative noise conditions. The enhanced samples confirm the viability of the proposed model, and both objective and subjective evaluations confirm the effectiveness of it. With that, we open the exploration of generative architectures for speech enhancement, which may progressively incorporate further speech-centric design choices to improve their performance.


Here are some samples: http://veu.talp.cat/segan/

Neural Machine Translation

Marta Ruiz will present her work on Neural Machine Translation.
May 18th, at 12.
Telefónica I+D (Plaça d'Ernest Lluch i Martin, 5, 08019 Barcelona)

Neural Machine Translation (MT) is becoming a standard in both industry and academia. The new paradigm is based entirely on an end-to-end deep learning architecture.

Neural MT was proposed in 2014 using a simple sequence-to-sequence architecture. This architecture has since evolved to incorporate bidirectional recurrent neural networks, attention-based mechanisms, convolutional networks and multi-task training. Several of these advances have been applied to other tasks such as image/video captioning or sentence entailment.

In this talk, we will explain the neural MT architecture from its basics to recent proposals. We will review Google’s and Systran’s production systems, as well as top systems presented in the popular WMT international evaluation campaign, showing the good performance of neural MT compared to state-of-the-art statistical and rule-based MT systems.

Marta R. Costa-jussà is a Telecommunication Engineer from the Universitat Politècnica de Catalunya (UPC, Barcelona). She received her PhD from the UPC in 2008. Her research experience is mainly in Machine Translation. She has worked at LIMSI-CNRS (Paris), Barcelona Media Innovation Center (Barcelona), Universidade de São Paulo, Institute for Infocomm Research (Singapore) and Instituto Politécnico Nacional (Mexico). Her research results include: participation in 17 research projects; publication of over 100 papers in international journals and conferences; cooperation with more than 5 companies as a scientific consultant; and organization of 12 workshops and conferences in the area. Currently, she is a Ramón y Cajal Research Fellow at UPC, where she leads the DeepVoice project.

Deep Dive in Deep Learning with TensorFlow



7:00 - Doors open. Welcome. Networking. Beer.

7:15 - Convolutional Neural Networks for NLP.

7:45 - Q&A break.

8:00 - Introduction to Generative Adversarial Networks.

8:30 - Q&A break.

8:45 - Networking.

Convolutional Neural Networks for NLP.

Convolutional Neural Networks have proven very effective in classification tasks. Originally developed for image recognition and classification in computer vision, they have also been adopted in natural language processing (NLP). We will give a short introduction to Convolutional Neural Networks and explain how they apply to NLP problems.
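
As a rough illustration of how a convolution applies to text (not the speakers' material), the sketch below slides a single filter over a sentence represented as a sequence of word vectors and then keeps the strongest response (max-over-time pooling). The function names, the two-dimensional embeddings and the filter weights are all made up for the example; a real model would learn many filters and use a deep learning library.

```python
def conv1d_text(embeddings, kernel, bias=0.0):
    """Slide one filter over a sentence and return the ReLU feature map.

    embeddings: list of word vectors (one list of floats per word).
    kernel: list of weight vectors, one per word position in the window.
    """
    width = len(kernel)
    feature_map = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        # Dot product between the filter and the current window of words.
        activation = bias + sum(
            w * x
            for kernel_row, word_vec in zip(kernel, window)
            for w, x in zip(kernel_row, word_vec)
        )
        feature_map.append(max(0.0, activation))  # ReLU
    return feature_map


def max_over_time(feature_map):
    """Keep the strongest filter response, wherever it occurs in the sentence."""
    return max(feature_map)


# Hypothetical 4-word sentence with 2-dimensional embeddings.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# Hypothetical filter spanning 2 consecutive words.
kernel = [[1.0, -1.0], [0.5, 0.5]]

features = conv1d_text(sentence, kernel)
pooled = max_over_time(features)
```

Because pooling discards the position of the match, the same filter detects a phrase pattern anywhere in the sentence, which is what makes this setup convenient for text classification.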


Carlos Segura has a Telecommunications Engineering background and received his PhD in multimedia signal processing from the UPC in 2011. For the last 3 years he has been doing research in Deep Learning applied to computer vision, speech processing and, more recently, natural language processing and dialog systems. He currently works at Telefónica I+D as an associate researcher.

Silvia Necsulescu, passionate about algorithms and foreign languages, got a PhD in Natural Language Processing from UPF Barcelona, addressing the automatic extraction of semantically related words. She works as an NLP Scientist on automatic text classification problems.

Introduction to Generative Adversarial Networks.

Generative Adversarial Networks are a recent type of generative model framed within Deep Learning. They allow us to model very high-dimensional distributions, which makes them very effective at generating novel samples for complex domains. They have been applied successfully in computer vision, where these systems generate high-quality images from detailed descriptions. Other fields, such as speech processing, are also adopting this adversarial methodology for its proven effectiveness. In this talk GANs will be introduced, as well as their current applications in different domains. Moreover, there will be a coding example to see how we can construct a GAN to solve a toy example in TensorFlow.
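
To give a flavour of what a toy GAN looks like, here is a minimal sketch. It is not the example from the talk, and it uses plain Python with hand-derived gradients instead of TensorFlow so that it runs without dependencies. Everything in it is an assumption made for illustration: the "real" data is a 1-D Gaussian with mean 4.0, the generator is the linear map G(z) = a*z + b, the discriminator is the logistic unit D(x) = sigmoid(w*x + c), and the generator uses the common non-saturating loss.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=500, batch=16, lr=0.05, seed=0):
    """Alternate discriminator and generator updates; return (a, b, w, c)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters: G(z) = a*z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]  # toy "data"
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]

        # Discriminator step: gradient ascent on log D(x) + log(1 - D(G(z))).
        grad_w = grad_c = 0.0
        for x, g in zip(real, fake):
            d_real = sigmoid(w * x + c)
            d_fake = sigmoid(w * g + c)
            grad_w += (1.0 - d_real) * x - d_fake * g
            grad_c += (1.0 - d_real) - d_fake
        w += lr * grad_w / batch
        c += lr * grad_c / batch

        # Generator step: gradient ascent on log D(G(z)) (non-saturating loss).
        grad_a = grad_b = 0.0
        for z in noise:
            d_fake = sigmoid(w * (a * z + b) + c)
            grad_a += (1.0 - d_fake) * w * z
            grad_b += (1.0 - d_fake) * w
        a += lr * grad_a / batch
        b += lr * grad_b / batch
    return a, b, w, c


a, b, w, c = train_toy_gan()
samples = [a * random.gauss(0.0, 1.0) + b for _ in range(5)]  # generated points
```

The two updates pull in opposite directions: the discriminator step raises D on real samples and lowers it on generated ones, while the generator step moves its samples toward regions the discriminator currently rates as real. A discriminator this simple can only separate samples by position, so the sketch is useful for reading the training loop rather than for judging sample quality.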


Santiago Pascual graduated in Telecommunications Engineering in 2016 at Telecom BCN@UPC. He has been working in Deep Learning research for more than two years, specifically on speech and language processing. He also enjoys working on multimodal technologies, in an end-to-end fashion whenever possible. He is currently a PhD candidate at TALP@UPC, working on architectures for end-to-end speech processing with deep learning.

