ASR - Audio and Speech Recognition

  • Audio and speech recognition and acoustic event detection & localization

    Research on data-driven statistical methods for speech recognition started in the group in the late 1980s. It continued and expanded in the 1990s to cover both speech and speaker recognition, and it grew strongly in the 2000s, when the group started a research line on machine translation and its speech synthesis work shifted to statistical machine learning techniques. In recent years most research areas in the group have incorporated deep learning, so deep neural networks (DNNs) have become a backbone of our current and future research activities, as has happened in many of the main research groups in our area.

    The goals in speech recognition are to develop new architectures based on end-to-end deep learning techniques as an alternative to traditional HMM-based systems, to devise new procedures for jointly training the acoustic and language models, and to build more powerful language models based on recurrent DNNs.


    In speech recognition, the following lines of research have been undertaken:

    • Large vocabulary systems
    • Generation of confidence measures
    • Multi-dialect and multilingual recognition systems
    • Robust systems that combat noise and environmental variations
    • Integration in multimedia environments
    • Speech recognition in rooms with microphone arrays
    • Language modeling
    • Integration in speech translation systems


    One of the ultimate goals of this research is to improve the performance of large vocabulary automatic speech recognition systems so that they can support high-quality multilingual speech-to-speech translation. A specific application would be the translation of parliamentary speeches.

    Another main goal is acoustic scene analysis, which aims to describe the sequence of entities conveyed by the acoustic signals produced in a given environment, and to determine when those entities occur and where their sources are located. Current work focuses on several research topics: acoustic event detection in a neonatal intensive care unit, marine mammal localization with hydrophone arrays, and high-level audio description for visual interpretation by hearing-impaired people.