In the field of human-machine interaction, it is becoming increasingly important that computers adapt to human needs: they should form an integral part of the way humans communicate without demanding undue effort from users. This implies a need for multimodal user interfaces that have robust perceptive capacities and that use non-intrusive sensors. The TALP Center is working on a set of acoustic scene analysis systems with a range of perceptive and cognitive functionalities. To that end, it is researching speech and audio processing technologies for identifying speakers, recognizing speech, localizing and separating acoustic sources, detecting and classifying noise, and related tasks.
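To give a flavor of one of the technologies listed above, acoustic source localization is commonly built on time-delay-of-arrival estimates between microphone pairs, often obtained with the generalized cross-correlation with phase transform (GCC-PHAT). The following is a minimal illustrative sketch of that standard technique, not the TALP Center's actual implementation; the function name and signal parameters are assumptions for the example.

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the delay (in seconds) of `sig` relative to `ref`
    using the generalized cross-correlation with phase transform.
    A positive result means `sig` lags `ref`."""
    n = len(sig) + len(ref)              # zero-pad to avoid circular wraparound
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    # rearrange so lag 0 sits at the centre of the searched window
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

# Toy check: a white-noise burst delayed by 25 samples at 16 kHz
fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(4096)
delay_samples = 25
sig = np.concatenate((np.zeros(delay_samples), ref))[:len(ref)]
tau = gcc_phat(sig, ref, fs)
print(round(tau * fs))                   # estimated delay, in samples
```

In a smart room, such pairwise delay estimates from several microphone pairs can be intersected geometrically to locate a talker; the PHAT weighting makes the correlation peak sharper and more robust to room reverberation than plain cross-correlation.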
A recently built intelligent room in building D5 at the Center is the testing ground for these applications. Equipped with audio and video capture hardware, it is designed for lecturers to give presentations and seminars. We are working to advance the multimodal approach, specifically the integration of the audio and video platforms, drawing on an existing collaboration with the Image Processing Group of the Department of Signal Theory and Telecommunications at the UPC. This research is being carried out under the auspices of the European framework project CHIL and the CICyT ACESCA project.