The development of technologies that automatically recognize speakers from their voices has attracted growing interest over the past few years because of its numerous applications: access control, financial and commercial operations, audio indexing of meetings and of radio and television programs, and police investigations. This field of research involves, on the one hand, identifying speakers or verifying their identity and, on the other hand, locating the boundaries in a signal between different speakers (speaker segmentation).
The TALP Center is mainly devoted to the following lines of research:
For speaker recognition, research work was recently initiated on using deep neural networks (DNNs) to discriminatively model target speakers, with either i-vectors or feature vectors representing the target and impostor speakers. Two main contributions were proposed to make DNNs efficient: impostor selection and network adaptation. The proposed system performed very well in the international NIST SRE i-vector challenge. A system for speaker segmentation that uses both acoustic and language modelling was also developed, based on a recurrent neural network (LSTM). This technology has been transferred for automatic annotation in a real call center. Over the next three years, the main objective will be to enhance the previously developed deep speaker recognition system by using more sophisticated deep learning techniques, by working with different levels of speech features or even directly with the raw speech signal, and by proposing new impostor selection algorithms that take into account the structure of the background and training data.
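The text above does not say how impostors are selected, so the following is only an illustrative sketch of one common idea: for a given target speaker, keep the background i-vectors that are most similar to the target's i-vectors and use them as negative examples when training the discriminative model. The function name `select_impostors`, the cosine-similarity criterion, and the parameter `k` are assumptions made for illustration and are not taken from the system described here.

```python
import numpy as np

def select_impostors(target_ivecs, background_ivecs, k):
    """Return indices of the k background i-vectors closest to the target.

    Hypothetical criterion (an assumption, not the TALP method):
    mean cosine similarity between each background vector and the
    target speaker's vectors.
    """
    target_ivecs = np.asarray(target_ivecs, dtype=float)
    background_ivecs = np.asarray(background_ivecs, dtype=float)
    # Length-normalise so that dot products equal cosine similarities.
    t = target_ivecs / np.linalg.norm(target_ivecs, axis=1, keepdims=True)
    b = background_ivecs / np.linalg.norm(background_ivecs, axis=1, keepdims=True)
    # Score each background vector by its mean similarity to the target vectors.
    scores = (b @ t.T).mean(axis=1)
    # Highest scores first: the most confusable background speakers.
    return np.argsort(-scores)[:k]

# Toy usage with random vectors standing in for real i-vectors.
rng = np.random.default_rng(0)
target = rng.normal(size=(5, 400))        # 5 i-vectors of one target speaker
background = rng.normal(size=(1000, 400)) # 1000 background i-vectors
impostors = select_impostors(target, background, k=50)
```

Choosing the closest background vectors gives the network harder negative examples than random sampling would; a criterion that accounts for the structure of the background data, as the research plan mentions, would replace the scoring line.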