Interface is an Emotional Speech Synthesis Database recorded in a quiet environment. The database is designed for the general study of emotional speech and for the analysis of emotion characteristics for speech synthesis purposes. It contains from 150 to 190 utterances for each of the 6 MPEG4 emotional styles in 4 languages (English, French, Spanish and Slovene). Each language contains utterances from two professional actors (one male and one female speaker). The database is produced and owned by the different speech partners (UMB for English and Slovene, L&H for French, UPC for Spanish) and has been sponsored by the European Commission in the scope of the INTERFACE project.
Recordings were made using an AKG 320 electrodynamic microphone. Speech signals were first recorded at 32 kHz and then downsampled to the required 16 kHz.
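The 32 kHz to 16 kHz conversion mentioned above is a factor-of-2 decimation. The documentation does not say which tool or filter was used, so the following is only a minimal sketch of the general technique: low-pass filter below the new Nyquist frequency (8 kHz), then keep every second sample. The function names, the 63-tap Hamming-windowed sinc filter and the 0.23 cycles/sample cutoff are all illustrative assumptions, not the procedure used for this database.

```python
import math


def lowpass_taps(num_taps=63, cutoff=0.23):
    """Hamming-windowed sinc low-pass FIR filter (hypothetical helper).

    cutoff is in cycles per input sample; 0.23 at 32 kHz is about 7.4 kHz,
    slightly below the 8 kHz Nyquist frequency of the 16 kHz output.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC


def decimate_by_2(samples, taps=None):
    """Low-pass filter, then keep every second sample (32 kHz -> 16 kHz)."""
    if taps is None:
        taps = lowpass_taps()
    half = len(taps) // 2
    out = []
    for i in range(0, len(samples), 2):  # only compute the samples we keep
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + k - half  # filter centred on sample i (no added delay)
            if 0 <= j < len(samples):  # treat samples outside the signal as 0
                acc += t * samples[j]
        out.append(acc)
    return out


# Usage: a 1 kHz tone sampled at 32 kHz keeps its amplitude after decimation.
tone_32k = [math.sin(2 * math.pi * 1000 * n / 32000) for n in range(3200)]
tone_16k = decimate_by_2(tone_32k)
```

Filtering before discarding samples matters because any energy above 8 kHz in the 32 kHz signal would otherwise alias into the 16 kHz output.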
The Spanish database was recorded in a silent room divided into two parts by a wall with a glass window. The speaker read the sentences directly from the PC display. To avoid extra noise, the display, PC and recording system were placed on one side of the silent room and the speaker on the other.
Two operators supervised the recordings at recording time. One of them checked that the utterances corresponded exactly to the text to be read. The other operator checked the recording system.
To study only the influence of emotions on speech, the same speakers were asked to utter the same speech material in different emotional styles. The basic specifications cover a neutral style and the 6 MPEG4 emotions proposed for video analysis: anger, sadness, joy, surprise, disgust and fear.
The Spanish corpus contains 184 different items. The corpus includes numbers, words, sentences in affirmative, exclamatory or interrogative forms, and paragraphs.
Item identifier | Corpus contents
1 to 100 | Affirmative sentences, including short and longer ones
101 to 134 | Interrogative and (5) stressed sentences
135 to 150 | Paragraphs
151 to 160 | Digits
161 to 184 | Isolated words
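The item ranges in the table above can be expressed as a small lookup, which may be handy when grouping utterances by content type. This is a sketch only: the ranges are copied from the table, but the shortened labels and the names `CORPUS_SECTIONS` and `section_for_item` are hypothetical, and nothing here assumes how the database actually names its files.

```python
# (first item, last item, label) copied from the corpus contents table.
CORPUS_SECTIONS = [
    (1, 100, "affirmative sentences"),
    (101, 134, "interrogative and stressed sentences"),
    (135, 150, "paragraphs"),
    (151, 160, "digits"),
    (161, 184, "isolated words"),
]


def section_for_item(item_id):
    """Return the content type of a Spanish corpus item identifier."""
    for first, last, label in CORPUS_SECTIONS:
        if first <= item_id <= last:
            return label
    raise ValueError(f"item identifier {item_id} is outside 1..184")


# Number of items per section; the counts add up to the 184 items stated above.
section_sizes = {label: last - first + 1 for first, last, label in CORPUS_SECTIONS}

print(section_for_item(142))  # paragraphs
```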
The contents of the speech files were supervised at recording time. If mispronunciations or other deviations from the script were detected, the recordings were redone. No mispronunciations are expected. The contents and transcriptions of the complete database are included in the documentation.
Speakers were required to be professional actors (one male and one female speaker). No further specifications (age or accent) were given. The speakers are from the Catalonia region.
This database is commercially available.