Interface

Emotional speech synthesis database

Interface is an emotional speech synthesis database recorded in a quiet environment. The database is designed for the general study of emotional speech and for the analysis of emotion characteristics for speech synthesis purposes. It contains 150 to 190 utterances for each of the 6 MPEG4 emotional styles in 4 languages (English, French, Spanish and Slovene). Each language contains utterances from two professional actors (one male and one female). The database was produced and is owned by the respective speech partners (UMB for English and Slovene, L&H for French, UPC for Spanish) and was sponsored by the European Commission within the scope of the INTERFACE project.

Recording platforms

Recordings were made using an AKG 320 electrodynamic microphone. Speech signals were first recorded at 32 kHz and then downsampled to the required 16 kHz.
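
The 32 kHz to 16 kHz conversion can be illustrated with a minimal sketch. The documentation does not say which tool or filter was actually used, so everything here is an assumption: the function names design_lowpass and downsample_by_2 are hypothetical, and the filter (a 63-tap Hamming-windowed sinc with a cutoff slightly below 8 kHz) is only one reasonable choice for removing content above the new Nyquist frequency before every second sample is kept.

```python
import math


def design_lowpass(num_taps=63, cutoff=0.23):
    """Hamming-windowed sinc low-pass filter (assumed design, not from the source).

    cutoff is a fraction of the input sample rate: 0.23 * 32 kHz = 7.36 kHz,
    just below the 8 kHz Nyquist frequency of the 16 kHz output.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalise to unity gain at DC


def downsample_by_2(samples, taps=None):
    """Low-pass filter the signal, then keep every second sample."""
    taps = taps or design_lowpass()
    half = len(taps) // 2
    out = []
    for i in range(0, len(samples), 2):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k  # filter is centred on sample i (no delay)
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(acc)
    return out
```

Calling downsample_by_2 on a list of samples recorded at 32 kHz returns half as many samples, to be interpreted at 16 kHz. In practice a tested resampling routine from a signal-processing library would normally be preferred over this hand-written filter.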

Recording environment

The Spanish database was recorded in a silent room. A wall with a glass window divides the room into two parts. The speaker read the sentences directly from the PC display. To avoid extra noise, the display, PC and recording system were placed on one side of the silent room and the speaker on the other side.

Two operators supervised the recordings at recording time. One of them checked that the utterances corresponded exactly to the text to be read. The other operator checked the recording system.

DATABASE CONTENTS DEFINITION

Emotional styles definition

To isolate the influence of emotions on speech, the same speakers were asked to utter the same speech material in different emotional styles. The basic specifications cover the neutral style and the 6 MPEG4 emotions proposed for video analysis: anger, sadness, joy, surprise, disgust and fear.

The Spanish corpus contains 184 different sentences. The corpus includes numbers, isolated words, paragraphs, and sentences in affirmative, exclamatory or interrogative forms.

Item identifier	Corpus contents
1 to 100	Affirmative sentences including short and longer ones
101 to 134	Interrogative and (5) stressed sentences.
135 to 150	Paragraphs
151 to 160	Digits
161 to 184	Isolated Words
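
The identifier ranges in the table above can be expressed as a small lookup, which may be handy when grouping recordings by content type. This is only a sketch: the names CORPUS_RANGES and category_for_item are hypothetical, and the database's actual file naming scheme is not described here, so mapping a file name to an item identifier is left out.

```python
# Item identifier ranges copied from the corpus contents table.
CORPUS_RANGES = [
    (1, 100, "Affirmative sentences"),
    (101, 134, "Interrogative and stressed sentences"),
    (135, 150, "Paragraphs"),
    (151, 160, "Digits"),
    (161, 184, "Isolated words"),
]


def category_for_item(item_id):
    """Return the corpus category for an item identifier between 1 and 184."""
    for first, last, label in CORPUS_RANGES:
        if first <= item_id <= last:
            return label
    raise ValueError(f"item identifier out of range: {item_id}")


def items_per_category():
    """Count how many item identifiers fall in each category."""
    return {label: last - first + 1 for first, last, label in CORPUS_RANGES}
```

For example, category_for_item(140) returns "Paragraphs", and the counts from items_per_category() add up to the 184 items of the Spanish corpus.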

Transcription

The contents of the speech files were supervised at recording time. If mispronunciations or other deviations from the script were detected, the recordings were redone, so no mispronunciations are expected. The contents and transcriptions of the complete database are included in the documentation.

Speaker information

Speakers were required to be professional actors (one male and one female). No further specifications, such as age or accent, were given. The speakers come from the Catalonia region.

Availability

This database is commercially available.

Information

asuncion.moreno@upc.edu
