Emotional speech synthesis database

Interface is an emotional speech synthesis database recorded in a quiet environment. It is designed for the general study of emotional speech and for the analysis of emotion characteristics for speech synthesis. It contains between 150 and 190 utterances for each of the 6 MPEG4 emotional styles in 4 languages (English, French, Spanish and Slovene). Each language contains utterances from two professional actors (one male and one female speaker). The database was produced and is owned by the respective speech partners (UMB for English and Slovene, L&H for French, UPC for Spanish) and was sponsored by the European Commission within the scope of the INTERFACE project.

 

Recording platforms

 

Recordings were made using an AKG 320 electrodynamic microphone. Speech signals were first recorded at 32 kHz and then downsampled to the required 16 kHz.
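As an illustration of the 32 kHz to 16 kHz conversion, the following is a minimal sketch of decimation by a factor of 2 in plain Python. The documentation does not state which filter or tool was actually used, so the Hamming-windowed sinc filter, its length (31 taps), its cutoff (about 7 kHz) and the function names `lowpass_taps` and `downsample_by_2` are all assumptions made for this example.

```python
import math

def lowpass_taps(num_taps=31, cutoff=0.22):
    """Hamming-windowed sinc low-pass filter (assumed design, not from the source).

    cutoff is in cycles per input sample: 0.22 * 32 kHz is roughly 7 kHz,
    just below the 8 kHz Nyquist frequency of the 16 kHz output.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC

def downsample_by_2(samples, taps=None):
    """Low-pass filter the input, then keep every second sample."""
    taps = taps or lowpass_taps()
    half = len(taps) // 2
    out = []
    for i in range(0, len(samples), 2):  # compute only the samples that are kept
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + k - half  # filter centred on sample i, so no added delay
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(acc)
    return out

# Example: a 1 kHz tone sampled at 32 kHz becomes the same tone at 16 kHz.
tone_32k = [math.sin(2 * math.pi * 1000 * n / 32000) for n in range(3200)]
tone_16k = downsample_by_2(tone_32k)
```

The low-pass step matters because energy above 8 kHz in the 32 kHz recording would otherwise alias into the 16 kHz signal.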

 

Recording environment

 

The Spanish database was recorded in a silent room divided into two parts by a wall with a glass window. The speaker read the sentences directly from the PC display. To avoid extra noise, the display, PC and recording system were placed on one side of the room and the speaker on the other.

 
 

Two operators supervised the recordings at recording time. One of them checked that the utterances corresponded exactly to the text to be read. The other operator checked the recording system.

 

DATABASE CONTENTS DEFINITION

Emotional styles definition

 
 
 

To isolate the influence of emotion on speech, the same speakers were asked to utter the same speech material in different emotional styles. The basic specifications cover a neutral style and the 6 MPEG4 emotions proposed for video analysis: anger, sadness, joy, surprise, disgust and fear.

 
 

The Spanish corpus contains 184 different items. The corpus includes numbers, isolated words, sentences in affirmative, exclamatory or interrogative form, and paragraphs.

 

 

 
 

Item identifier    Corpus contents
1 to 100           Affirmative sentences, including short and longer ones
101 to 134         Interrogative and (5) stressed sentences
135 to 150         Paragraphs
151 to 160         Digits
161 to 184         Isolated words
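The item ranges listed above can be expressed as a small lookup, which may be convenient when selecting a subset of the Spanish corpus. This is only a sketch: it assumes items are addressed by their integer identifier (1 to 184), and the names `CORPUS_RANGES` and `item_category` are hypothetical. The file naming scheme of the database itself is not described here.

```python
# Item ranges copied from the corpus contents listing for the Spanish corpus.
CORPUS_RANGES = [
    (1, 100, "Affirmative sentences"),
    (101, 134, "Interrogative and stressed sentences"),
    (135, 150, "Paragraphs"),
    (151, 160, "Digits"),
    (161, 184, "Isolated words"),
]

def item_category(item_id):
    """Return the corpus category for an integer item identifier."""
    for first, last, label in CORPUS_RANGES:
        if first <= item_id <= last:
            return label
    raise ValueError(f"item identifier out of range: {item_id}")

# Number of items per category; the counts add up to the 184 corpus items.
ITEM_COUNTS = {label: last - first + 1 for first, last, label in CORPUS_RANGES}
```

For example, `item_category(140)` returns `"Paragraphs"`, and identifiers outside 1 to 184 raise a `ValueError`.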

 

Transcription

 

The contents of the speech files were supervised at recording time. If mispronunciations or other deviations from the script were detected, the recordings were redone, so no mispronunciations are expected. The contents and transcriptions of the complete database are included in the documentation.

 
 

Speaker information

 
 

The speakers were required to be professional actors (one male and one female). No further specifications (such as age or accent) were given. The speakers are from the Catalonia region.

 
 
 

Availability

 

This database is commercially available.

 

Information

 

asuncion.moreno@upc.edu
