Spanish Speech Recognition Resources

This database comprises telephone recordings from 1000 speakers, recorded directly from the fixed PSTN over two analogue lines. Signals were sampled at 8 kHz and mu-law encoded, without automatic gain control.

The SALA Spanish Venezuelan Database for Fixed Telephone Network was recorded within the scope of the SALA project, with support from the Spanish Government. The corpus was designed and collected at the Universidad de los Andes (ULA), Mérida, Venezuela. Transcription and formatting were carried out at the Technical University of Catalonia (UPC), Spain, which also owns the database.

Database contents definition

Each call in the database consists of 44 items. All items are read unless marked as spontaneous. The following table shows the contents and corpus codes of the SALA Spanish Venezuelan Database for Fixed Telephone Network.

[Table: contents and corpus codes of the database]

Speaker demographic information

Venezuelan Spanish can be roughly divided into lowland varieties, spoken in the coastal areas, and a highland variety, spoken in the Andes mountains. The lowland varieties can be further divided into Central, Zuliana, Llanos and Sud-oriental.

[Map: dialectal regions of Venezuela]

The following table shows, for each dialectal region, the states it comprises, its population in millions, and the number of calls received from it.

Region | States | Population (millions) | Recordings
central | Distrito Federal, Miranda, Carabobo, Aragua, Lara, Yaracuy, Falcón | 7.2 | 203
zuliana | Zulia | 1.7 | 211
llanos | Portuguesa, Guárico, Cojedes, Apure, Barinas | 3.0 | 180
sud_oriental | Sucre, Nueva Esparta, Monagas, Anzoátegui, Delta Amacuro, Bolívar, Amazonas | 2.6 | 196
Andes | Táchira, Mérida, Trujillo | 1.6 | 210

Recording site and platform

All 1000 speakers were recorded directly from the fixed PSTN over two analogue lines. Signals were sampled at 8 kHz and mu-law encoded, without automatic gain control.
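Mu-law is the companding scheme of ITU-T Recommendation G.711: each 8-bit code holds a sign bit, a 3-bit segment number and a 4-bit step within that segment, and codes are stored bit-inverted. The following is a minimal Python sketch of the standard G.711 mu-law expansion to linear samples, for illustration only; it is not part of the database distribution. It assumes the signal files can be read as plain byte streams of mu-law codes (any file header would have to be skipped first; check the database documentation). Older Python versions also offered `audioop.ulaw2lin`, but the `audioop` module was removed in Python 3.13.

```python
def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 mu-law code to a linear sample (16-bit scale)."""
    bias = 0x84                     # 132, added to the magnitude before compression
    code = ~code & 0xFF             # mu-law codes are stored bit-inverted
    mantissa = code & 0x0F          # 4-bit step within the segment
    exponent = (code >> 4) & 0x07   # 3-bit segment number
    magnitude = ((mantissa << 3) + bias) << exponent
    # After inversion, a set top bit marks a negative sample.
    return bias - magnitude if code & 0x80 else magnitude - bias


def decode_ulaw(data: bytes) -> list[int]:
    """Decode a byte string of mu-law codes into linear samples."""
    return [ulaw_to_linear(b) for b in data]
```

For example, `decode_ulaw(bytes([0xFF, 0x7F, 0x00, 0x80]))` gives `[0, 0, -32124, 32124]`: 0xFF and 0x7F are the positive and negative zero codes, and 0x80 and 0x00 are the largest positive and negative values.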


Recordings took place at the Universidad de los Andes, Mérida, Venezuela. The main characteristics of the recording platform are:

Interface: Analogue
Board: Dialogic Proline 2V
Computer: Pentium III PC at 450 MHz, 128 MB RAM, 13 GB IDE hard disk
Operating system: Windows 98
Programming interface: Dialogic System Software
Software: Application software written in C (UPC ADA-D program)
Lines: 2

Transcription

The transcriptions in this database are orthographic (lexical), with a few additional marks representing audible acoustic events (speech and non-speech) present in the corresponding waveform files. These marks help in interpreting the text form of each utterance. Transcription was done in two passes: in the first, the words were transcribed; in the second, the additional marks were added.

Non-speech acoustic events have been transcribed in four categories. The first two categories originate from the speaker; the other two originate from other sources. The four categories are:

[fil]: Filled pause. These sounds can be modeled well by a filled-pause model in speech recognisers. Examples: uh, um, er, ah, mm.

[spk]: Speaker noise. All kinds of sounds and noises made by the calling speaker that are not part of the prompted text, e.g. lip smack, cough, grunt, throat clear, tongue click, loud breath, laugh, loud sigh.

[sta]: Stationary noise. Background noise that is not intermittent and has a more or less stable amplitude spectrum. Examples: car noise, road noise, channel noise, GSM noise, voice babble (cocktail-party noise), public place background noise, street noise.

[int]: Intermittent noise. Noises of an intermittent nature: they typically occur only once (like a door slam), have pauses between them (like a phone ringing), or change their color over time (like music). Examples: music, background speech, baby crying, phone ringing, door slam, door bell, paper rustle, cross talk.
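Because the event marks are written inline in square brackets, they can be separated from the words with a simple pattern match. The Python sketch below assumes that exactly the four tags listed above appear as bracketed tokens in a transcription string; the function names and the example utterance are hypothetical, and any other markup the transcriptions may contain is not handled.

```python
import re
from collections import Counter

# The four non-speech event categories; [fil] and [spk] originate from the speaker.
EVENT_TAGS = {"fil": "filled pause", "spk": "speaker noise",
              "sta": "stationary noise", "int": "intermittent noise"}
SPEAKER_TAGS = {"fil", "spk"}

# Matches "[fil]", "[spk]", "[sta]" or "[int]" and captures the tag name.
TAG_RE = re.compile(r"\[(" + "|".join(EVENT_TAGS) + r")\]")


def split_events(transcription: str) -> tuple[list[str], Counter]:
    """Return the words of a transcription and a count of each event tag."""
    counts = Counter(TAG_RE.findall(transcription))
    words = TAG_RE.sub(" ", transcription).split()
    return words, counts


def events_by_source(counts: Counter) -> tuple[int, int]:
    """Split event counts into (speaker-originated, other-source) totals."""
    speaker = sum(n for tag, n in counts.items() if tag in SPEAKER_TAGS)
    return speaker, sum(counts.values()) - speaker


words, counts = split_events("[sta] quiero [fil] hablar con el banco [spk]")
print(words)                     # ['quiero', 'hablar', 'con', 'el', 'banco']
print(events_by_source(counts))  # (2, 1)
```

Keeping the tag counts separate from the word sequence makes it easy to, for example, train filled-pause and noise models from the same transcriptions used for the word models.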


The Lexicon

The lexicon is an alphabetically ordered list of the distinct lexical items (in our case, essentially words) that occur in the corpus, together with their pronunciation information. Each distinct word has a separate entry. Because the lexicon is derived from the corpus, it uses the same encoding for special and accented characters as the transcriptions: ISO 8859-1 (Latin-1).

Each lexicon entry includes a frequency count, which helps, for example, to identify rare words whose transcriptions may be less important or less reliable.

The pronunciation lexicon was produced after the transcription phase. It contains, in alphabetical order, every word found in the transcriptions (one entry per word), its number of occurrences, and the list of its phonemic representations. Words appear in the lexicon exactly as they appear in the transcriptions. The lexicon is case-insensitive.

All component words were identified and sorted alphabetically; fragments, mispronunciations and non-speech events were removed, and only one occurrence of each word was kept.
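The word list and frequency counts described above can be sketched in Python as follows. This is an illustrative reconstruction, not the procedure UPC used. It assumes whitespace-separated words, removes only the four bracketed event tags (the markers for fragments and mispronunciations are not described here, so they are not handled), treats words case-insensitively while keeping the first spelling seen, and sorts by plain code-point order rather than Spanish collation. `read_transcriptions` and its file layout (one utterance per line) are hypothetical.

```python
import re
from collections import Counter

# Non-speech event tags, which are excluded from the lexicon.
EVENT_RE = re.compile(r"\[(?:fil|spk|sta|int)\]")


def read_transcriptions(path: str) -> list[str]:
    """Read one transcription per line from an ISO 8859-1 (Latin-1) file."""
    with open(path, encoding="latin-1") as f:
        return [line.strip() for line in f if line.strip()]


def build_word_counts(transcriptions: list[str]) -> list[tuple[str, int]]:
    """Return (word, occurrences) pairs, one per distinct word, sorted."""
    counts: Counter = Counter()
    spelling: dict[str, str] = {}
    for line in transcriptions:
        for word in EVENT_RE.sub(" ", line).split():
            key = word.casefold()           # case-insensitive matching
            counts[key] += 1
            spelling.setdefault(key, word)  # keep the first spelling seen
    # Code-point order: accented initials (e.g. "á") sort after "z".
    return [(spelling[key], counts[key]) for key in sorted(counts)]


entries = build_word_counts(["Quiero hablar [fil] con el banco",
                             "el Banco [spk] cierra"])
rare = [word for word, n in entries if n == 1]  # candidates for extra checking
```

With these two example lines, `entries` is `[('banco', 2), ('cierra', 1), ('con', 1), ('el', 2), ('hablar', 1), ('Quiero', 1)]`, and the words seen only once end up in `rare`.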


Note that company and agency names very often do not correspond to Spanish words. These names are presented to the speaker as written by the company (capital letters without accents, foreign words, acronyms, etc.). For names that are usually pronounced in a non-standard way, we have also included the common transcription. For example, for Philips the standard transcription is pilips and the common transcription is filis.
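A recogniser using this lexicon therefore has to accept more than one pronunciation per word. Below is a minimal Python sketch of a case-insensitive lookup that returns all variants of a word, assuming the standard transcription is listed first. The class names are hypothetical, and the pronunciation strings are written as in the Philips example above; the phoneme symbols actually used in the lexicon file may differ.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    word: str
    # Standard transcription first, then any common variants (assumed order).
    pronunciations: list[str] = field(default_factory=list)


class Lexicon:
    def __init__(self, entries: list[LexiconEntry]) -> None:
        # Index by case-folded spelling, since the lexicon is case-insensitive.
        self._by_key = {e.word.casefold(): e for e in entries}

    def pronunciations(self, word: str) -> list[str]:
        """Return all pronunciation variants of a word, or [] if it is unknown."""
        entry = self._by_key.get(word.casefold())
        return list(entry.pronunciations) if entry else []


lexicon = Lexicon([LexiconEntry("Philips", ["pilips", "filis"])])
print(lexicon.pronunciations("PHILIPS"))  # ['pilips', 'filis']
```

Returning every variant, rather than only the standard one, lets the recogniser match speakers who say either pilips or filis.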


The phonemic transcriptions in the lexicon were generated automatically; proper names and company names were checked manually.

Availability

This database is commercially available.