Speaker Recognition Resources

Polycost

The POLYCOST speech database was recorded from January to March 1996 as a common initiative entitled "Speaker Recognition in Telephony" within the COST 250 action (http://www.fub.it/cost250). Its main purpose is to compare and validate speaker recognition algorithms. The data were collected over international telephone lines, with more than five sessions per speaker, and the English material is spoken by non-native speakers.

The database contains 1285 calls (around 10 sessions per speaker) recorded by 133 subjects (74 male and 59 female speakers) from 13 countries. Each partner provided approximately 10 speakers per country.
Each session comprises 15 prompts: one prompt for DTMF detection, 10 prompts with connected digits uttered in English, 2 prompts with sentences uttered in English, and 2 prompts in the speaker's mother tongue, one of which consists of free speech.


English:

  • 4 prompts distributed throughout the session in which the speaker pronounces his or her 7-digit client code;
  • 5 prompts distributed throughout the session in which the speaker pronounces a sequence of 10 digits (the same from session to session and from speaker to speaker);
  • 2 prompts in which the speaker pronounces the sentences "Joe took father's green shoe bench out" and "He eats several light tacos", fixed password phrases common to all speakers;
  • 1 prompt in which the speaker gives his or her international phone number.

Mother tongue:

  • 1 prompt in which the speaker gives his or her first name, family name, gender (female/male), town and country;
  • 1 prompt with free speech.
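
The per-session counts above can be cross-checked with a small tally. This sketch assumes that the international phone number is one of the 10 connected-digit prompts, which is the reading under which the English list (4 + 5 + 1 digit prompts plus 2 sentences) matches the 15-prompt total. The category labels are hypothetical, not official POLYCOST prompt identifiers.

```python
# Prompt layout of one POLYCOST session, as described in the text.
# The (category, language) keys are hypothetical labels for this sketch.
SESSION_PROMPTS = {
    ("dtmf_pin", "keypad"): 1,
    ("client_code_7_digits", "english"): 4,
    ("fixed_10_digits", "english"): 5,
    ("phone_number", "english"): 1,
    ("password_sentence", "english"): 2,
    ("personal_details", "mother_tongue"): 1,
    ("free_speech", "mother_tongue"): 1,
}

# Assumption: the phone-number prompt counts as a connected-digit prompt.
DIGIT_CATEGORIES = {"client_code_7_digits", "fixed_10_digits", "phone_number"}


def summarize(prompts):
    """Group per-session prompt counts into the totals quoted in the text."""
    summary = {"dtmf": 0, "english_digits": 0, "english_sentences": 0,
               "mother_tongue": 0}
    for (category, language), count in prompts.items():
        if category == "dtmf_pin":
            summary["dtmf"] += count
        elif category in DIGIT_CATEGORIES:
            summary["english_digits"] += count
        elif language == "english":
            summary["english_sentences"] += count
        else:
            summary["mother_tongue"] += count
    summary["total"] = sum(prompts.values())
    return summary


print(summarize(SESSION_PROMPTS))
# prints {'dtmf': 1, 'english_digits': 10, 'english_sentences': 2, 'mother_tongue': 2, 'total': 15}
```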

The database was collected through the European telephone network and recorded with an ISDN card on an XTL SUN platform at an 8 kHz sampling rate. Most calls were classified automatically by DTMF detection; calls with no DTMF or an incorrect DTMF PIN code (about 10% of the database) were classified manually.
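
The text does not say how the DTMF PIN was detected. One common way to detect DTMF digits is the Goertzel algorithm, which measures signal power at the eight standard DTMF frequencies; the sketch below illustrates that approach on linear PCM samples at 8 kHz. It is not the POLYCOST procedure: the function names, the 205-sample block length (about 25.6 ms, a commonly used choice) and the `min_fraction` threshold are assumptions for illustration.

```python
import math

# Standard DTMF frequencies (Hz): low "row" group and high "column" group.
ROW_FREQS = (697, 770, 852, 941)
COL_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, fs, freq):
    """Squared DFT magnitude of `samples` at `freq` Hz (Goertzel recurrence)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def detect_dtmf(samples, fs=8000, min_fraction=0.1):
    """Return the DTMF key present in `samples`, or None if none is clear."""
    n = len(samples)
    energy = sum(x * x for x in samples)
    if n == 0 or energy == 0:
        return None
    row_powers = [goertzel_power(samples, fs, f) for f in ROW_FREQS]
    col_powers = [goertzel_power(samples, fs, f) for f in COL_FREQS]
    row = max(range(len(ROW_FREQS)), key=lambda i: row_powers[i])
    col = max(range(len(COL_FREQS)), key=lambda i: col_powers[i])
    # For an equal-amplitude tone pair, each matching frequency carries about
    # n * energy / 4; broadband noise gives only about energy per frequency.
    threshold = min_fraction * n * energy
    if row_powers[row] < threshold or col_powers[col] < threshold:
        return None
    return KEYPAD[row][col]


def dtmf_tone(key, fs=8000, n=205, amp=0.5):
    """Synthesize n samples of the tone pair for `key` (for testing)."""
    for r, keys in enumerate(KEYPAD):
        if key in keys:
            c = keys.index(key)
            break
    else:
        raise ValueError(f"not a DTMF key: {key!r}")
    return [amp * math.sin(2 * math.pi * ROW_FREQS[r] * t / fs)
            + amp * math.sin(2 * math.pi * COL_FREQS[c] * t / fs)
            for t in range(n)]


print(detect_dtmf(dtmf_tone("7")))  # prints 7
```

A PIN could then be read by running `detect_dtmf` over consecutive blocks of a decoded call and collapsing repeated detections of the same key; a production detector would also check twist (the level difference between the two tones) and harmonic energy.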
The English prompts are segmented and labelled at the word level (orthographic transcription and word stretches). The mother-tongue prompts are only labelled (an orthographic transcription will be given). The annotation follows the conventions defined within the SpeechDat project.

Character set: ISO-8859-1
Medium: CD-ROMs. The first CD contains speech data from speakers M001-M069, and the second CD contains data from speakers F001-F060 plus M070-M074.
Total size CD1: 636 MB
Total size CD2: 610 MB
File format: A-law, 8 kHz sampling rate, 8 bits/sample, no file header
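
Because the files have no header, a tool reading them has to be told the encoding, sample rate and sample width explicitly. The sketch below decodes such a file to 16-bit linear PCM and writes a standard WAV file using only the Python standard library. It assumes the bytes follow standard ITU-T G.711 A-law coding (including the usual inversion of even bits); the text does not confirm the bit ordering, and the function names and file paths are hypothetical.

```python
import struct
import wave

SAMPLE_RATE = 8000  # Hz, as stated for the POLYCOST files


def alaw_to_linear(code):
    """Decode one G.711 A-law byte to a signed 16-bit linear PCM value."""
    code ^= 0x55                     # undo the even-bit inversion of A-law
    mantissa = (code & 0x0F) << 4
    segment = (code & 0x70) >> 4
    if segment == 0:
        value = mantissa + 8
    elif segment == 1:
        value = mantissa + 0x108
    else:
        value = (mantissa + 0x108) << (segment - 1)
    return value if code & 0x80 else -value  # bit 7 set means positive


# Precompute all 256 decoded values once.
ALAW_TABLE = [alaw_to_linear(b) for b in range(256)]


def decode_alaw(data):
    """Decode a bytes object of A-law samples to a list of ints."""
    return [ALAW_TABLE[b] for b in data]


def duration_seconds(num_bytes, fs=SAMPLE_RATE):
    """One byte per sample and no header, so size maps directly to length."""
    return num_bytes / fs


def alaw_file_to_wav(src_path, dst_path, fs=SAMPLE_RATE):
    """Convert a headerless A-law file to a mono 16-bit PCM WAV file."""
    with open(src_path, "rb") as f:
        samples = decode_alaw(f.read())
    with wave.open(dst_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)            # 16-bit output samples
        w.setframerate(fs)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))


print(decode_alaw(bytes([0xD5, 0x55, 0xAA])))  # prints [8, -8, 32256]
```

For example, `alaw_file_to_wav("call.raw", "call.wav")` (hypothetical paths) produces a file that most audio tools can open. The decoder is written out by hand because the standard library's `audioop.alaw2lin`, which did the same job, was deprecated in Python 3.11 and removed in 3.13. Since each sample is one byte, file size converts directly to duration: at 8,000 bytes per second, the 636 MB on CD1 corresponds to at most about 22 hours of audio (taking 1 MB as 10^6 bytes).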

Additional information