Spanish Speech Recognition Resources

The Albayzin Spanish corpus consists of 3 sub-corpora of 16 kHz, 16-bit signals, recorded by 304 Castilian speakers.


The 3 sub-corpora are:

  • a phonetic corpus with 6,800 utterances of phonetically balanced sentences, including 1,000 with phonetic segmentation;
  • a geographic corpus with 6,800 utterances of sentences extracted from a Spanish geographic database; and
  • a "Lombard" corpus with 2,000 utterances from various corpora.

The corpus was produced in 1998, after being designed in 1991-93 by a consortium of 6 Spanish research groups led by the UPC's Speech Processing group, which currently belongs to the TALP Center. It is distributed by ELRA.
