
WRPA - Paraphrasing Corpora

Marta Vila, Maria Antonia Martí and Horacio Rodríguez


WRPA stands for “Relational Paraphrase Acquisition from Wikipedia”. The WRPA corpus contains relational paraphrases extracted from Wikipedia by the WRPA system [1]. WRPA contains several sub-corpora:

in English:

WRPA-person is composed of 362 paraphrases expressing the person-date_of_birth relation, 449 paraphrases expressing the person-date_of_death relation and 965 paraphrases expressing the person-place_of_birth relation.

WRPA-person-2 is composed of 55 paraphrases expressing the person-alternate_name relation, 40 paraphrases for person-charge, 54 for person-child, 238 for person-residence, 233 for person-employee_of, 375 for person-member_of, 555 for person-origin, 40 for person-parent, 62 for person-religion, 94 for person-school_attended, 413 for person-spouse and 532 for person-title.
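
The per-relation figures listed above can be tallied to get the overall size of each English sub-corpus. The following is a minimal sketch in Python. The dictionary layout and the names ENGLISH_SUBCORPORA and subcorpus_totals are illustrative assumptions and are not part of the WRPA distribution. Only the counts themselves come from the description above.

```python
# Paraphrase counts per relation, copied from the sub-corpus descriptions.
# The data layout and all names here are hypothetical, for illustration only.
ENGLISH_SUBCORPORA = {
    "WRPA-person": {
        "person-date_of_birth": 362,
        "person-date_of_death": 449,
        "person-place_of_birth": 965,
    },
    "WRPA-person-2": {
        "person-alternate_name": 55,
        "person-charge": 40,
        "person-child": 54,
        "person-residence": 238,
        "person-employee_of": 233,
        "person-member_of": 375,
        "person-origin": 555,
        "person-parent": 40,
        "person-religion": 62,
        "person-school_attended": 94,
        "person-spouse": 413,
        "person-title": 532,
    },
}


def subcorpus_totals(subcorpora):
    """Return the total number of paraphrases in each sub-corpus."""
    return {name: sum(counts.values()) for name, counts in subcorpora.items()}


if __name__ == "__main__":
    for name, total in subcorpus_totals(ENGLISH_SUBCORPORA).items():
        print(f"{name}: {total} paraphrases")
```

With the figures as listed, this gives 1,776 paraphrases for WRPA-person and 2,691 for WRPA-person-2.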

in Spanish:

WRPA-authorship is composed of 81,101 pairs of paraphrases expressing the authorship relation.

WRPA-authorship-A is composed of 1,000 paraphrase pairs from WRPA-authorship manually annotated with the paraphrase phenomena they contain.

For further reading on the corpus, refer to the README.txt file in the corresponding download package and [1].






Technical Requirements:








[1] M. Vila, M. A. Martí and H. Rodríguez. Relational paraphrase acquisition from Wikipedia: The WRPA method and corpus (submitted).

This research work is carried out in the framework of the following projects and grants:

  • TEXT-Knowledge 2.0. TIN2009-13391-C04-04
  • KNOW2. TIN2009-14715-C04-04
  • FPU AP2008-02185
