Information retrieval (IR), covering both text and multimedia resources, plays an important part in collection indexing and in document or passage retrieval (whether from previously indexed collections or through Internet wrappers).

The TALP Center is actively involved in question answering (Q&A) tasks. As a result of this work, the group has a multilingual question answering system, which it entered in the 2003 and 2004 TREC competitions (in the open-domain category for English) and in the 2003 and 2004 CLEF competitions (also in the open-domain category, but for Spanish). It has also designed a restricted-domain geography demonstrator in Spanish for the ALIADO project, and it took part in the first GEOCLEF competition in 2005 for the same domain in English. In addition, it is now working to extend its current Q&A system to handle spoken questions about facts, lists, definitions, information and biographies, and to extend the system's multilingual capabilities to Catalan.

Automatic summarization is also tackled at various levels: monolingual, multilingual and cross-lingual summaries; single- and multi-document summaries; text and speech summaries; extractive and abstractive summaries; general summaries; and guided summaries based on the questions, profiles or interests of users.

Document analysis involves recognizing and extracting written text and pre-processing it (word and sentence segmentation, morpho-syntactic analysis and disambiguation, the detection and classification of noun phrases, shallow and deep syntactic analysis, semantic analysis, coreference resolution, etc.).

The tasks that make up this line of research are:

  • Classification of documents and passages
  • Clustering of documents
  • Detection of subject matter in documents and collections
  • Detection of links to and in documents
  • Measurement of distances (semantic or distributional) between language units, etc.
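
As an illustration of the last task in the list, a distributional distance between two text units can be sketched as the cosine distance between their term-frequency vectors. This is only a minimal sketch under simplifying assumptions (whitespace tokenization, raw counts, no stemming or weighting); the function names `term_counts` and `cosine_distance` are hypothetical and do not describe the group's own systems.

```python
import math
from collections import Counter


def term_counts(text):
    """Lowercase the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())


def cosine_distance(text_a, text_b):
    """Return 1 - cosine similarity of the two term-frequency vectors.

    0.0 means identical term distributions; 1.0 means no shared terms.
    """
    counts_a = term_counts(text_a)
    counts_b = term_counts(text_b)
    # Dot product over the terms that occur in both texts.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an empty text is treated as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_distance("question answering in Spanish",
                          "question answering in English"))
    print(cosine_distance("document clustering", "speech summaries"))
```

The same pairwise distance could in principle feed the clustering and classification tasks listed above, although a realistic system would normally use weighted or semantic representations rather than raw counts.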