Our initial hypothesis is that robust and advanced NLP technology can extract what happened to whom, when and where, removing duplication, complementing information, registering inconsistencies and keeping track of the original sources. Any new information can be integrated with the past, distinguishing the new from the old and unfolding story lines in much the same way as people remember the past and access knowledge and information. The difference is that this project can provide access to all original sources and will not forget any details. We will develop tools that allow professional decision-makers to explore these story lines through visual interfaces and interactions, exploiting their explanatory power and their systematic structural implications. Likewise, SKATeR can use the past to make predictions about future events or to explain new events and developments.
The combined expertise brought together in SKATeR will make it possible to tackle the shortcomings of current machine reading systems. The technology developed will be made available to interested companies, thus strengthening the software industry in this area.
Project KNOW2 (TIN2009-14715-C04-01) already enhanced Knowledge Mining technology through the close collaboration of the EHU, UB-UOC and UPC research groups. As a result, KNOW2 provided new forms of multilingual information access (MLIA) combining the latest advances in text mining, knowledge acquisition, natural language processing and semantic interpretation. KNOW2 also developed an integrated environment for acquiring knowledge from specific domains. In addition, SKATeR brings in new expertise from UPV-IULA (terminology and domain knowledge), UPF-TALN (summarization) and UV (now also covering the Galician language).
The research groups of SKATeR are all well-known international players in Language Technology, especially in broad-coverage Natural Language Processing. We have a long and successful record of collaborating on and jointly developing large-scale language resources. For instance, our research groups have been jointly involved in the construction and enrichment of wordnets for Spanish, Basque and Catalan within several national (ITEM, HERMES, SENSEM, VOLEM2, KNOW, TEXT-MESS, KNOW2) and European research projects (ACQUILEX, ACQUILEX-II, EuroWordNet, MEANING, KYOTO, PANACEA, PATHS, X-LIKE). The six groups involved are strongly motivated to develop technologies that enhance the usability of wordnets and to use them in real applications.
All our research groups work on Parsing, Word Sense Disambiguation and Semantic Role Labeling, as demonstrated by our active participation in, and organization of, SEMEVAL and CoNLL (we organized the last two editions of the CoNLL shared task on Semantic Role Labeling and also the Spanish, Catalan and Basque tasks of the last three editions of SENSEVAL). Regarding advanced NLP applications, we have organized the last editions of the CLEF Robust-WSD, QAst and QA competitions, in which we also participated successfully. We have also participated in recent editions of TREC (Text Retrieval Conference), TAC (Text Analysis Conference), WAC (Web as Corpus) and CLEANEVAL (Cleaning web corpora).
Although their interests overlap broadly, the partners have addressed these problems from different but complementary points of view, making the SKATeR collaboration a unique opportunity to achieve significant scientific progress.
SKATeR will develop machine reading systems that bring deep semantic capabilities to large quantities of multilingual data.