Title
Using Frames to convert texts to ontologies. Can solutions be combined?

Speaker
Alma Delia Cuevas (CIC-IPN and UAEM) México

Room
Omega-S208 Campus Nord - UPC

Date
Wed Nov 27, 2013

Time
11:00h

Abstract

Automatically extracting knowledge from natural language documents is an attractive goal, since it yields information directly, without the need for human interpretation, which often consumes large amounts of time. But for a computer to “understand” a document is a non-trivial task: natural language is ambiguous and full of synonyms, idioms, anaphora, word declensions and analogies, which people resolve not only through context but also through prior knowledge, real-world experience and common sense. None of these are strengths of a computer.

Automatically obtaining knowledge from any kind of text (prose, poetry, news, event descriptions, textbooks, cooking recipes, descriptive documents, etc.) and transforming it into a representation that a computer can understand and process is still far from reality. Nevertheless, progress is being made with natural language processing (NLP), information retrieval and knowledge acquisition tools.

Given the difficulty of interpreting all types of text, SERCDD (System for Extracting and Representing Knowledge from Descriptive Documents) restricts its scope to descriptive texts: documents describing tools, plants, geographic places, etc.

This talk presents an analysis method that starts from text that has already undergone semantic analysis, specifically tagging and lemmatization. Using the structure of the sentences found in the text, the analysis then tries to identify the relations present in it. For instance, a text describing a carpentry tool will usually contain its definition, a description, its common uses, the parts and materials that form it, and the classification of the tool. The result is a formal representation of the extracted knowledge, embodied in an ontology written in the OM Language. To produce it, the entities (concepts), relations and properties described in the original text must be identified.
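
As a rough illustration of the slot-filling idea, the Python sketch below fills a frame for each concept found in already lemmatized sentences. The patterns, the slot names (`is_a`, `has_part`, `made_of`, `used_for`) and the function `extract_frames` are assumptions made for this example only. They are not the rules used by SERCDD, and the result is a plain dictionary, not an ontology in the OM Language.

```python
import re
from collections import defaultdict

# Hypothetical slot patterns over lemmatized sentences (illustrative only,
# not SERCDD's rules). Each pattern captures (concept, filler).
# Order matters: the generic "is_a" pattern must come last.
PATTERNS = [
    ("made_of", re.compile(r"^(?:the |a )?(\w+) be make of (.+)$")),
    ("used_for", re.compile(r"^(?:the |a )?(\w+) be use (?:to|for) (.+)$")),
    ("has_part", re.compile(r"^(?:the |a )?(\w+) have (.+)$")),
    ("is_a", re.compile(r"^(?:the |a )?(\w+) be (?:a |an )?(.+)$")),
]


def extract_frames(lemmatized_sentences):
    """Return {concept: {slot: [fillers]}} for the sentences that match."""
    frames = defaultdict(lambda: defaultdict(list))
    for sentence in lemmatized_sentences:
        for slot, pattern in PATTERNS:
            match = pattern.match(sentence)
            if not match:
                continue
            concept, filler = match.groups()
            # Split simple coordinations such as "a handle and a head".
            for part in re.split(r",\s*| and ", filler):
                part = re.sub(r"^(?:a|an|the) ", "", part.strip())
                if part:
                    frames[concept][slot].append(part)
            break  # the first matching pattern wins
    return frames


# Toy input: sentences assumed to be lowercased and lemmatized already.
sentences = [
    "a hammer be a hand tool",
    "the hammer have a handle and a head",
    "the hammer be make of steel and wood",
    "the hammer be use to drive nail",
]

for concept, slots in extract_frames(sentences).items():
    print(concept, dict(slots))
```

Each frame slot corresponds to one kind of relation named in the abstract (classification, parts, materials, uses). A real system would rely on part-of-speech tags and sentence structure, not on surface patterns like these.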
