Probabilistic Inference for Weakly-Supervised Entity-Relation
Learning Word Embeddings for Language Modelling
Audi Primadhanty and Pranava Swaroop Madhyastha
Omega-S208 Campus Nord - UPC
Fri Feb 07, 2014
12:00h - 14:00h
Abstract for Probabilistic Inference for Weakly-Supervised Entity-Relation
We investigate the task of extracting entities and relations from text documents given only a few examples of the desired entities and relations. The task is relevant for information extraction in new, open domains where annotated corpora are scarce or expensive to obtain. We begin with the task of named entity classification by proposing a probabilistic generative model that uses hidden states. The purpose of the hidden states is to capture commonalities among the contexts in which entities of different types appear. Our hope is that this model will be more robust when recognizing unseen entities. Our aim is to further extend such techniques to extract specific target entities and relations, in any domain, from a large unlabeled corpus, requiring only a few examples for each entity and relation type.
Abstract for Learning Word Embeddings for Language Modelling
In Natural Language Processing, state-of-the-art systems for tasks such as parsing, semantic role labeling, and word-sense disambiguation make use of lexical features. Most of these systems are trained on annotated corpora, which are used to gather statistics about each lexical item and its linguistic relations. However, even in large annotated corpora, one is unlikely to observe each lexical item in the context of all its possible relations. In this setting, one would like to exploit a notion of word similarity and assume that similar words behave similarly. The focus of this thesis proposal is to formulate statistical models that improve performance on linguistic prediction tasks by making use of distributional word space representations. In particular, we are interested in designing computationally efficient and robust learning algorithms for lexical embeddings that combine supervised training with unsupervised training on a large text corpus to induce a distributional representation. We present preliminary experiments that assess the usefulness of the proposed approach and serve as a proof of concept.