Document Level Statistical Machine Translation
Omega-S208 Campus Nord - UPC
Mon Feb 10, 2014
Machine Translation (MT) is one of the earliest problems in Natural Language Processing and Artificial Intelligence, and it has gained a lot of attention from industry and the research community in the last decade. There are many kinds of MT systems and services, which differ in their usage, linguistic analysis, or architecture, and some of them are used every day by millions of users for a variety of purposes. However, most current MT systems are designed in a sentence-level fashion: they translate a document assuming independence among sentences, totally ignoring discourse information. This simplified view has an impact on the quality of the resulting translations, which sometimes show poor cohesion and coherence at the document level. Following the path of some recent works (Tiedemann, 2010; Nagard-Koehn, 2010; Hardmeier et al., 2010; Xiao et al., 2011; Hardmeier et al., 2012), in this research we aim to study the translation problem at the document level, taking cohesion and coherence into account in order to improve statistical MT quality. Several phenomena will be studied, with special attention to lexical semantic and topic cohesion, coreference, agreement, and discourse structure, among others. In a complementary direction, we will study how to take document-level aspects of quality into account in current automatic MT evaluation measures.