Tutorial - Data-driven text simplification
Date: March 13th 15:30 - 17:30
Location: Poblenou Campus, UPF (Roc Boronat 138, Barcelona). Room 55.309
Tutorial presenters: Sanja Štajner, Horacio Saggion
Registration: free
Material:
Updated material will be available soon.
This tutorial is an updated version of the tutorial presented at COLING 2018.
References:
Sanja Štajner, Horacio Saggion. Data-Driven Text Simplification. Proceedings of the 27th International Conference on Computational Linguistics.
Abstract
In this tutorial, we aim to provide an extensive overview of the automatic text simplification (ATS) systems proposed so far and the methods they use, and to discuss the strengths and shortcomings of each, providing a direct comparison of their outputs. We aim to dispel some common misconceptions about what text simplification (TS) is and is not, and about how much it has in common with text summarisation and machine translation. We believe that a deeper understanding of the initial motivations and an in-depth analysis of existing TS methods will help researchers new to ATS propose even better systems, bringing fresh ideas from related NLP areas. We will describe and explain the most influential methods used so far for the automatic simplification of texts, with emphasis on the strengths and weaknesses observed in a direct comparison of system outputs. We will present the existing resources for TS in various languages, including manually produced parallel TS corpora, automatically aligned comparable TS corpora, paraphrase and synonym resources, TS-specific sentence-alignment tools, and several TS evaluation resources. Finally, we will discuss the existing evaluation methodologies for TS and the conditions necessary for using each of them.
Topics covered in the tutorial:
Motivation for ATS:
Problems for various NLP tools and applications
Reading difficulties of various target populations
TS projects:
Short description of TS projects (PSET, Simplext, PorSimples, FIRST, SIMPATICO)
Discussion of the TS projects (what they share and how they differ)
TS resources:
Resources for lexical simplification
Resources for lexico-syntactic simplification
Resources for languages other than English
Evaluation of TS systems:
Automatic evaluation
Human evaluation
Comparison of non-neural TS approaches:
Rule-based systems
Data-driven systems (supervised and unsupervised)
Hybrid systems
Semantically-motivated ATS systems
Neural text simplification (NTS) systems:
State-of-the-art NTS systems
Direct comparison of NTS systems
Strengths and weaknesses of NTS systems
NTS systems vs. previously proposed (non-neural) ATS systems (direct comparison)
Current challenges in ATS