TweetMT 2015 -- Tweet Translation Workshop at SEPLN 2015

 

TweetMT is a workshop and shared task on machine translation (MT) applied to tweets. It will take place in September 2015 in Alicante, co-located with SEPLN 2015 (to be confirmed). The aim of the task is to bring together interested researchers to experiment with and compare different approaches to tweet MT. This workshop follows up on two earlier workshops also held at SEPLN: TweetNorm2013 and TweetLID2014.

 

The machine translation of tweets is a complex task that depends heavily on the type of data involved. Translating tweets is very different from translating well-formed text, such as text published through a content manager. Tweets are often written on mobile devices, which worsens spelling quality, and they contain errors, symbols and diacritics. Their structure also varies, and they include tweet-specific features such as hashtags, user mentions and retweets, among others. Tweet translation can be tackled as direct translation (tweet to tweet) or as indirect translation: normalization of the tweet to standard text (Kaufmann & Kalita, 2011), translation of that text and, if needed, generation of a tweet. Although the first approach looks attractive, the lack of parallel or comparable tweets for the working languages (Petrovic et al., 2010) tends to favor the indirect approach. Some authors also try to retrieve similar tweets in other languages using cross-lingual information retrieval (CLIR).
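As a rough illustration of the indirect route described above, the following Python sketch protects tweet-specific tokens (hashtags, user mentions, URLs) with placeholders, runs normalization and translation on the remaining text, and then restores the tokens. It is only a sketch under stated assumptions: the names `mask`, `unmask` and `translate_indirect` are hypothetical, the `normalize` and `translate` arguments stand in for whatever normalizer and MT system a participant uses, and nothing here reflects the method of any cited work.

```python
import re

# Tweet-specific tokens that should pass through translation untouched.
# Assumption: URLs, @mentions and #hashtags are the tokens worth protecting.
PROTECTED = re.compile(r"https?://\S+|[@#]\w+")


def mask(tweet):
    """Replace hashtags, mentions and URLs with numbered placeholders."""
    kept = []

    def repl(match):
        kept.append(match.group(0))
        return f"__TOK{len(kept) - 1}__"

    return PROTECTED.sub(repl, tweet), kept


def unmask(text, kept):
    """Put the original tokens back in place of the placeholders."""
    for i, token in enumerate(kept):
        text = text.replace(f"__TOK{i}__", token)
    return text


def translate_indirect(tweet, normalize, translate):
    """Indirect route: mask, normalize, translate, then restore tokens.

    `normalize` and `translate` are caller-supplied functions (str -> str).
    """
    masked, kept = mask(tweet)
    return unmask(translate(normalize(masked)), kept)
```

For example, with toy stand-ins for the two stages, `translate_indirect("hola @ana #foto", lambda s: s, lambda s: s.replace("hola", "hello"))` leaves the mention and hashtag intact while the rest of the text is translated. A real system would also need the MT engine to leave the placeholders untouched, which this sketch simply assumes.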

 

Work in this area is scarce in the literature but a growing interest is evident (Gotti et al., 2013). An important point of reference is the work done to translate SMS texts during the Haiti earthquake (Munro, 2010).

 

The current task will focus on MT of tweets between languages of the Iberian Peninsula (Basque, Catalan, Galician, Portuguese and Spanish), as well as English. The organizing committee will release development data, including parallel tweets, that will enable participants to train their systems. For the final evaluation, participants will have to submit automatic translations of a number of tweet corpora within a short period of time. The evaluation will be carried out automatically, by measuring the distance between each submission and the reference corpora.
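The announcement does not say which automatic distance will be used. Purely as an illustration of the idea, the sketch below computes a word-level edit distance between each submitted translation and its reference, and reports total edits divided by total reference words. The function names are hypothetical, and the choice of metric is an assumption, not the task's official measure.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a hypothesis word
                            curr[j - 1] + 1,      # add a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def corpus_error_rate(hypotheses, references):
    """Total word edits divided by total reference words (lower is better)."""
    edits = words = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        edits += edit_distance(h, r)
        words += len(r)
    return edits / words if words else 0.0
```

Whitespace tokenization is a simplification; tweets would normally need a tokenizer that handles hashtags, mentions and emoticons consistently on both the hypothesis and the reference side.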

 

These corpora are not meant to be representative of all types of messages that can be observed in informal communication. Instead, this is an initial attempt that starts by addressing one of the simplest parts of the task. We plan to use more informal and varied corpora in future tasks as we make progress on these initial issues.

 

The workshop aims to be a forum where researchers will have a chance to compare their methods, systems and results.

 

Important dates

 

  • March 1: Registration opened
  • April 17: Release of the development-set
  • May 12: Registration deadline
  • May 19: Release of the test-set
  • May 21: Result submission deadline
  • May 22-June 12: Manual evaluation. Publication of results
  • July 3: Short paper submission deadline
  • July 31: Camera-ready version of papers
  • September 14 or 15: Workshop

 

Organizing Committee

Iñaki Alegria (UPV/EHU)
Nora Aranberri (UPV/EHU)
Cristina España-Bonet (UPC)
Pablo Gamallo (USC)
Eva Martínez (UPC)
Hugo Oliveira (Universidade de Coimbra)
Iñaki San Vicente (Elhuyar)
Antonio Toral (DCU, Dublin)
Arkaitz Zubiaga (University of Warwick)

Proceedings

The papers of the workshop will be published in the proceedings of the “XXXI Congreso de la Sociedad Española de Procesamiento de lenguaje natural”.
The proceedings of the workshop will also be published in the CEUR Workshop Proceedings digital publication service.

 

13/03/2015 - Tatyana Polyakova PhD dissertation

 

Thesis title:

“GRAPHEME-TO-PHONEME CONVERSION IN THE ERA OF GLOBALIZATION”

 

Author:

Ms. Tatyana Polyakova

 

Advisor:

Dr. Antonio Bonafonte

 

Date: March 13, 2015

 

Time: 12:00

 

Place: Sala Telensenyament – Edifici B3 – Campus Nord

05/02/2015 - SKATER 2nd Workshop in Barcelona

SKATER General meeting to present the research carried out in the framework of the national project:

http://nlp.lsi.upc.edu/skater/

 

Place: A6203 (Aulario 6, Campus Nord, UPC)

 

AGENDA:

 

February 5th

10 – 10:30 Horacio Presentation of the workshop

10:30 – 10:50 Luís Taxonomizing flat terminologies

10:50 – 11:10 Marta Using user models to generate personalized questions and answers in English, Spanish and Catalan

11:10 – 11:30 German Cross-lingual Event Detection

11:30 – 11:50 coffee break

11:50 – 12:10 Zagros Semanticizing temporal expressions from Unstructured Contents

12:10 – 12:30 Eli Automatic Machine Translation Evaluation: a Qualitative Approach

12:30 – 12:50 Francesco Barbieri Figurative Language (SemEval)

12:50 – 13:10 Jordi DIP: Anonymizer for bilingual Spanish/Catalan clinical documents

13:10 – 13:30 Canan Semiautomatic construction of a domain ontology from standard ISA88

13:30 – 15:00 lunch

15:00 – 15:20 Lluís TextServer: Cloud-based language processing.

15:20 – 15:40 Jorge Scaling-up term acquisition

15:40 – 16:00 Egoitz Laparra Implicit SRL

16:00 – 16:20 Zuhaitz Big Data and NLP

16:20 – 16:40 Marina Lloberes PARTES: Test Suite for Parsing Evaluation

16:40 – 17:00 Xavier Arregi Ber2Tek demos

17:00 – 17:20 coffee break

 

17:20 – 17:40 Xavier Acquisition of terminological knowledge from corpora through WordNet relations

17:40 – 18:00 Rodrigo Agerri Semi-supervised Domain adaptation

18:00 – 18:20 Daniel Ferrès Georeferencing Formal and Informal Documents with Knowledge Bases and Language Models

18:20 – 18:40 Montse Marimon demo + description of the simplifier, definition extraction or taxonomy learning, sentiment analysis

 

21:30 Dinner

http://www.restaurantcalboter.com/


February 6th
9:30 – 11:30 Planning of the work to be carried out in 2015 (by WP)
11:30 – 11:50 coffee break
11:50 – 14:00 Discussion on continuing the project in the next call for proposals

Attendees
UB
- Elisabeth Comellas
- Marina Lloberes
- Marta Coll-Florit
- Lara Gil
- Salvador Climent
- Antoni Oliver
- Irene Castellón

UPC
- Jordi Turmo
- Lluís Padró
- Marta Gatius
- Carme Martín
- Zagros Ardalan
- Javi Farreras
- Alicia
- Maria Fuentes
- Horacio Rodríguez
- Canan Dombayc
- Antonio Espuña
- Moisès Graells

UPF – IULA
- Jorge Vivaldi
- Núria Bel

EHU
- Zuhaitz Beloki
- Egoitz Laparra
- Xavier Arregi
- German Rigau
- Rodrigo Agerri

Uvigo
- Xavier Gómez

UPF – TALN
- Luis Espinosa
- Daniel Ferrès
- Montserrat Marimon
- Francesco Barbieri
- Horacio Saggion

 
