Textual Processing Tools

FreeLing - Open source suite of Language Analyzers.

Authors:
 TALP
 
 
Description:
 The FreeLing package is a library providing language analysis services. It is designed to be used as an external library by any application that requires such services. The tool suite is released under the GNU General Public License of the Free Software Foundation.

 

Functionality:
(Main services offered by the FreeLing library)

 

  • Text tokenization
  • Sentence splitting
  • Morphological analysis
  • Suffix treatment, retokenization of clitic pronouns
  • Flexible multiword recognition
  • Contraction splitting
  • Probabilistic prediction of unknown word categories
  • Named entity detection
  • Recognition of dates, numbers, ratios, currency, and physical magnitudes (speed, weight, temperature, density, etc.)
  • PoS tagging
  • Chart-based shallow parsing
  • Named entity classification
  • WordNet based sense annotation
  • Rule-based dependency parsing
Most of these services are provided for all currently supported languages: Spanish, Catalan, Galician, Italian, and English.
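
To make one of the listed services concrete, the following is a minimal sketch of contraction splitting for Spanish, where "del" stands for "de" + "el" and "al" for "a" + "el". It is not FreeLing's implementation: the function name split_contractions and the hard-coded two-entry table are assumptions made for illustration only, and the lookup is case-sensitive.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of FreeLing): expands each token that
// appears in a small contraction table into its component words and
// copies every other token unchanged.
std::vector<std::string> split_contractions(const std::vector<std::string>& tokens) {
    // Illustrative table with the two standard Spanish contractions.
    static const std::map<std::string, std::vector<std::string>> table = {
        {"del", {"de", "el"}},
        {"al", {"a", "el"}},
    };

    std::vector<std::string> out;
    for (const std::string& token : tokens) {
        auto it = table.find(token);
        if (it == table.end()) {
            out.push_back(token);  // not a contraction: keep as is
        } else {
            // contraction: append its parts in order
            out.insert(out.end(), it->second.begin(), it->second.end());
        }
    }
    return out;
}
```

Called on the tokens "voy", "al", "cine", the function returns "voy", "a", "el", "cine". A real analyzer would also have to handle capitalization and language-specific data, which this sketch ignores.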



Technology:
 C++



Technical Requirements:

 

  • A typical Linux box with the usual development tools: bash, make, and a C++ compiler with basic STL support.
  • Enough hard disk space (about 120 MB)
  • Some external libraries are required to compile FreeLing:
              - libpcre (version 4.3 or higher): Perl Compatible Regular Expressions. Included in most common Linux distributions. You'll need both the binary and the development packages.
              - libdb (version 4.1.25 or higher): Berkeley DB. Included in all common Linux distributions.
              - libcfg+ (version 0.6.1 or higher): Configuration file and command-line options management. May not be in your Linux distribution.
              - Omlet & Fries (libomlet v.0.97 or later, libfries v.0.95 or later): Machine Learning utility libraries, used by the Named Entity Classifier. The installation scripts are not very clever yet, so these libraries are required even if you do not plan to use the NEC ability of FreeLing. Available from http://www.lsi.upc.edu/~nlp/omlet+fries

Modules:
The main processing classes in the library are:

 

  • tokenizer: Receives plain text and returns a list of word objects.
  • splitter: Receives a list of word objects and returns a list of sentence objects.
  • maco: Receives a list of sentence objects and morphologically annotates each word object in the given sentences. Includes specific submodules (e.g., detection of dates, numbers, multiwords, etc.) which can be activated at will.
  • tagger: Receives a list of sentence objects and disambiguates the PoS of each word object in the given sentences.
  • parser: Receives a list of sentence objects and associates to each of them a parse_tree object.
  • dependency: Receives a list of parsed sentence objects and associates to each of them a dep_tree object.
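
The way these classes hand data to one another can be sketched as a toy pipeline. The code below does not use the FreeLing API: the Word and Sentence types and the functions tokenize, split, and tag are hypothetical stand-ins that only mirror the data flow described above (plain text to word objects, word objects to sentence objects, sentence objects annotated in place). The rules are deliberately naive and ASCII-only, and the maco, parser, and dependency stages are left out.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-ins for the library's word and sentence objects.
struct Word {
    std::string form;  // surface form of the token
    std::string tag;   // category label, empty until tag() fills it in
};
using Sentence = std::vector<Word>;

// Tokenizer stage: plain text in, list of word objects out.
// Toy rule: a run of ASCII letters/digits is one token; any other
// non-space character is a token of its own.
std::vector<Word> tokenize(const std::string& text) {
    std::vector<Word> words;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current.push_back(static_cast<char>(c));
            continue;
        }
        if (!current.empty()) {
            words.push_back({current, ""});
            current.clear();
        }
        if (!std::isspace(c)) {
            words.push_back({std::string(1, static_cast<char>(c)), ""});
        }
    }
    if (!current.empty()) words.push_back({current, ""});
    return words;
}

// Splitter stage: list of word objects in, list of sentence objects out.
// Toy rule: a sentence ends at ".", "?" or "!".
std::vector<Sentence> split(const std::vector<Word>& words) {
    std::vector<Sentence> sentences;
    Sentence current;
    for (const Word& w : words) {
        current.push_back(w);
        if (w.form == "." || w.form == "?" || w.form == "!") {
            sentences.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) sentences.push_back(current);
    return sentences;
}

// Tagger stage: annotates every word of every sentence in place.
// Toy rule: NUM for all-digit tokens, PUNCT for tokens without
// letters or digits, WORD for everything else.
void tag(std::vector<Sentence>& sentences) {
    for (Sentence& s : sentences) {
        for (Word& w : s) {
            bool all_digits = true;
            bool no_alnum = true;
            for (unsigned char c : w.form) {
                if (!std::isdigit(c)) all_digits = false;
                if (std::isalnum(c)) no_alnum = false;
            }
            w.tag = all_digits ? "NUM" : (no_alnum ? "PUNCT" : "WORD");
        }
    }
}
```

A caller would chain the stages in the same order as the module list: pass the text to tokenize, pass the result to split, then call tag on the resulting sentences. Each stage only depends on the output type of the previous one, which is what allows submodules to be switched on or off independently.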

Innovation:



Development:
 FreeLing was originally written by members of the TALP Research Center at Universitat Politècnica de Catalunya. The Spanish and Catalan linguistic data were originally developed by members of CLiC, Centre de Llenguatge i Computació, at Universitat de Barcelona. Many people have further contributed by reporting problems, suggesting improvements, submitting code, or extending the linguistic databases (see web page).



Publications:

 

  • Jordi Atserias and Bernardino Casas and Elisabet Comelles and Meritxell Gonzàlez and Lluís Padró and Muntsa Padró. FreeLing 1.3: Syntactic and semantic services in an open-source NLP library. Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006), ELRA. Genoa, Italy. May, 2006.

 

  • Jordi Atserias and Elisabet Comelles and Aingeru Mayor. TXALA un analizador libre de dependencias para el castellano. Procesamiento del Lenguaje Natural, n. 35, pg. 455--456. September, 2005.

 

  • Jordi Atserias and Josep Carmona and Irene Castellón and Sergi Cervell and Montserrat Civit and Lluís Màrquez and Ma Antònia Martí and Lluís Padró and Roberto Placer and Horacio Rodríguez and Mariona Taulé and Jordi Turmo. Morphosyntactic Analysis and Parsing of Unrestricted Spanish Text. Proceedings of the 1st International Conference on Language Resources and Evaluation, LREC, pg. 1267--1274. Granada, Spain. May, 1998.

 

  • Josep Carmona and Sergi Cervell and Lluís Màrquez and Ma Antònia Martí and Lluís Padró and Roberto Placer and Horacio Rodríguez and Mariona Taulé and Jordi Turmo. An Environment for Morphosyntactic Processing of Unrestricted Spanish Text. Proceedings of the 1st International Conference on Language Resources and Evaluation, LREC, pg. 915--922. Granada, Spain. May, 1998.

 

  • Xavier Carreras and Lluís Padró. A Flexible Distributed Architecture for Natural Language Analyzers. Proceedings of the 3rd International Conference on Language Resources and Evaluation, LREC, Las Palmas de Gran Canaria, Spain. 2002.

 

  • Xavier Carreras and Isaac Chao and Lluís Padró and Muntsa Padró. FreeLing: An Open-Source Suite of Language Analyzers. Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC'04), 2004.


Additional information