Textual Processing Tools

BIOS - A suite of syntactico-semantic analyzers for English. Includes "smart" tokenization, POS tagging (based on TnT), chunking, and Named Entity Recognition and Classification (Java, Linux).

Authors:
 Mihai Surdeanu

References:
 http://www.surdeanu.info/mihai/bios/

Description:
 Suite of Syntactico-Semantic Analyzers. Includes a named-entity recognizer, a syntactic chunker, a POS tagger, and a "smart" tokenizer. All processors are learned using the MiLL machine learning library (see below).

Functionality:

Technology:
 Java

Technical Requirements:
 MiLL machine learning library, TnT tagger, YamCha.

Modules:

  • Smart tokenizer that recognizes abbreviations, SGML tags, etc.

  • Part-of-speech (POS) tagger. The POS tagger is implemented as a wrapper around the TnT tagger by Thorsten Brants.

  • Syntactic chunking using the labels promoted by the CoNLL chunking evaluations.

  • Named-Entity Recognition and Classification (NERC) for the CoNLL entity types plus an additional 11 numerical entity types.
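
The CoNLL chunking evaluations label each token with a B-X tag (first token of a chunk of type X), an I-X tag (token inside such a chunk), or O (outside any chunk). The sketch below is not part of BIOS and does not use its API; the class BioChunkDecoder and its decode method are hypothetical names introduced only to illustrate how a tag sequence in this scheme can be turned into labeled spans. It also assumes, as a leniency choice, that an I-X tag with no matching open chunk starts a new chunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of BIOS: decodes CoNLL-style B-/I-/O tags into spans.
public class BioChunkDecoder {

    /** A labeled span over token positions [start, end). */
    public static final class Chunk {
        public final String label;
        public final int start;
        public final int end;

        Chunk(String label, int start, int end) {
            this.label = label;
            this.start = start;
            this.end = end;
        }
    }

    /** Converts one tag per token into a list of chunks, in sentence order. */
    public static List<Chunk> decode(String[] tags) {
        List<Chunk> chunks = new ArrayList<>();
        String label = null; // label of the currently open chunk, or null if none
        int start = -1;
        // Iterate one step past the end so a trailing open chunk is closed.
        for (int i = 0; i <= tags.length; i++) {
            String tag = i < tags.length ? tags[i] : "O";
            boolean inside = tag.startsWith("I-");
            boolean continues = inside && label != null && tag.substring(2).equals(label);
            if (label != null && !continues) {
                chunks.add(new Chunk(label, start, i));
                label = null;
            }
            // Assumption: a stray I-X tag with no open chunk begins a new chunk.
            if (tag.startsWith("B-") || (inside && label == null)) {
                label = tag.substring(2);
                start = i;
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String[] tokens = {"He", "reckons", "the", "current", "account", "deficit"};
        String[] tags = {"B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP"};
        for (Chunk c : decode(tags)) {
            String text = String.join(" ", Arrays.copyOfRange(tokens, c.start, c.end));
            System.out.println(c.label + ": " + text);
        }
    }
}
```

The same decoding applies to named-entity tags when they are encoded in the B-/I-/O style (for example B-PER, I-PER); whether BIOS emits its NERC output in exactly this form is not stated here.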

Innovation:

Development:

Publications:

Contact:
