Machine Translation Tools

MARIE - Ngram-based Statistical Machine Translation Decoder

MARIE is an Ngram-based statistical machine translation decoder, intended to be helpful to the research community in the field of Statistical Machine Translation. It was developed at the TALP Research Center of the Universitat Politècnica de Catalunya (UPC) by Josep M. Crego as part of his PhD thesis, with the aid of Adrià de Gispert and under the advice of Professor José B. Mariño.

Description

The MARIE decoder can perform statistical machine translation when supplied with at least a translation model. It was specially designed to deal with tuples (bilingual translation units) and a translation model learnt as a typical Ngram language model (Ngram-based SMT). Nevertheless, MARIE can also use phrases (bilingual translation units) and behave as a typical phrase-based decoder (phrase-based SMT).
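
As a rough illustration of what "a translation model learnt as an Ngram language model" means, the sketch below scores a sequence of tuples with a bigram model. It is not MARIE code: the "source|||target" tuple notation, the probabilities, the floor value for unseen bigrams and the function name are all assumptions made for this example.

```python
import math

# Hypothetical toy bigram model over tuples. The probabilities are invented,
# and "source|||target" is an assumed notation, not MARIE's file format.
BIGRAM_LOGPROB = {
    ("<s>", "la|||the"): math.log(0.5),
    ("la|||the", "casa|||house"): math.log(0.4),
    ("casa|||house", "</s>"): math.log(0.8),
}

# Assumed log-probability assigned to any bigram missing from the table.
UNSEEN_LOGPROB = math.log(1e-6)


def tuple_sequence_logprob(tuples, logprobs=BIGRAM_LOGPROB, floor=UNSEEN_LOGPROB):
    """Sum bigram log-probabilities over a tuple sequence with sentence boundaries."""
    padded = ["<s>", *tuples, "</s>"]
    # Each tuple is conditioned on the tuple that precedes it.
    return sum(logprobs.get(pair, floor) for pair in zip(padded, padded[1:]))


if __name__ == "__main__":
    print(tuple_sequence_logprob(["la|||the", "casa|||house"]))
```

A real model would typically use a higher Ngram order and proper smoothing instead of a fixed floor; the point here is only that the bilingual units play the role that words play in an ordinary language model.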

To produce better translations, the decoder can make use of a target language model, a reordering model, a word penalty and any additional translation models, all introduced in the search through a log-linear combination of models.
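
A log-linear combination scores each translation hypothesis as a weighted sum of the (log-domain) scores given by the individual models. The following is a minimal sketch of that idea, not the decoder's implementation: the feature names, the weights and the feature values are hypothetical, and a real decoder applies the combination incrementally during search rather than to complete hypotheses.

```python
def loglinear_score(features, weights):
    """Return the weighted sum of feature scores for the models named in `weights`."""
    return sum(weights[name] * features[name] for name in weights)


def best_hypothesis(hypotheses, weights):
    """Pick the hypothesis with the highest log-linear score."""
    return max(hypotheses, key=lambda h: loglinear_score(h["features"], weights))


# Hypothetical model weights (in practice these are tuned on development data).
WEIGHTS = {"translation_model": 1.0, "target_lm": 0.5, "word_penalty": 0.1}

# Hypothetical hypotheses: log-probabilities for the models, and the number
# of target words as the word-penalty feature.
HYPOTHESES = [
    {"text": "the house", "features": {"translation_model": -2.0, "target_lm": -4.0, "word_penalty": 2}},
    {"text": "the the house", "features": {"translation_model": -1.5, "target_lm": -6.0, "word_penalty": 3}},
]

if __name__ == "__main__":
    print(best_hypothesis(HYPOTHESES, WEIGHTS)["text"])
```

Adding another model to the combination only requires one more feature value per hypothesis and one more weight.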

Tools for building language models are freely available (we recommend the SRI Language Modeling Toolkit). Methods for learning translation models are described in current research papers on SMT.

The decoder is released with a manual which describes its usage and inner workings. Details of the decoder have also been presented at the following international conference (please cite this paper when referring to the MARIE decoder):

How to download

MARIE can be downloaded free of charge under the GNU General Public License.
 
  • MARIE package Version 1.1 (13/12/05) download (305Kb) binary (1.2Mb)

    .- fixed a bug appearing when sorting lists within the same group.
    .- Implemented the output word graph format.
    .- burst of 1gram NULLs penalty model (-lNN)
    .- high (>=3) Ngram BM bonus model (-l3gr)
  • MARIE package Version 1.3.1 (24/03/06) binary (1.3Mb)

    .- fixed a bug appearing when sorting lists within the same group.
    .- format/units output using (-format -units) in STDOUT and outfile.UNITS
    .- more efficient reading of input files
    .- target Tags using (-fTTM, -ltt, -ftags)
    .- verbose output file in outfile.VERBOSE
    .- added (-ln) units bonus model
    .- input reordering graph (-ingraph)
    .- input can be read from STDIN
    .- to optimize decoding, ngrams are cached (-cache)
    .- output model costs in outfile.UNITS for each tuple (-unitscost)
    .- models are read before input files (useful in client/server mode).
    .- more detailed help (-h) option.
  • MARIE package Version 1.3.6 (21/06/07) binary (1.3Mb)

To unpack the package, type (under Linux): tar xvzf marie-vX.Y.Z.tgz. 21 files will appear:
  • marie.cpp tables.cpp tables.h arpa.cpp arpa.h params.cpp params.h makefile ClientSocket.cpp ServerSocket.cpp Socket.cpp Socket.h ClientSocket.h ServerSocket.h SocketException.h (source code)
  • extract-tuples (the binary of a tuple extraction algorithm that takes a word-to-word alignment in GIZA++ output format)
  • marie-vX.Y.Z-manualspec.pdf (the manual in pdf)
  • toyTM toyBM toyBM.additional toyTEST (a toy example)

Acknowledgements

This work has been partially supported by the Spanish government, under grant TIC-2002-04447-C02 (Aliado Project), the European Union, under grant FP6-506738 (TC-STAR project), and the Universitat Politècnica de Catalunya (UPC-RECERCA grant).

We would also like to thank the other members of the SMT group in the Signal Theory and Communications Department of the UPC for their comments, suggestions and contributions to the development and testing work: Patrick Lambert, Rafael Banchs, Marta Ruiz and José A. R. Fonollosa.

Send your comments and suggestions to Josep M. Crego.

Additional information