Polish-English Statistical Machine Translation of Medical Texts

doi:10.48550/arXiv.1509.08909

This research explores the effects of various training methods on a Polish-to-English statistical machine translation system for medical texts. Several parts of the EMEA parallel text corpora from the OPUS project were used to train phrase tables and language models and to develop, tune, and test the translation system. The BLEU, NIST, METEOR, RIBES, and TER metrics were used to evaluate the effects of various system and data preparations on translation results. Our experiments included systems that used POS tagging, factored phrase models, hierarchical models, syntactic taggers, and many different alignment methods. We also conducted a deep analysis of the Polish data as preparatory work for automatic data correction, such as true casing and punctuation normalization.

Publication:

arXiv e-prints

Pub Date:

September 2015

DOI:

10.48550/arXiv.1509.08909

arXiv:

arXiv:1509.08909

Bibcode:

2015arXiv150908909W

Keywords:

Computer Science - Computation and Language;
Computer Science - Information Retrieval;
Statistics - Machine Learning

E-Print:

New Research in Multimedia and Internet Systems, Springer. 09/2014, ISSN: 1867-5662. arXiv admin note: text overlap with arXiv:1509.08874
