Abstract
Tools that assist professional translators memorise translated sentences but offer limited functionality for identifying terms and their translation equivalents in translated texts. In this paper, we propose a word alignment approach that aims to improve the efficiency and usability of this functionality through the identification of cognates and the exploitation of linguistic knowledge, such as lemmas in bilingual glossaries and dictionaries. Our approach focuses on content words, is applicable to parallel texts of various sizes, and minimises the need for user parameter tuning and preprocessing steps. The method, implemented as the FragmALex system, tackles certain types of divergences between source and target text by creating and grouping links between fragments (word parts, words and word groups). The system output consists of fragment links in their original context. We performed a case study of Dutch and French medical articles, using a medical glossary and a general-purpose dictionary of restricted size. Comparison of the output with a gold standard shows that adding the dictionary to the system yields a larger increase in recall (completeness of alignment) than adding the glossary, while the decrease in precision remains low with either resource.
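To make the notions of cognate-based linking and of precision and recall against a gold standard concrete, the following minimal Python sketch may help. It is not the FragmALex method: the function names, the use of `difflib.SequenceMatcher` as a similarity measure, the 0.7 threshold and the toy word lists are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def is_cognate(src: str, tgt: str, threshold: float = 0.7) -> bool:
    """Hypothetical cognate test: high character-level similarity.

    The similarity measure and the threshold are illustrative assumptions,
    not the criteria used by the system described above.
    """
    ratio = SequenceMatcher(None, src.lower(), tgt.lower()).ratio()
    return ratio >= threshold


def cognate_links(src_words, tgt_words, threshold: float = 0.7):
    """Return the set of (source, target) word pairs judged to be cognates."""
    return {
        (s, t)
        for s in src_words
        for t in tgt_words
        if is_cognate(s, t, threshold)
    }


def precision_recall(predicted: set, gold: set):
    """Precision: share of predicted links that are in the gold standard.
    Recall: share of gold-standard links that were predicted."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy Dutch-French example (invented for illustration).
dutch = ["infectie", "hart"]
french = ["infection", "coeur"]
gold = {("infectie", "infection"), ("hart", "coeur")}

links = cognate_links(dutch, french)
precision, recall = precision_recall(links, gold)
```

In this toy example the cognate test links "infectie" to "infection" but misses the non-cognate pair "hart" and "coeur". Pairs of that kind can only be linked through a lexical resource such as a glossary or dictionary.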
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2008 Tom Vanallemeersch, Cornelia Wermut