|
-ing words in RBMT: multilingual evaluation and exploration of pre- and post-processing solutions |
Aranberri Monasterio, Nora
|
This PhD dissertation falls within the domain of machine translation (MT) and focuses specifically on the machine translation of IT-domain -ing words into four target languages: French, German, Japanese and Spanish. -ing words are claimed to be problematic because of their linguistic flexibility, i.e. they can function as nouns, adjectives and verbs. This dissertation investigates how problematic -ing words are and explores possible solutions for improving their MT output. A corpus-based approach is used to better represent the domain-specific structures in which -ing words occur. After a significant sample is selected, the -ing words are classified following a functional categorisation presented by Izquierdo (2006). The sample is machine-translated using a customised RBMT system. A feature-based human evaluation is then performed to obtain information about the specific feature under study. The results show that 73% of the -ing words were correctly translated in terms of grammaticality and accuracy for German, Japanese and Spanish. The percentage for French was lower, at 52%. These data, combined with a thorough analysis of the MT output, allow for the identification of cross-language and language-specific issues and their characteristics, setting the path for improvement. The improvement approaches examined cover both the pre- and post-processing stages of automated translation. For pre-processing, controlled language (CL) and automatic source re-writing (ASR) are explored and evaluated. For post-processing, global search and replace (Global S&R) and statistical post-editing (SPE) methods are tested. CL is reported to reduce -ing word ambiguity but not to achieve substantial machine translation improvement. Regex-based implementations of ASR and Global S&R show considerable translation improvements, ranging from 60% to 95%, and minimal degradation, ranging from 0% to 18%. The results for SPE show little improvement, or even degradation, at both sentence and -ing word level.
|
|
Keyword(s):
|
Machine translating; -ing words; machine translation; evaluation |
Publication Date:
|
2010 |
|
Type:
|
Doctoral thesis |
|
Peer-Reviewed:
|
No |
|
Language(s):
|
English |
|
Institution:
|
Dublin City University |
|
Funder(s):
|
Enterprise Ireland |
|
Citation(s):
|
Aranberri Monasterio, Nora (2010) -ing words in RBMT: multilingual evaluation and exploration of pre- and post-processing solutions. PhD thesis, Dublin City University. |
|
Publisher(s):
|
Dublin City University. School of Applied Language and Intercultural Studies; Dublin City University. Centre for Translation and Textual Studies (CTTS) |
|
File Format(s):
|
application/pdf |
|
Supervisor(s):
|
O'Brien, Sharon |
|
Related Link(s):
|
http://doras.dcu.ie/15093/1/thesis_nora_aranberri_2009.pdf |
|
First Indexed:
2010-03-30 05:08:44 |
Last Updated:
2015-03-23 05:23:29 |