Research Repository

Structured and Unstructured Cache Models for SMT Domain Adaptation

Louis, Annie P and Webber, Bonnie (2014) Structured and Unstructured Cache Models for SMT Domain Adaptation. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, April 26-30 2014, Gothenburg.

Text: E14-1017.pdf (242kB)

Abstract

We present a French-to-English translation system for Wikipedia biography articles. We use training data from out-of-domain corpora and adapt the system for biographies. We propose two forms of domain adaptation. The first biases the system towards words likely in biographies and encourages repetition of words across the document. Since biographies in Wikipedia follow a regular structure, our second model exploits this structure as a sequence of topic segments, where each segment discusses a narrower subtopic of the biography domain. In this structured model, the system is encouraged to use words likely in the current segment’s topic rather than in biographies as a whole. We implement both systems using cache-based translation techniques. We show that a system trained on Europarl and news can be adapted for biographies with a 0.5 BLEU score improvement using our models. Further, the structure-aware model outperforms the system which treats the entire document as a single segment.
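
To illustrate the two cache ideas described in the abstract, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not give the scoring function, so the exponential decay, the additive combination, the weights, and every name here (`DecayingCache`, `candidate_bonus`, the example word sets and topic labels) are assumptions made for illustration only. The only difference between the unstructured and the structured variant in this sketch is which static word set is passed in: words for biographies as a whole, or words for the current segment's topic.

```python
import math


class DecayingCache:
    """Hypothetical dynamic cache: rewards words already used in the document,
    with a bonus that decays as the last use gets older (assumed scheme)."""

    def __init__(self, decay=0.5):
        self.decay = decay      # assumed decay rate, not taken from the paper
        self.last_seen = {}     # word -> index of the sentence where it last appeared
        self.clock = 0          # number of sentences translated so far

    def add_sentence(self, words):
        """Record the words of a newly translated sentence."""
        self.clock += 1
        for w in words:
            self.last_seen[w] = self.clock

    def score(self, word):
        """Return a value in (0, 1] for cached words, 0.0 for unseen words."""
        if word not in self.last_seen:
            return 0.0
        age = self.clock - self.last_seen[word]
        return math.exp(-self.decay * age)


def candidate_bonus(word, dynamic_cache, static_words, w_static=1.0, w_dynamic=1.0):
    """Combine a static domain (or topic) bonus with the repetition bonus."""
    static = 1.0 if word in static_words else 0.0
    return w_static * static + w_dynamic * dynamic_cache.score(word)


# Illustrative, made-up word lists.
biography_words = {"born", "career", "died", "married"}
topic_words = {
    "early_life": {"born", "childhood"},
    "career": {"career", "elected"},
}

cache = DecayingCache(decay=0.5)
cache.add_sentence(["she", "was", "born", "in", "paris"])
cache.add_sentence(["her", "career", "began", "early"])

# Unstructured: one word set for the whole biography domain.
unstructured = candidate_bonus("born", cache, biography_words)
# Structured: only the word set of the current segment's topic.
structured = candidate_bonus("born", cache, topic_words["career"])
```

In this toy setting, "born" receives the static bonus under the unstructured model but not under the structured model while the current segment is about the career, so only its repetition bonus remains. A real system would feed such scores to the decoder as additional features rather than use them in isolation.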

Item Type: Conference or Workshop Item (Paper)
Subjects: P Language and Literature > P Philology. Linguistics
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Science and Health > Computer Science and Electronic Engineering, School of
Depositing User: Jim Jamieson
Date Deposited: 13 Dec 2016 16:32
Last Modified: 13 Dec 2016 16:32
URI: http://repository.essex.ac.uk/id/eprint/18544
