Bilingual Terminology Acquisition from Unrelated Corpora

By November 17, 2016,
Pages 1023-1029
Author Rogelio Nazar
Title Bilingual Terminology Acquisition from Unrelated Corpora
Abstract This paper presents a simple yet effective technique for extracting term equivalents in different languages. Techniques for bilingual lexicon extraction have generally relied on parallel corpora and have yielded accurate results. However, parallel corpora covering different domains and languages are not easy to compile. For this reason, some authors have explored techniques for extracting a bilingual lexicon from nonparallel but comparable corpora, that is, pairs of texts that are not exactly translations of each other but roughly 'talk about the same things'. This paper describes an algorithm that performs bilingual terminology extraction without requiring large amounts of data, that can handle infrequent units, and that needs neither comparable corpora nor other resources such as an initial bilingual lexicon of seed words. Despite its simplicity, the algorithm achieves results comparable to those of state-of-the-art techniques. It nevertheless goes beyond them in offering a domain- and language-independent method that is especially suitable for extracting specialized terminology, the most dynamic part of the lexicon and the most difficult to acquire.
Session 5. Lexicography for Specialised Languages - Terminology and Terminography
author = {Rogelio Nazar},
title = {Bilingual Terminology Acquisition from Unrelated Corpora},
pages = {1023--1029},
booktitle = {Proceedings of the 13th EURALEX International Congress},
year = {2008},
month = jul,
eventdate = {2008-07-15/2008-07-19},
address = {Barcelona, Spain},
editor = {Elisenda Bernal and Janet DeCesaris},
publisher = {Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra},
isbn = {978-84-96742-67-3},