From Parallel to Comparable Text Corpora

Author: Carol Peters, Eugenio Picchi
Title: From Parallel to Comparable Text Corpora
Abstract: We present a bilingual corpus management system under development in Pisa. The first component of this system was a set of procedures to create and query parallel text archives; we are now studying the implementation of a second set of procedures to interrogate comparable archives. The approach followed is quite different from that used for parallel data and considerably more complex; the results are also very different. In the paper, we describe the strategy we are adopting to retrieve significant data from comparable corpora, and discuss the preliminary results.
Session: PART 1 - Computational Lexicology and Lexicography
author = {Carol Peters and Eugenio Picchi},
title = {From Parallel to Comparable Text Corpora},
pages = {167--171},
booktitle = {Proceedings of the 7th EURALEX International Congress},
year = {1996},
month = aug,
date = {13-18},
address = {Göteborg, Sweden},
editor = {Martin Gellerstam and Jerker Järborg and Sven-Göran Malmgren and Kerstin Norén and Lena Rogström and Catalina Röjder Papmehl},
publisher = {Novum Grafiska AB},
isbn = {91-87850-14-1},