METRICC: Harnessing comparable corpora for multilingual lexicon development

By November 17, 2016,
Pages 389-403
Author Araceli Alonso, Helena Blancafort, Clément De Groc, Chrystel Millon, Geoffrey Williams
Title METRICC: Harnessing comparable corpora for multilingual lexicon development
Abstract Research on comparable corpora has grown in recent years, opening up the possibility of developing multilingual lexicons by exploiting comparable corpora to create corpus-driven multilingual dictionaries. To date, this issue has not been widely addressed. This paper focuses on using the mechanism of collocational networks proposed by Williams (1998) to exploit comparable corpora. The paper first describes the METRICC project, which aims at the automatic creation of comparable corpora, and one of the crawlers developed for building comparable corpora. It then discusses the power of collocational networks for multilingual corpus-driven dictionary development.
Session Corpus-driven lexicography
Keywords comparable corpora, focused web crawler, collocational networks, multilingual dictionaries, cultural heritage lexicon.
BibTeX
@InProceedings{ELX12-024,
author = {Araceli Alonso and Helena Blancafort and Clément De Groc and Chrystel Millon and Geoffrey Williams},
title = {METRICC: Harnessing comparable corpora for multilingual lexicon development},
pages = {389--403},
booktitle = {Proceedings of the 15th EURALEX International Congress},
year = {2012},
month = aug,
eventdate = {2012-08-07/2012-08-11},
address = {Oslo, Norway},
editor = {Ruth Vatvedt Fjeld and Julie Matilde Torjusen},
publisher = {Department of Linguistics and Scandinavian Studies, University of Oslo},
isbn = {978-82-303-2228-4},
}