A tool for Multi-word collocation extraction and visualization in Multilingual Corpora

By November 17, 2016,
Page 755-766
Author Violeta Seretan, Luka Nerima, Eric Wehrli
Title A tool for Multi-word collocation extraction and visualization in Multilingual Corpora
Abstract This document describes an implemented collocation extraction system that is designed as an aid to translation and will be used in a real translation environment. Its main functionalities are: retrieving multi-word collocations from an existing corpus of documents in a given language (only French and English are supported for the time being); visualizing the list of extracted terms and their contexts by using a concordance tool; retrieving the translation equivalents of the sentences containing the collocations in the existing parallel corpora; and enabling the user to create a sublist of validated collocations to be further used as a reference in translation. The approach underlying this system is hybrid, as the extraction method combines the syntactic analysis of texts (for selecting the collocation candidates) with a statistical measure for the relevance test (i.e., for ranking candidates according to their collocational strength). We present the underlying approach and methodology and the architecture of the system, describe the main system components, and provide several experimental results.
Session Phraseology and Collocation
Keywords
BibTeX
@InProceedings{ELX04-082,
author = {Violeta Seretan and Luka Nerima and Eric Wehrli},
title = {A tool for Multi-word collocation extraction and visualization in Multilingual Corpora},
pages = {755--766},
booktitle = {Proceedings of the 11th EURALEX International Congress},
year = {2004},
month = jul,
date = {6-10},
address = {Lorient, France},
editor = {Geoffrey Williams and Sandra Vessier},
publisher = {Université de Bretagne-Sud, Faculté des lettres et des sciences humaines},
isbn = {29-52245-70-3},
}