A tool for Multi-word collocation extraction and visualization in Multilingual Corpora

Violeta Seretan; Luka Nerima; Eric Wehrli

By admyn, November 17, 2016. Euralex 2004, Publications

Page	755-766
Author	Violeta Seretan, Luka Nerima, Eric Wehrli
Title	A tool for Multi-word collocation extraction and visualization in Multilingual Corpora
Abstract	This document describes an implemented collocation extraction system that is designed as an aid to translation and will be used in a real translation environment. Its main functionalities are: retrieving multi-word collocations from an existing corpus of documents in a given language (only French and English are supported for the time being); visualizing the list of extracted terms and their contexts with a concordance tool; retrieving the translation equivalents of the sentences containing the collocations from the existing parallel corpora; and enabling the user to create a sublist of validated collocations for further use as a reference in translation. The approach underlying this system is hybrid: the extraction method combines syntactic analysis of the texts (for selecting collocation candidates) with a statistical measure for the relevance test (i.e., for ranking candidates according to collocational strength). We present the underlying approach and methodology and the architecture of the system, describe the main system components, and provide several experimental results.
Session	Phraseology and Collocation
Keywords
BibTex	@InProceedings{ELX04-082,
  author    = {Violeta Seretan and Luka Nerima and Eric Wehrli},
  title     = {A tool for Multi-word collocation extraction and visualization in Multilingual Corpora},
  pages     = {755--766},
  booktitle = {Proceedings of the 11th EURALEX International Congress},
  year      = {2004},
  month     = jul,
  date      = {6-10},
  address   = {Lorient, France},
  editor    = {Geoffrey Williams and Sandra Vessier},
  publisher = {Université de Bretagne-Sud, Faculté des lettres et des sciences humaines},
  isbn      = {29-52245-70-3},
}
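The hybrid approach summarized in the abstract (syntactic selection of candidates, then statistical ranking) can be illustrated with a minimal sketch. The abstract names neither the parser nor the statistical measure, so everything here is an assumption: the candidate pairs are taken as already produced by some syntactic analysis, Dunning's log-likelihood ratio stands in for the unspecified relevance test, and the function names `log_likelihood` and `rank_candidates` are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table of co-occurrence counts.

    Assumption: this measure is a stand-in; the abstract does not say
    which statistic the system actually uses.
    """
    def h(*counts):
        # Sum of c * log(c / total) over the non-zero cells.
        total = sum(counts)
        return sum(c * math.log(c / total) for c in counts if c > 0)

    return 2.0 * (
        h(k11, k12, k21, k22)
        - h(k11 + k12, k21 + k22)   # row totals
        - h(k11 + k21, k12 + k22)   # column totals
    )


def rank_candidates(pairs):
    """Rank (head, dependent) pairs by association strength, strongest first.

    `pairs` is a list of word pairs assumed to come from a syntactic
    analysis step (hypothetical input; no parser is included here).
    """
    pair_counts = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    n = len(pairs)

    scored = []
    for (a, b), k11 in pair_counts.items():
        k12 = left[a] - k11          # a with a different dependent
        k21 = right[b] - k11         # b with a different head
        k22 = n - k11 - k12 - k21    # pairs involving neither a nor b
        scored.append(((a, b), log_likelihood(k11, k12, k21, k22)))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Calling `rank_candidates` on a list of parsed pairs returns each distinct pair with its score, in descending order. A user-validated sublist, as the abstract describes, would then be chosen from the top of that ranking.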