Parallel corpora as a source of defining language-specific lexical items

By Robert Lew | November 23, 2016 | Euralex 2016, Publications

Page	394-401
Author	Dmitri Sitchinava
Title	Parallel corpora as a source of defining language-specific lexical items
Abstract	The paper presents an attempt to propose an exact method for identifying the so-called “language-specific” lexicon, a controversial notion often reasonably questioned. An aligned bilingual parallel corpus is chosen as an instrument for finding “specificity”, and statistical entropy and other indices are used as markers of the dispersion of translation patterns (viz. stimuli). For example, a word can be deemed (maximally) language-specific if it occurs multiple times in a given bilingual corpus and is translated each time in a different way. A word is minimally (or simply not) language-specific if it is translated each time identically. Some problems related to the application of this method are discussed. These data can be explicitly used in bilingual dictionaries.
Session	Lexicography and Corpus Linguistics
Keywords	parallel corpora; language-specific lexicon; translation patterns; statistics
BibTex	@InProceedings{ELX2016-042,
  author={Dmitri Sitchinava},
  title={Parallel corpora as a source of defining language-specific lexical items},
  pages={394-401},
  booktitle={Proceedings of the 17th EURALEX International Congress},
  year={2016},
  month={sep},
  date={6-10},
  address={Tbilisi, Georgia},
  editor={Tinatin Margalitadze and George Meladze},
  publisher={Ivane Javakhishvili Tbilisi University Press},
  isbn={978-9941-13-542-2},
}
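
Illustration	The abstract names statistical entropy as one marker of how dispersed a word's translations are. The following is a minimal sketch of that idea, not the paper's actual procedure: it assumes Shannon entropy over the observed translations of a single source word. The function names `translation_entropy` and `normalized_entropy`, the normalization by the logarithm of the number of occurrences, and the placeholder strings `t1` to `t4` are all hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def translation_entropy(translations):
    """Shannon entropy (in bits) of the observed translations of one source word.

    `translations` holds one target-language rendering per aligned occurrence
    of the source word in the parallel corpus.
    """
    counts = Counter(translations)
    total = sum(counts.values())
    # Sum of p * log2(1/p), written with total/c so identical translations give 0.0.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def normalized_entropy(translations):
    """Entropy scaled to [0, 1] (an assumed normalization, not from the paper).

    0.0 means every occurrence is translated identically;
    1.0 means every occurrence is translated differently.
    """
    n = len(translations)
    if n < 2:
        return 0.0  # a single occurrence carries no information about dispersion
    return translation_entropy(translations) / math.log2(n)


# Placeholder translation labels; real input would come from an aligned corpus.
always_same = ["t1", "t1", "t1", "t1"]
always_different = ["t1", "t2", "t3", "t4"]

print(normalized_entropy(always_same))
print(normalized_entropy(always_different))
```

Under these assumptions, a word translated identically every time scores at the bottom of the scale and a word translated differently every time scores at the top, which matches the minimal and maximal cases described in the abstract. Low-frequency words would need separate handling, since a word that occurs only a few times cannot show much dispersion.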