Corpus as a Means for Study of Lexical Usage Changes

By November 17, 2016,
Page 437-447
Author Michal Křen, Jaroslava Hlaváčová
Title Corpus as a Means for Study of Lexical Usage Changes
Abstract The paper presents a corpus-based method for obtaining ranked wordlists that characterise changes in lexical usage. The method is evaluated on two representatively balanced 100-million-word corpora of contemporary written Czech that cover two consecutive time periods. Despite the similar overall design of the corpora, lexical frequencies first have to be normalised to make them comparable. Furthermore, dispersion information is used to reduce the number of domain-specific items, as their frequencies depend heavily on the inclusion of particular texts in the corpus. Finally, statistical significance measures are used to evaluate frequency differences between individual items in the two corpora. It is demonstrated that the method ranks the resulting wordlists appropriately, and several limitations of the approach are also discussed. The influence of corpus composition cannot be completely eliminated, and comparability of the corpora is shown to play a key role. Therefore, although highly ranked items are often found to be related to changes in language usage, their relevance should be interpreted cautiously. Apart from several general-language words, the real examples of lexical variation are found to be limited mostly to temporary topics of public discourse or items reflecting recent technological development, thus sketching an overall picture of lifestyle changes.
Session 1. Computational Lexicography and Lexicology
author = {Michal Křen and Jaroslava Hlaváčová},
title = {Corpus as a Means for Study of Lexical Usage Changes},
pages = {437--447},
booktitle = {Proceedings of the 13th EURALEX International Congress},
year = {2008},
month = {jul},
date = {15-19},
address = {Barcelona, Spain},
editor = {Elisenda Bernal and Janet DeCesaris},
publisher = {Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra},
isbn = {978-84-96742-67-3},