Using Google books unigrams to improve the update of large monolingual reference dictionaries.

By admyn, November 17, 2016. Euralex 2012, Publications

Page	362-366
Author	Alexander Geyken, Lothar Lemnitzer
Title	Using Google books unigrams to improve the update of large monolingual reference dictionaries.
Abstract	This paper describes ongoing work to extend a traditional dictionary using a large opportunistic corpus in combination with a unigram list from the Google Books project. This approach was applied to German with the following resources: the Wörterbuch der Deutschen Gegenwartssprache (WDG, 1961-1977), the German unigram-list of Google Books and the DWDS-E corpus. Both corpus resources were normalized. The subsequent analysis shows that the normalized unigram list has clear complementary information to offer with respect to DWDS-E and that a comparatively small amount of manual work is sufficient to detect a fairly large number of new and relevant dictionary entry candidates.
Session	Corpus-driven lexicography
Keywords	practical lexicography, computational linguistics, corpus statistics, lemma list
BibTex	@InProceedings{ELX12-021,
  author    = {Alexander Geyken and Lothar Lemnitzer},
  title     = {Using Google books unigrams to improve the update of large monolingual reference dictionaries.},
  pages     = {362--366},
  booktitle = {Proceedings of the 15th EURALEX International Congress},
  year      = {2012},
  month     = {aug},
  date      = {7-11},
  address   = {Oslo, Norway},
  editor    = {Ruth Vatvedt Fjeld and Julie Matilde Torjusen},
  publisher = {Department of Linguistics and Scandinavian Studies, University of Oslo},
  isbn      = {978-82-303-2228-4},
}