Using Google books unigrams to improve the update of large monolingual reference dictionaries.

By November 17, 2016,
Pages 362-366
Authors Alexander Geyken, Lothar Lemnitzer
Title Using Google books unigrams to improve the update of large monolingual reference dictionaries.
Abstract This paper describes ongoing work to extend a traditional dictionary using a large opportunistic corpus in combination with a unigram list from the Google Books project. This approach was applied to German with the following resources: the Wörterbuch der Deutschen Gegenwartssprache (WDG, 1961-1977), the German unigram list of Google Books, and the DWDS-E corpus. Both corpus resources were normalized. The subsequent analysis shows that the normalized unigram list clearly complements DWDS-E and that a comparatively small amount of manual work is sufficient to detect a fairly large number of new and relevant dictionary entry candidates.
Session Corpus-driven lexicography
Keywords practical lexicography, computational linguistics, corpus statistics, lemma list
BibTeX
@InProceedings{ELX12-021,
author = {Alexander Geyken and Lothar Lemnitzer},
title = {Using {Google} books unigrams to improve the update of large monolingual reference dictionaries},
pages = {362--366},
booktitle = {Proceedings of the 15th EURALEX International Congress},
year = {2012},
month = aug,
date = {7-11},
address = {Oslo, Norway},
editor = {Ruth Vatvedt Fjeld and Julie Matilde Torjusen},
publisher = {Department of Linguistics and Scandinavian Studies, University of Oslo},
isbn = {978-82-303-2228-4},
}