Automatic example sentence extraction for a contemporary German dictionary

November 17, 2016
Pages 343-349
Authors Jörg Didakowski, Lothar Lemnitzer, Alexander Geyken
Title Automatic example sentence extraction for a contemporary German dictionary
Abstract The integration of illustrative examples into monolingual dictionaries provides an intuitive means for grasping the meaning of a word. Tight space constraints of print media no longer apply with online dictionaries. Thus, the inclusion of examples is obviously a useful complement or substitute for the traditional ways of meaning exemplification. In this article, an approach is presented to automatically extract example sentences from a large German corpus collection. The extraction is done on the basis of the notions of sentence readability and complexity and word usage. The extracted examples are a good pre-selection for further integration into a digitized version of a contemporary German dictionary by lexicographers. A quantitative and qualitative evaluation of the extraction results is presented in the article. The work is related to the dictionary project Digitales Wörterbuch der deutschen Sprache (The Digital Dictionary of the German Language, DWDS in short) which integrates multiple dictionary and corpus resources and language statistics on the German language in a digital lexical information system which can be accessed on-line.
Session Corpus-driven lexicography
Keywords example extraction, digital dictionary, practical lexicography, natural language processing.
BibTeX
@InProceedings{ELX12-019,
author = {Jörg Didakowski and Lothar Lemnitzer and Alexander Geyken},
title = {Automatic example sentence extraction for a contemporary German dictionary},
pages = {343--349},
booktitle = {Proceedings of the 15th EURALEX International Congress},
year = {2012},
month = aug,
date = {2012-08-07/2012-08-11},
address = {Oslo, Norway},
editor = {Ruth Vatvedt Fjeld and Julie Matilde Torjusen},
publisher = {Department of Linguistics and Scandinavian Studies, University of Oslo},
isbn = {978-82-303-2228-4},
}