Computational linguistic tools for semi-automatic corpus-based updating of dictionaries

November 17, 2016
Pages 183-195
Authors Ulrich Heid, Wolfgang Worsch, Stefan Evert, Vincent Docherty, Matthias Wermke
Title Computational linguistic tools for semi-automatic corpus-based updating of dictionaries
Abstract We demonstrate an interface that lets the lexicographer view the results of an automatic comparison between lexicographic descriptions in existing German dictionaries and corpus data. The examples in the online demonstration come from work on entries for headwords with the initial letter “T” in Duden. Das große Wörterbuch der deutschen Sprache (8 vols.; Duden GWDS) and in the German part of Langenscheidts Handwörterbuch Deutsch-Englisch (HWB). Both have been compared with data extracted from large newspaper corpora. The interface uses a standard web browser to display the lexical data, and the demonstration is a guided tour of the data collection from the lexicographic point of view. The first part of this paper provides the metalexicographic baseline, a short summary of the technology used to develop the data collection, and a few examples of the types of data made available. The second part discusses in detail the practical lexicographic use of this raw material in the recent update of Langenscheidt’s Großwörterbuch Deutsch – Englisch, Der kleine Muret-Sanders.
Session PART 4 - Corpus-based Dictionary Making
