Computational Metalexicography in Practice – Corpus-based support for the revision of a commercial dictionary

Pages: 333-345
Authors: Vincent J. Docherty, Ulrich Heid
Title: Computational Metalexicography in Practice – Corpus-based support for the revision of a commercial dictionary
Abstract: In a cooperation between dictionary publishers and computational linguists, raw material was produced for the revision of the German part of a bilingual German-English dictionary (Langenscheidts Handwörterbuch Englisch, Neubearbeitung 1991). In a case study, the entries for headwords beginning with "p" were checked first. Then, between August 1997 and March 1998, the full dictionary was systematically checked against a 300-million-word German newspaper corpus from the late 1980s and early 1990s. The objective was to find evidence to support updates of the dictionary's lemma inventory and to extend its coverage of examples and collocations. Data production from the corpora is automatic, while the (manual, interactive) lexicographic procedures remain unchanged. To this end, standard corpus pre-processing (tokenizing, tagging, lemmatization) and a hierarchical set of query templates for collocation extraction were used. The dictionary was converted into a specific data format (similar to database entries), and the examples contained in the articles were prepared for automatic querying. The results are of metalexicographic interest: they show the potential of refined macrostructural selection procedures, help to improve the documentation of readings through examples and, more generally, illustrate the use of standard computational linguistic techniques for dictionary revision. The auxiliary resources built from the corpora in the same process, a verb frequency lexicon for German and a collection of noun-verb collocation candidates, are useful and relevant in their own right. Similarly, most of the tools used are generic and thus reusable outside the specific context discussed here.
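The abstract mentions tagging, lemmatization and query templates for extracting noun-verb collocation candidates. As an illustration only, the Python sketch below assumes a corpus that has already been tokenized, tagged with the STTS tagset and lemmatized. It applies a single window-based co-occurrence template. The function name, window size and frequency threshold are hypothetical choices. The sketch does not reproduce the authors' hierarchical query templates or their tools.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# A token is (word form, STTS part-of-speech tag, lemma); assumed input format.
Token = Tuple[str, str, str]

# STTS full-verb tags: finite, imperative, infinitive, zu-infinitive, past participle.
FULL_VERB_TAGS = {"VVFIN", "VVIMP", "VVINF", "VVIZU", "VVPP"}


def noun_verb_candidates(
    sentences: Iterable[List[Token]],
    window: int = 5,
    min_freq: int = 2,
) -> List[Tuple[Tuple[str, str], int]]:
    """Hypothetical template: count (noun lemma, full-verb lemma) pairs that occur
    within `window` tokens of each other (in either order) in the same sentence,
    and keep pairs seen at least `min_freq` times."""
    counts: Counter = Counter()
    for sent in sentences:
        nouns = [(i, lemma) for i, (_, tag, lemma) in enumerate(sent) if tag == "NN"]
        verbs = [(j, lemma) for j, (_, tag, lemma) in enumerate(sent)
                 if tag in FULL_VERB_TAGS]
        for i, noun in nouns:
            for j, verb in verbs:
                if abs(i - j) <= window:
                    counts[(noun, verb)] += 1
    # most_common() orders candidates by descending frequency.
    return [(pair, n) for pair, n in counts.most_common() if n >= min_freq]


# Toy input: two tagged, lemmatized sentences (auxiliary "hat" is VAFIN, so it is skipped).
corpus = [
    [("Er", "PPER", "er"), ("stellt", "VVFIN", "stellen"), ("eine", "ART", "eine"),
     ("Frage", "NN", "Frage"), (".", "$.", ".")],
    [("Sie", "PPER", "sie"), ("hat", "VAFIN", "haben"), ("eine", "ART", "eine"),
     ("Frage", "NN", "Frage"), ("gestellt", "VVPP", "stellen"), (".", "$.", ".")],
]
print(noun_verb_candidates(corpus))  # [(('Frage', 'stellen'), 2)]
```

The window is symmetric because German word order lets the verb either precede the noun (verb-second) or follow it (verb-final). In a real setting, such raw counts would normally be ranked with an association measure before a lexicographer reviews them.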
Session: Part 4 - The Dictionary-Making Process
Keywords: Metalexicography, dictionary analysis, dictionary updates, corpus-based semi-automatic lexical acquisition.
BibTeX
@InProceedings{ELX98_2-005,
author = {Vincent J. Docherty and Ulrich Heid},
title = {Computational Metalexicography in Practice -- Corpus-based support for the revision of a commercial dictionary},
pages = {333--345},
booktitle = {Proceedings of the 8th EURALEX International Congress},
year = {1998},
month = aug,
date = {1998-08-04/1998-08-08},
address = {Liège, Belgium},
editor = {Thierry Fontenelle and Philippe Hiligsmann and Archibald Michiels and André Moulin and Siegfried Theissen},
publisher = {Euralex},
isbn = {2-87233-091-7},
}