Towards Semi-Automatic Dictionary Making – Creating the Frequency Dictionary of Hungarian Verb Phrase Constructions

By November 17, 2016,
Page: 453-462
Author: Júlia Pajzs, Bálint Sass
Title: Towards Semi-Automatic Dictionary Making – Creating the Frequency Dictionary of Hungarian Verb Phrase Constructions
Abstract: The paper describes the lexicographical aspects of creating a frequency dictionary by a semi-automatic process. The bulk of the work is done by task-specific software. The output of the program is then manually checked, corrected and filtered. The result is a collection of the most frequent Hungarian verb phrase constructions (VPCs), illustrated by corpus examples. This is a corpus-driven dictionary, based on the 187.6-million-word synchronic Hungarian National Corpus (http://corpus.nytud.hu/mnsz), which was analyzed by a series of programs. Their output is a set of draft entries in XML format, which were then hand-validated and edited by lexicographers. The dictionary contains the most frequent Hungarian verbs along with their most typical syntactic constructions. At the current phase of the project we decided to collect only the most frequent constructions: their absolute frequency had to be more than 250. The dictionary contains roughly 2300 entries and 6500 VPCs. Each construction is illustrated by a corpus example. The verbal entries are presented primarily in alphabetical order. Different kinds of indices are also included in the printed version. The users of this dictionary are envisaged to be mainly linguists working on Hungarian grammars, lexicographers working on bilingual dictionaries and, last but not least, advanced learners of Hungarian who want to expand their knowledge of Hungarian nominal-verbal collocation relationships. The dictionary is planned to be published in both printed and electronic formats. Parts of the algorithm used for this project could be applied to produce other dictionaries, all the more so as some of them are actually language-independent. The approach is also highly cost-effective: the programming and the lexicographic work required one person-year each.
Session: Computational Lexicography and Lexicology
BibTeX
@InProceedings{ELX10-034,
author = {Júlia Pajzs and Bálint Sass},
title = {Towards Semi-Automatic Dictionary Making -- Creating the Frequency Dictionary of Hungarian Verb Phrase Constructions},
pages = {453--462},
booktitle = {Proceedings of the 14th EURALEX International Congress},
year = {2010},
month = jul,
date = {6-10},
address = {Leeuwarden/Ljouwert, The Netherlands},
editor = {Anne Dykstra and Tanneke Schoonheim},
publisher = {Fryske Akademy},
isbn = {978-90-6273-850-3},
}