Enriching Georgian Dictionary Entries with Frequency Information

November 23, 2016
Pages 321-327
Authors Sophiko Daraselia, Serge Sharoff
Abstract This paper discusses the integration of corpus analysis into the dictionary-making process for the Georgian language. Corpus-based lexicography is not yet common practice in Georgia. The paper presents the first attempt to introduce a corpus-based dictionary wordlist, entries and examples drawn from the Georgian web corpus KaWaC. KaWaC is a large web corpus of modern Georgian covering the last 10-15 years of the language's development. It contains a wide range of text types, topics and regions from the Internet. Translations and poetry are excluded on the assumption that their language deviates from naturally produced language, which the corpus aims to represent. Within this research we have defined the dictionary wordlist (10,000 lemmas from the corpus that constitute a core vocabulary of Georgian), compiled the dictionary entries and extracted dictionary examples from the corpus.
Session Lexicography and Corpus Linguistics
Keywords corpus linguistics; web-corpus; corpus-based lexicography; learner’s dictionaries
