Lexicographic Potential of the Georgian Dialect Corpus

November 23, 2016
Pages 300-309
Authors Marina Beridze, David Nadaraia
Title Lexicographic Potential of the Georgian Dialect Corpus
Abstract The project Linguistic Portrait of Georgia addresses various aspects of documenting Georgian linguistic reality by means of corpus methodologies. This title has served as an umbrella for three large-scale projects, within the framework of which the Georgian Dialect Corpus (GDC, http://corpora.co) was developed. At present, the architecture and text base of the corpus have been designed and are being continuously developed and updated. In addition, the lexicographic base of the corpus has been organized, aggregating data from printed dialect dictionaries. The lexical stock of the corpus is presented on the basis of text, lexicographic and encyclopedic data. The corpus is estimated to contain up to 2 000 000 tokens, while the lexicographic base currently has 60 000 items (lemmas with entries); this number increases considerably owing to the phonetic and grammatical variants frequently associated with a single lexical item.
Session Lexicography and Corpus Linguistics
Keywords Georgian Dialect Corpus; lexicographic base; encyclopedic data
