Lexicographic Potential of the Georgian Dialect Corpus

Marina Beridze; David Nadaraia

By Robert Lew, November 23, 2016 (Euralex 2016, Publications)

Pages	300-309
Author	Marina Beridze, David Nadaraia
Title	Lexicographic Potential of the Georgian Dialect Corpus
Abstract	The project Linguistic Portrait of Georgia envisages the documentation of various aspects of Georgian linguistic reality by means of corpus methodologies. This title has served as an umbrella for three large-scale projects, within whose framework the Georgian Dialect Corpus (GDC, http://corpora.co) was developed. The architecture and text base of the corpus have now been designed and are continuously being developed and updated. In addition, a lexicographic base has been organized for the corpus, aggregating data from printed dialect dictionaries. The lexical stock of the corpus is presented on the basis of textual, lexicographic, and encyclopedic data. The corpus is estimated to contain up to 2,000,000 tokens, and the lexicographic base currently holds 60,000 items (lemmas with entries); this number is considerably larger in practice, because a single lexical item is frequently associated with several phonetic and grammatical variants.
Session	Lexicography and Corpus Linguistics
Keywords	Georgian Dialect Corpus; lexicographic base; encyclopedic data
BibTeX	@InProceedings{ELX2016-031, author={Marina Beridze and David Nadaraia}, title={Lexicographic Potential of the Georgian Dialect Corpus}, pages={300-309}, booktitle={Proceedings of the 17th EURALEX International Congress}, year={2016}, month={sep}, date={6-10}, address={Tbilisi, Georgia}, editor={Tinatin Margalitadze and George Meladze}, publisher={Ivane Javakhishvili Tbilisi University Press}, isbn={978-9941-13-542-2}, }