Semantic Annotation of Verbs for the Tatar Corpus

Author Alfiya Galieva, Olga Nevzorova
Title Semantic Annotation of Verbs for the Tatar Corpus
Abstract This paper discusses the problem of developing a metalanguage for linguistic applications and introduces a tag set for the semantic annotation of verbs in the Tatar National Corpus. At present, there are no generally accepted standards for the semantic annotation of corpora. Annotation schemes are often devised by individual researchers or teams for a particular research project, and the tag sets used in thesauri and electronic corpora differ in many respects. Drawing on available semantic classifications of vocabulary for different languages and on data from Tatar lexicons, we created a model of the semantic system of Tatar verbs and divided the verbs (3,200 words) into semantic classes. We distinguish semantic tags of two types: constructional (categorial) tags, which are independent of the semantic classes of verbs, and semantic (thematic) tags, which determine those classes. To delimit the classes we used both hierarchical and overlapping classification, so the same verb may belong to more than one class. The approach is based on data from explanatory dictionaries of the Tatar language, bilingual Russian-Tatar dictionaries, and the semantic annotation system of the Russian National Corpus. The current version of our semantic annotation uses 3 categorial and 59 thematic tags.
Session Lexicography and Corpus Linguistics
Keywords Tatar verb; semantics; corpus; semantic annotation
