Dictionary of Valencies Meets Corpus Annotation: A Case of Russian FrameBank

November 17, 2016
Author: Olga Lyashevskaya
Title: Dictionary of Valencies Meets Corpus Annotation: A Case of Russian FrameBank
Abstract: The Russian FrameBank project aims to develop a hybrid lexical resource that links a dictionary of valencies with an annotated corpus. Its two types of data present generalized lexical constructions (LexCxs) and their realizations in contemporary written texts (1950-present).
The predicate-argument structure of verbs, nominalizations, adjectives, adverbs, and other lexical units in Russian is mostly encoded by case and prepositional marking, while word order is determined by information structure. This means that an argument can occur in any part of the sentence, so the window for argument detection is in principle unbounded. Russian predicates exhibit more than 1000 typical morphosyntactic patterns; the number of surface realizations under particular grammatical and discourse constraints is greater still.
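The consequence for argument detection can be illustrated with a minimal sketch. It assumes tokens already carry a case tag and that a pattern is a plain mapping from role to case; the function name `find_arguments`, the tag set, the role labels, and the transliterated example sentence are all hypothetical and are not taken from FrameBank. The point is only that the search scans the whole sentence rather than a fixed window around the predicate.

```python
from typing import Dict, List, Optional, Tuple

# A token is (word form, case tag); the case is None for non-nominal tokens.
# This representation is an assumption made for the illustration.
Token = Tuple[str, Optional[str]]


def find_arguments(tokens: List[Token], pattern: Dict[str, str]) -> Dict[str, str]:
    """Assign each role the first unused token bearing the required case.

    The whole sentence is scanned, so the result does not depend on
    where the argument stands relative to the predicate.
    """
    found: Dict[str, str] = {}
    used = set()
    for role, case in pattern.items():
        for i, (form, token_case) in enumerate(tokens):
            if i not in used and token_case == case:
                found[role] = form
                used.add(i)
                break
    return found


# Hypothetical pattern for a verb of giving: role -> required case.
give_pattern = {"Agent": "Nom", "Theme": "Acc", "Recipient": "Dat"}

# Two orderings of the same transliterated sentence ("Masha gave the boy a book").
svo = [("Maša", "Nom"), ("dala", None), ("mal'čiku", "Dat"), ("knigu", "Acc")]
ovs = [("knigu", "Acc"), ("mal'čiku", "Dat"), ("dala", None), ("Maša", "Nom")]

result_svo = find_arguments(svo, give_pattern)
result_ovs = find_arguments(ovs, give_pattern)
```

Both orderings yield the same role assignment. A realistic matcher would also have to handle prepositions, case syncretism, and omitted arguments, none of which this sketch attempts.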
Morphosyntactic patterns are not fully predictable from semantics (Apresjan 1967); hence, we can speak here of lexical constructions. Patterns with lexical slots evoked by two or more target lexemes (e.g. idiomatic phrases like vzjal i ‘he suddenly (lit. took and)’) are also treated as LexCxs. As experiments on unsupervised LexCx retrieval have shown (Toldova et al. 2008, Lashevskaja and Mitrofanova 2009), there is a great need for an open data pool manually annotated for lexical frames. In a wider perspective, a project that tags form-meaning pairings is of great significance for lexical and syntactic research, lexicography, and IR tasks.
The dictionary of lexical constructions maps the frames evoked by a particular target word onto morphosyntactic patterns. The relevant data comprise semantic explications (roles), lexico-semantic constraints (e.g. human, emotion), morphosyntactic constraints on the elements, and their syntactic ranks.
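One way to picture such a dictionary entry is as a small record type. The sketch below is an assumption about shape only: the class names `Slot` and `LexCx`, the field names, the role and rank labels, and the frame label are hypothetical and do not reproduce the actual FrameBank schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Slot:
    """One element of a lexical construction (hypothetical field names)."""
    role: str                          # semantic explication (role)
    semantic_class: str                # lexico-semantic constraint, e.g. "human"
    case: str                          # morphosyntactic constraint: case marking
    preposition: Optional[str] = None  # morphosyntactic constraint: preposition
    syntactic_rank: str = "Peripheral" # e.g. "Subject", "Object", "Peripheral"


@dataclass
class LexCx:
    """A frame evoked by a target word, mapped onto a morphosyntactic pattern."""
    target: str
    frame: str
    slots: List[Slot] = field(default_factory=list)

    def pattern(self) -> str:
        """Render the morphosyntactic pattern as a compact string."""
        parts = []
        for slot in self.slots:
            marking = f"{slot.preposition}+{slot.case}" if slot.preposition else slot.case
            parts.append(f"{slot.role}:{marking}")
        return " ".join(parts)


# Illustrative entry for dat' 'give'; labels are assumptions, not FrameBank data.
entry = LexCx(
    target="dat'",
    frame="Giving",
    slots=[
        Slot("Agent", "human", "Nom", syntactic_rank="Subject"),
        Slot("Theme", "object", "Acc", syntactic_rank="Object"),
        Slot("Recipient", "human", "Dat"),
    ],
)

pattern_string = entry.pattern()
```

Keeping the semantic and morphosyntactic constraints on the same slot makes it straightforward to record, per corpus example, which constraint a given realization satisfies or violates.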
FrameBank is an offspring project of the Russian National Corpus (http://www.ruscorpora.ru) and involves a large illustrative sample taken from the corpus. The goal of the FrameNet-like corpus annotation is to reveal the diverse realizations of a given LexCx in running text and to mark the elements that correspond to constructional arguments and adjuncts. The corpus part of FrameBank records morphological and syntactic mismatches and violations of lexical and semantic constraints, and focuses on the grammatical constructions that introduce or license the use of elements within a given construction. This is a report on work in progress, which can be followed at http://framebank.ru.
Keywords: frame semantics, FrameNet, Construction Grammar, Russian
