|Author||Hennie van der Vliet, Isa Maks, Piek Vossen, Roxane Segers|
|Title||The Cornetto database: Semantic issues in linking lexical units and synsets|
|Abstract||Cornetto is a lexical semantic database that combines the Dutch Wordnet (Vossen 1998) and the Referentie Bestand Nederlands (Van der Vliet 2007). The Dutch Wordnet (DWN) is similar to the Princeton Wordnet for English (Fellbaum 1998), and the Referentie Bestand Nederlands (RBN) includes frame-like information as in FrameNet (Fillmore, Baker, Sato 2004) as well as information on the combinatorial behaviour of word meanings. The combination of the lexical resources has resulted in a rich relational database that may improve natural language processing technologies.
An important aspect of combining the resources is the alignment of the lexical units (LUs) and the synsets. Automatic alignment of RBN and DWN resulted in an initial version of the Cornetto database. This version has been further extended both automatically and manually. The resulting data structure is stored in a database that keeps separate collections for LUs (mainly derived from RBN), synsets (derived from DWN) and, in addition, a formal ontology (SUMO/MILO, see Niles and Pease 2001). These three semantic resources represent different viewpoints and layers of linguistic and conceptual information. The resulting resource is freely available for research in the form of an XML database.
In this contribution, we will concentrate on the semantic information in Cornetto. We will discuss how the LUs and the synsets differ in their perspective on semantics, and we will give a brief overview of how they differ in the semantic information they contain. The merging of the two resources resulted in a very rich semantic database. However, combining lexica with different perspectives on semantics causes specific problems in the alignment of LUs and synsets and leads to findings that shed light on the organization of meaning in the lexicon.
|Session||Computational Lexicography and Lexicology|