Abstract |
This paper presents part of a network framework for Croatian linguistics, focusing on a new kind of thesaurus based on the morpho-semantic features of words. Instead of classic POS tagging (e.g. MULTEXT-East), which marks words for grammatical and some semantic categories (e.g. animate), each word is given a hierarchical T-structure whose branches can hold various data types (string, integer, link, word list, ordered word list, etc.), so that words and the different ways they can occur in a text can be described in more detail. WordNet and other semantic resources (e.g. the Croatian Language Portal, a terminology repository, or a network encyclopedia) can be represented as T-structure nodes in the same way. During this process each word in the definition of an entry is linked to a lexicon, which increases the semantic connectivity of words by at least one order of magnitude (about ten times more semantic relations). This adds a new dimension to searching and browsing such a network dictionary: besides their paradigmatic properties, words in the dictionary also carry all their syntagmatic properties, because the computer processes their appearance in any utterance or sentence as a series of connected nodes (LOD objects). As a result, all data can be stored in a triplestore (e.g. on a Virtuoso server). |
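As a rough illustration of the idea in the abstract, a T-structure can be pictured as a labelled tree whose branches carry typed values, and which can be flattened into subject-predicate-object triples for a triplestore. The sketch below is not the authors' implementation: the class `TNode`, the function `to_triples`, the branch labels, the `lexicon:` prefix, and the sample entry are all assumptions made up for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union

# Hypothetical value types for a branch: string, integer, link (as a string), word list.
Value = Union[str, int, list]


@dataclass
class TNode:
    """One node of an assumed T-structure: a label, an optional value, child branches."""
    label: str
    value: Optional[Value] = None
    children: list[TNode] = field(default_factory=list)

    def add(self, label: str, value: Optional[Value] = None) -> TNode:
        """Attach a child branch and return it so that branches can be nested."""
        child = TNode(label, value)
        self.children.append(child)
        return child


def to_triples(subject: str, node: TNode, prefix: str = "") -> list[tuple]:
    """Flatten the tree into (subject, predicate, object) triples.

    The predicate is the slash-joined path of branch labels; a word list
    yields one triple per word. This encoding is an assumption of the sketch.
    """
    triples = []
    for child in node.children:
        predicate = f"{prefix}/{child.label}" if prefix else child.label
        if isinstance(child.value, list):
            triples.extend((subject, predicate, item) for item in child.value)
        elif child.value is not None:
            triples.append((subject, predicate, child.value))
        triples.extend(to_triples(subject, child, predicate))
    return triples


# Illustrative entry; labels and values are invented for this sketch.
word = TNode("kuća")
grammar = word.add("grammar")
grammar.add("pos", "noun")
grammar.add("gender", "feminine")
semantics = word.add("semantics")
semantics.add("hypernym", "lexicon:zgrada")  # link to another lexicon entry
semantics.add("definition_words", ["zgrada", "stanovanje"])

triples = to_triples("lexicon:kuća", word)
for triple in triples:
    print(triple)
```

Emitting one triple per word in a definition is what makes every definition word a link into the lexicon, which is the mechanism the abstract credits for the increase in semantic connectivity. An ordered word list would need an explicit position in each triple, which this sketch leaves out.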
BibTeX |
@InProceedings{ELX2016-027,
author={Marko Orešković and Mirko Čubrilo and Mario Essert},
title={The Development of a Network Thesaurus with Morpho-semantic Word Markups},
pages={273--279},
booktitle={Proceedings of the 17th EURALEX International Congress},
year={2016},
month=sep,
date={6-10},
address={Tbilisi, Georgia},
editor={Tinatin Margalitadze and George Meladze},
publisher={Ivane Javakhishvili Tbilisi University Press},
isbn={978-9941-13-542-2},
} |