Unified Data Modelling for Presenting Lexical Data: The Case of EKILEX

Pages 749-761
Author Arvi Tavast, Margit Langemets, Jelena Kallas, Kristina Koppel
Title Unified Data Modelling for Presenting Lexical Data: The Case of EKILEX
Abstract The Institute of the Estonian Language is developing EKILEX, a new dictionary writing system for both semasiological dictionaries and onomasiological termbases. While the long-term vision is to have a single data source that provides consistent information about Estonian, the system also needs to cope with the multitude of existing datasets. In this paper, we present work in progress on modelling the data and importing an initial sample of legacy dictionaries. The data model is based on an m:n relation between words and meanings, which are both unified across dictionaries, even while there still are separate dictionaries in the system. What is dictionary-specific is only the mapping between word and meaning. The importing of dictionaries has revealed various issues with data quality: ambiguities, underspecification, inconsistencies and conflicts. These need to be dealt with, if the long-term vision is to be achieved. We also outline the next steps of human- and machine-readable publishing, corpus connection and quantification (frequency, salience measures, etc.).
Session VARIOUS TOPICS
Keywords data modelling, dictionary portal, interoperability, linked data, Estonian
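The m:n relation between words and meanings described in the abstract can be illustrated with a minimal Python sketch. It is not the EKILEX schema: the class names (`Word`, `Meaning`, `Link`, `LexicalStore`), the dictionary labels and the sample glosses are all hypothetical, chosen only to show words and meanings being stored once while the word-meaning mapping alone carries the dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Word:
    """A word stored once and shared by all dictionaries (hypothetical)."""
    id: int
    value: str


@dataclass(frozen=True)
class Meaning:
    """A meaning stored once and shared by all dictionaries (hypothetical)."""
    id: int
    gloss: str


@dataclass(frozen=True)
class Link:
    """The only dictionary-specific record: one word-meaning pairing."""
    dictionary: str
    word_id: int
    meaning_id: int


@dataclass
class LexicalStore:
    words: dict[int, Word] = field(default_factory=dict)
    meanings: dict[int, Meaning] = field(default_factory=dict)
    links: set[Link] = field(default_factory=set)

    def add_link(self, dictionary: str, word: Word, meaning: Meaning) -> None:
        # Words and meanings are keyed by id, so re-adding them does not
        # duplicate them; only the link records which dictionary asserts it.
        self.words[word.id] = word
        self.meanings[meaning.id] = meaning
        self.links.add(Link(dictionary, word.id, meaning.id))

    def meanings_of(self, word_id: int, dictionary: str | None = None) -> list[Meaning]:
        # With dictionary=None, meanings from every dictionary are merged.
        ids = sorted(
            {
                link.meaning_id
                for link in self.links
                if link.word_id == word_id and dictionary in (None, link.dictionary)
            }
        )
        return [self.meanings[i] for i in ids]


# Illustrative data only, not taken from EKILEX.
store = LexicalStore()
keel = Word(1, "keel")
language = Meaning(10, "language")
tongue = Meaning(11, "tongue")

store.add_link("dict_a", keel, language)
store.add_link("dict_a", keel, tongue)
store.add_link("dict_b", keel, language)

all_glosses = [m.gloss for m in store.meanings_of(keel.id)]
dict_b_glosses = [m.gloss for m in store.meanings_of(keel.id, "dict_b")]
```

In this sketch both dictionaries refer to the same `Word` and `Meaning` objects, so a unified view and a per-dictionary view come from the same data, differing only in which links are consulted.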
BibTeX
@InProceedings{ELX2018-061,
author={Arvi Tavast and Margit Langemets and Jelena Kallas and Kristina Koppel},
title={Unified Data Modelling for Presenting Lexical Data: The Case of EKILEX},
pages={749--761},
booktitle={Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts},
year={2018},
month={jul},
date={17-21},
address={Ljubljana, Slovenia},
editor={Jaka Čibej and Vojko Gorjanc and Iztok Kosem and Simon Krek},
publisher={Ljubljana University Press, Faculty of Arts},
isbn={978-961-06-0097-8}
}