Automation of Lexicographic Work Using General and Specialized Corpora: Two Case Studies

By November 17, 2016,
Pages 355-364
Authors Iztok Kosem, Polona Gantar, Nataša Logar, Simon Krek
Title Automation of Lexicographic Work Using General and Specialized Corpora: Two Case Studies
Abstract Due to the increasingly large amounts of authentic data to analyse, lexicographers are now looking to language technologies to provide them not only with tools to analyse the data but also with tools and methods that ease and speed up the analysis. One of the most promising avenues of research has been the automation of the early stages of corpus data analysis, with the aim of summarizing, and consequently reducing, the amount of corpus data that lexicographers need to examine. However, most of this research deals with general lexicography; these methods have yet to be extensively tested in terminology. This paper attempts to address this gap by presenting two separate Slovene research projects, one lexicographic (Slovene Lexical Database) and the other terminological (Termis), that used the same method of automatic extraction of corpus data (presented in Kosem et al. 2013). After describing the projects and the corpora used, we present the similarities and differences between the two projects in parameter settings and in the quality of the extracted data. We conclude by discussing the further potential of automation in both general and specialised lexicography.
Session Lexicography and Language Technologies
Keywords data extraction; terminology; general language; collocations; dictionary; GDEX
BibTeX
@InProceedings{ELX2014-025,
author={Iztok Kosem and Polona Gantar and Nataša Logar and Simon Krek},
title={Automation of Lexicographic Work Using General and Specialized Corpora: Two Case Studies},
pages={355--364},
booktitle={Proceedings of the 16th EURALEX International Congress},
year={2014},
month=jul,
eventdate={2014-07-15/2014-07-19},
address={Bolzano, Italy},
editor={Abel, Andrea and Vettori, Chiara and Ralli, Natascia},
publisher={EURAC research},
isbn={978-88-88906-97-3},
}