A Workflow for Supplementing a Latvian-English Dictionary with Data from Parallel Corpora and a Reversed English-Latvian Dictionary

Page 127-135
Author Daiga Deksne, Andrejs Veisbergs
Title A Workflow for Supplementing a Latvian-English Dictionary with Data from Parallel Corpora and a Reversed English-Latvian Dictionary
Abstract The lexicon of contemporary languages is changing rapidly, mostly by acquiring new loans and derivations. The change in lexicon is best reflected in the corpora of contemporary languages. Nowadays many collections of parallel-aligned texts are available electronically. To satisfy user needs for a modern, complete, up-to-date dictionary, we created a workflow for enriching the existing Latvian-English dictionary with data from parallel corpora containing lexis commonly used in contemporary language, as well as data from the reversed English-Latvian dictionary. While revising the existing Latvian-English dictionary, we identified some issues, for example, missing feminine forms of the nouns naming nationalities and occupations, representation of the words with optional parts or spelling variations. The task of dictionary improvement was done semi-automatically by the joint work of a lexicographer, computational linguists and programmers. Such natural language processing tools as a tokenizer, part-of-speech tagger, lemmatizer and spell-checker were used to reduce the manual work. As a result, the number of entries has increased by 32%, and the number of translations by 28%.
Session DICTIONARY-MAKING PROCESS
Keywords electronic dictionaries, parallel corpora, NLP tools, XML format
BibTex
@InProceedings{ELX2018-010,
author={Daiga Deksne and Andrejs Veisbergs},
title={A Workflow for Supplementing a Latvian-English Dictionary with Data from Parallel Corpora and a Reversed English-Latvian Dictionary},
pages={127--135},
booktitle={Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts},
year={2018},
month={jul},
date={17-21},
address={Ljubljana, Slovenia},
editor={Jaka Čibej and Vojko Gorjanc and Iztok Kosem and Simon Krek},
publisher={Ljubljana University Press, Faculty of Arts},
isbn={978-961-06-0097-8}
}