Database of ANalysed Texts of English (DANTE): the NEID database project

November 17, 2016
Pages 549-556
Author B.T. Sue Atkins, Adam Kilgarriff, Michael Rundell
Title Database of ANalysed Texts of English (DANTE): the NEID database project
Abstract DANTE is a lexicographic project whose end product is not a dictionary but a lexical database resulting from in-depth analysis of corpus data. The users of DANTE are not the dictionary-using public but the lexicographic teams who will develop dictionaries and computer lexicons from it. The project is the source-language analysis stage of the New English-Irish Dictionary (NEID: http://www.forasnagaeilge.ie/), being developed for Foras na Gaeilge, Dublin (FnaG: http://www.forasnagaeilge.ie/). It was designed and carried out by the Lexicography MasterClass (LexMC: http://www.lexmasterclass.com). The database covers approximately 50,000 headwords and 45,000 compounds, idioms and phrasal verbs, using over 40 datatypes in their lexical description. It was created over 2.5 years by LexMC’s 15-strong lexicographic team, managed by Valerie Grundy, Managing Editor; project administration is in the hands of Diana Rawlinson, Project Administrator.
What makes DANTE special is the application of an existing methodology across the whole lexicon, extremely systematically and at an unprecedented level of detail. Amongst other aspects of this project, we describe:
improving the reliability of schedule and workflow by classifying, before the compiling started, over 50,000 headwords according to type and complexity;
the systematic use of 68 model ‘template’ entries;
a new approach to quality control, combining conventional entry-editing by senior team members with the use of complex search scripts that list all entities of a specific type and allow rapid checking for accuracy;
the customisation of the Sketch Engine (http://www.sketchengine.co.uk/) corpus query software, with a corpus of 1.7bn words;
the use of IDM’s Dictionary Production System (DPS: http://www.idm.fr/products/dictionary_writing_system/27/).
The DANTE database is a rare, possibly unique, beast: a rich and comprehensive lexicographic analysis on linguistic principles, prepared on a substantial budget by a large team of professional lexicographers, and uncompromised by the needs of accessibility to non-linguist users.
Session Reports on Lexicographical and Lexicological Projects
BibTeX
@InProceedings{ELX10-045,
author = {B. T. Sue Atkins and Adam Kilgarriff and Michael Rundell},
title = {Database of ANalysed Texts of English (DANTE): the NEID database project},
pages = {549--556},
booktitle = {Proceedings of the 14th EURALEX International Congress},
year = {2010},
month = {jul},
date = {6-10},
address = {Leeuwarden/Ljouwert, The Netherlands},
editor = {Anne Dykstra and Tanneke Schoonheim},
publisher = {Fryske Akademy},
isbn = {978-90-6273-850-3},
}