An efficient algorithm for the automatic building of a lexicon from textual corpora

Pages: 129-139
Author: Stefano Federici
Title: An efficient algorithm for the automatic building of a lexicon from textual corpora
Abstract: The LE-2111 SPARKLE (Shallow Parsing and Knowledge extraction for Language Engineering) project is aimed at the automatic extraction of lexical and semantic information from textual corpora in order to improve the performance of NLP systems. In this paper we describe an algorithm for the extraction of subcategorization patterns for Italian verbs. The extraction procedure is carried out on the basis of an efficient and accurate analogy-based engine and pre- and post-filters based on simple linguistic constraints. Despite the simplicity of the analogy-based algorithm, the amount of lost information is negligible, and precision and recall over a set of hand-crafted subcategorization patterns (namely those produced within the LE PAROLE project) are fairly high.
Session: PART 2 - Computational Lexicology and Lexicography
Keywords: linguistic knowledge extraction, lexicon building, finite state automata, chunking
BibTeX
@InProceedings{ELX98_1-018,
author = {Stefano Federici},
title = {An efficient algorithm for the automatic building of a lexicon from textual corpora},
pages = {129--139},
booktitle = {Proceedings of the 8th EURALEX International Congress},
year = {1998},
month = aug,
date = {4-8},
address = {Liège, Belgium},
editor = {Thierry Fontenelle and Philippe Hiligsmann and Archibald Michiels and André Moulin and Siegfried Theissen},
publisher = {Euralex},
isbn = {2-87233-091-7},
}