An efficient algorithm for the automatic building of a lexicon from textual corpora

November 17, 2016
Pages 129-139
Author Stefano Federici
Title An efficient algorithm for the automatic building of a lexicon from textual corpora
Abstract The LE-2111 SPARKLE (Shallow Parsing and Knowledge extraction for Language Engineering) project aims to extract lexical and semantic information automatically from textual corpora in order to improve the performance of NLP systems. In this paper we describe an algorithm for extracting subcategorization patterns for Italian verbs. The extraction procedure relies on an efficient and accurate analogy-based engine, together with pre- and post-filters based on simple linguistic constraints. Despite the simplicity of the analogy-based algorithm, the amount of lost information is negligible, and precision and recall over a set of hand-crafted subcategorization patterns (namely those produced within the LE PAROLE project) are fairly high.
Session PART 2 - Computational Lexicology and Lexicography
Keywords linguistic knowledge extraction, lexicon building, finite state automata, chunking
BibTeX
@InProceedings{ELX98_1-018,
author = {Stefano Federici},
title = {An efficient algorithm for the automatic building of a lexicon from textual corpora},
pages = {129--139},
booktitle = {Proceedings of the 8th EURALEX International Congress},
year = {1998},
month = aug,
date = {4-8},
address = {Liège, Belgium},
editor = {Thierry Fontenelle and Philippe Hiligsmann and Archibald Michiels and André Moulin and Siegfried Theissen},
publisher = {Euralex},
isbn = {2-87233-091-7},
}