An efficient algorithm for the automatic building of a lexicon from textual corpora

Stefano Federici

By admyn, November 17, 2016, Euralex 1998 Part 1, Publications

Page	129-139
Author	Stefano Federici
Title	An efficient algorithm for the automatic building of a lexicon from textual corpora
Abstract	The LE-2111 SPARKLE (Shallow Parsing and Knowledge extraction for Language Engineering) project aims at the automatic extraction of lexical and semantic information from textual corpora in order to improve the performance of NLP systems. In this paper we describe an algorithm for extracting subcategorization patterns for Italian verbs. The extraction procedure relies on an efficient and accurate analogy-based engine, combined with pre- and post-filters based on simple linguistic constraints. Despite the simplicity of the analogy-based algorithm, the amount of lost information is negligible, and precision and recall measured against a set of hand-crafted subcategorization patterns (namely those produced within the LE PAROLE project) are fairly high.
Session	PART 2 - Computational Lexicology and Lexicography
Keywords	linguistic knowledge extraction, lexicon building, finite state automata, chunking
BibTex	@InProceedings{ELX98_1-018, author = {Stefano Federici}, title = {An efficient algorithm for the automatic building of a lexicon from textual corpora}, pages = {129-139}, booktitle = {Proceedings of the 8th EURALEX International Congress}, year = {1998}, month = {aug}, date = {4-8}, address = {Liège, Belgium}, editor = {Thierry Fontenelle and Philippe Hiligsmann and Archibald Michiels and André Moulin and Siegfried Theissen}, publisher = {Euralex}, isbn = {2-87233-091-7}, }
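The abstract reports precision and recall of the extracted subcategorization patterns against hand-crafted PAROLE patterns. The sketch below shows one way such a type-level comparison can be computed. It is not the author's method: the pattern notation (a verb lemma plus a tuple of complement labels such as "NP" or "PP_a") and the helper names post_filter and precision_recall are hypothetical. The frequency-threshold post-filter is likewise a swapped-in stand-in, since the paper's filters are based on linguistic constraints rather than on counts.

```python
from collections import Counter

# Hypothetical pattern notation: a verb lemma paired with a tuple of
# complement labels, e.g. ("dare", ("NP", "PP_a")). These labels are
# illustrative only, not the SPARKLE or PAROLE tag sets.
Pattern = tuple[str, tuple[str, ...]]


def post_filter(observations: list[Pattern], min_count: int = 2) -> set[Pattern]:
    """Keep only patterns observed at least `min_count` times.

    A frequency-threshold stand-in for a post-filter; the threshold
    value is an assumption, not taken from the paper.
    """
    counts = Counter(observations)
    return {pattern for pattern, n in counts.items() if n >= min_count}


def precision_recall(extracted: set[Pattern], gold: set[Pattern]) -> tuple[float, float]:
    """Type-level precision and recall of extracted patterns against a gold set."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy corpus observations (invented for illustration).
    observed = [
        ("dare", ("NP", "PP_a")),
        ("dare", ("NP", "PP_a")),
        ("dare", ("NP",)),
        ("dormire", ()),
        ("dormire", ()),
        ("pensare", ("PP_a",)),
        ("pensare", ("PP_a",)),
        ("pensare", ("PP_di",)),
    ]
    # Toy gold standard standing in for hand-crafted lexicon entries.
    gold = {
        ("dare", ("NP", "PP_a")),
        ("dormire", ()),
        ("pensare", ("PP_a",)),
        ("pensare", ("CL_che",)),
    }
    extracted = post_filter(observed)
    p, r = precision_recall(extracted, gold)
    print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=1.00 recall=0.75"
```

On this toy data the threshold discards the two singleton observations, so every surviving pattern is in the gold set (precision 1.00), while one gold pattern is never observed (recall 0.75). Scoring at the level of pattern types rather than corpus tokens matches the idea of comparing against a lexicon of patterns, but token-level scoring would be an equally valid choice.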