Statistical Tools for Corpus Analysis: A Tagger and Lemmatizer for Italian

By November 17, 2016,
Pages 501-510
Author Eugenio Picchi
Title Statistical Tools for Corpus Analysis: A Tagger and Lemmatizer for Italian
Abstract We present the most recent addition to the PiSystem, an integrated set of tools for mono- and bilingual corpus creation and manipulation and for dictionary construction. The new component is a statistical part-of-speech tagger and lemmatizer. The methodology adopted resembles that of similar procedures for other languages, but the PiTagger has been developed to meet the particular requirements of a highly inflected language such as Italian. Texts analysed by the PiTagger can then be directly interrogated using the tagged corpus query procedures included in the system. The philosophy behind a procedure for sense disambiguation now being designed and tested is also briefly described.
Session PART 3 - Lexicographical and lexicological projects
BibTeX
@InProceedings{ELX94-056,
author = {Eugenio Picchi},
title = {Statistical Tools for Corpus Analysis: A Tagger and Lemmatizer for Italian},
pages = {501--510},
booktitle = {Proceedings of the 6th EURALEX International Congress},
year = {1994},
month = {aug-sep},
date = {30-3},
address = {Amsterdam, the Netherlands},
editor = {Willy Martin and Willem Meijs and Margreet Moerland and Elsemiek ten Pas and Piet van Sterkenburg and Piek Vossen},
publisher = {Euralex},
isbn = {90-900-7537-2},
}