Syntax and Semantics vs. Statistics for Italian Multiword Expressions: Empirical Prototypes and Extraction Strategies

November 17, 2016
Pages: 927-937
Author: Luigi Squillante
Title: Syntax and Semantics vs. Statistics for Italian Multiword Expressions: Empirical Prototypes and Extraction Strategies
Abstract: In this work we present an empirical analysis of Italian nominal multiword expressions (MWEs) of the form [noun + adjective], aimed at quantitatively studying their syntactic and semantic features in order to improve their automatic identification and collection. Three indices are proposed that measure the syntactic and semantic frozenness of the expressions on an empirical basis in a corpus of about 1.8 million words, composed of Italian texts in the domain of physics. The three indices can be combined into a global measure, called the Prototypicality Index (PI), which appears to be useful in the automatic extraction of terminological MWEs. The performance of PI at extracting true positives from a candidate list is compared to that of the well-known statistical association measures Log-likelihood and Pointwise Mutual Information. Our results show that the performance of PI can be comparable to that of the association measures, although it does not involve statistical calculations. Thus, PI can be seen as a new option for lexicographers and terminologists to complement the already available statistical methods when identifying MWEs in texts.
Session: Phraseology and Collocation
Keywords: multiword expressions; terminology; prototype; extraction; empirical tests
BibTeX
@InProceedings{ELX2014-071,
author={Luigi Squillante},
title={Syntax and Semantics vs. Statistics for Italian Multiword Expressions: Empirical Prototypes and Extraction Strategies},
pages={927--937},
booktitle={Proceedings of the 16th EURALEX International Congress},
year={2014},
month=jul,
date={15-19},
address={Bolzano, Italy},
editor={Abel, Andrea and Vettori, Chiara and Ralli, Natascia},
publisher={EURAC research},
isbn={978-88-88906-97-3},
}