Getting Synonym Candidates from Raw Data in the English Lexical Substitution Task

By November 17, 2016,
Pages 420-430
Authors Diana McCarthy, Bill Keller, Roberto Navigli
Title Getting Synonym Candidates from Raw Data in the English Lexical Substitution Task
Abstract Distributional similarity provides a technique for obtaining semantically related words from corpus data using automated methods that compare the contexts in which the words appear. Such methods can be useful for producing thesauruses, with application to work in lexicography and computational linguistics. However, the most similar words produced using these methods are not always near synonyms, but may be words in other semantic relationships: antonyms, hyponyms or even looser 'topical' relations. This means that manual post-processing of such automatically produced resources to filter out unwanted words may be necessary before they can be used. This paper evaluates the performance of distributional methods for finding synonyms on the English Lexical Substitution Task, a lexical paraphrasing task where it is necessary to generate candidate synonyms for a target word and then select a suitable substitute on the basis of contextual information. We examine the performance of distributional methods for the first step of generating candidate synonyms and leave the second step of choosing a candidate on the basis of context for future work. A number of automated distributional methods are compared to techniques that make use of manually produced thesauruses. We demonstrate that while the performance of such automatic thesaurus acquisition methods is often below manually produced resources, precision can be greatly increased by using two automatic methods in combination. This approach gives precision results that surpass methods that exploit manually constructed resources for the same task, albeit at the expense of coverage. We conclude that such an approach to increase the precision of automatic methods to find near synonyms could improve the use of distributional methods in lexicography.
Session Computational Lexicography and Lexicology
author = {Diana McCarthy and Bill Keller and Roberto Navigli},
title = {Getting Synonym Candidates from Raw Data in the English Lexical Substitution Task},
pages = {420--430},
booktitle = {Proceedings of the 14th EURALEX International Congress},
year = {2010},
month = jul,
date = {6-10},
address = {Leeuwarden/Ljouwert, The Netherlands},
editor = {Anne Dykstra and Tanneke Schoonheim},
publisher = {Fryske Akademy},
isbn = {978-90-6273-850-3},