Extracting Phraseology for Content Analysis and Document Retrieval

November 17, 2016
Pages 351-358
Author Thierry Fontenelle
Title Extracting Phraseology for Content Analysis and Document Retrieval
Abstract This paper describes a program which identifies the topic of a text by extracting the most relevant key words and phraseological sequences. The various factors taken into account to generate this list of terms and expressions are described (frequency of occurrence, classification as a function of the number of elements, processing of abbreviations, use of customisable stop lists, etc.). The output can then be used by powerful search engines to retrieve topic-related texts which are believed to display a high degree of repetitivity, an essential criterion for building translation memory databases.
Session PART 8 - Extraction of terminologically relevant multiword expressions
BibTeX
@InProceedings{ELX00-042,
author = {Thierry Fontenelle},
title = {Extracting Phraseology for Content Analysis and Document Retrieval},
pages = {351--358},
booktitle = {Proceedings of the 9th EURALEX International Congress},
year = {2000},
month = aug,
date = {8-12},
address = {Stuttgart, Germany},
editor = {Ulrich Heid and Stefan Evert and Egbert Lehmann and Christian Rohrer},
publisher = {Institut für Maschinelle Sprachverarbeitung},
isbn = {3-00-006574-1},
}