Extracting Phraseology for Content Analysis and Document Retrieval

By November 17, 2016,
Author: Thierry Fontenelle
Title: Extracting Phraseology for Content Analysis and Document Retrieval
Abstract: This paper describes a program which identifies the topic of a text by extracting the most relevant key words and phraseological sequences. The various factors taken into account to generate this list of terms and expressions are described (frequency of occurrence, classification as a function of the number of elements, processing of abbreviations, use of customisable stop lists...). The output can then be used by powerful search engines to retrieve topic-related texts which are believed to display a high degree of repetitivity, an essential criterion for building translation memory databases.
Session: PART 8 - Extraction of terminologically relevant multiword expressions
author = {Thierry Fontenelle},
title = {Extracting Phraseology for Content Analysis and Document Retrieval},
pages = {351--358},
booktitle = {Proceedings of the 9th EURALEX International Congress},
year = {2000},
month = {aug},
date = {8-12},
address = {Stuttgart, Germany},
editor = {Ulrich Heid and Stefan Evert and Egbert Lehmann and Christian Rohrer},
publisher = {Institut für Maschinelle Sprachverarbeitung},
isbn = {3-00-006574-1},