Extracting Phraseology for Content Analysis and Document Retrieval

By admyn, November 17, 2016. Euralex 2000, Publications

Page	351-358
Author	Thierry Fontenelle
Title	Extracting Phraseology for Content Analysis and Document Retrieval
Abstract	This paper describes a program which identifies the topic of a text by extracting the most relevant key words and phraseological sequences. The various factors taken into account to generate this list of terms and expressions are described (frequency of occurrence, classification as a function of the number of elements, processing of abbreviations, use of customisable stop lists, etc.). The output can then be used by powerful search engines to retrieve topic-related texts which are believed to display a high degree of repetitivity, an essential criterion for building translation memory databases.
Session	PART 8 - Extraction of terminologically relevant multiword expressions
Keywords
BibTex	@InProceedings{ELX00-042,
  author    = {Thierry Fontenelle},
  title     = {Extracting Phraseology for Content Analysis and Document Retrieval},
  pages     = {351--358},
  booktitle = {Proceedings of the 9th EURALEX International Congress},
  year      = {2000},
  month     = {aug},
  date      = {8-12},
  address   = {Stuttgart, Germany},
  editor    = {Ulrich Heid and Stefan Evert and Egbert Lehmann and Christian Rohrer},
  publisher = {Institut für Maschinelle Sprachverarbeitung},
  isbn      = {3-00-006574-1},
}
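The abstract names several factors (frequency of occurrence, grouping by number of elements, customisable stop lists) without showing how they combine. The following is a minimal Python sketch of that general idea, not the program described in the paper: the function name `extract_phrases`, the `STOP_WORDS` set, and the `max_len` and `min_freq` parameters are all assumptions made for illustration, and the processing of abbreviations is not covered.

```python
import re
from collections import Counter

# Hypothetical stop list. The abstract mentions customisable stop lists
# but does not reproduce one, so these entries are placeholders.
STOP_WORDS = {"the", "of", "a", "an", "and", "to", "in", "is", "for", "by"}


def extract_phrases(text, max_len=3, min_freq=2, stop_words=STOP_WORDS):
    """Count word sequences of 1..max_len words and group them by length.

    Returns a dict mapping n to a list of (phrase, frequency) pairs,
    most frequent first, keeping only phrases seen at least min_freq times.
    """
    # Lowercase the text and keep runs of letters only (no digits, no underscores).
    tokens = re.findall(r"[^\W\d_]+", text.lower())

    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Discard candidates that begin or end with a stop word.
            if gram[0] in stop_words or gram[-1] in stop_words:
                continue
            counts[tuple(gram)] += 1

    # Classify the surviving candidates by their number of elements.
    by_length = {n: [] for n in range(1, max_len + 1)}
    for gram, freq in counts.most_common():
        if freq >= min_freq:
            by_length[len(gram)].append((" ".join(gram), freq))
    return by_length
```

Calling `extract_phrases(text)` on a document returns one ranked list per phrase length; the longer, frequent sequences are the kind of output that could be passed to a search engine as a query. This sketch ignores sentence boundaries, so a sequence may span two sentences.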