On the Detection of Neologism Candidates as a Basis for Language Observation and Lexicographic Endeavors: the STyrLogism Project

Andrea Abel; Egon W. Stemle

By Iztok Kosem | August 29, 2018 | Euralex 2018, Publications

Page	535-544
Author	Andrea Abel, Egon W. Stemle
Title	On the Detection of Neologism Candidates as a Basis for Language Observation and Lexicographic Endeavors: the STyrLogism Project
Abstract	The goal of the STyrLogisms project is to semi-automatically extract candidate neologisms (new lexemes) for the standard variety of German used in South Tyrol. We start from a manually vetted list of URLs of South Tyrolean news, magazine and blog websites, crawl these sites regularly, and clean and process the data. We then compare the new data with reference corpora, additional regional word lists and all previously crawled data sets. Our reference corpora are DECOW14, with around 60 million word forms, and the South Tyrolean Web Corpus, with around 2.4 million word forms; the additional word lists comprise named entities, terminology from the region and terms specific to the standard variety of German used in South Tyrol (around 53,000 word forms altogether). Here, we report on the method employed, on the first round of candidate extraction together with a proposed classification scheme for the selected candidates, and on the second extraction round.
Session	NEOLOGISMS
Keywords	neologism, web corpus, dictionary of variants
BibTex	@InProceedings{ELX2018-043, author={Andrea Abel and Egon W. Stemle}, title={On the Detection of Neologism Candidates as a Basis for Language Observation and Lexicographic Endeavors: the STyrLogism Project}, pages={535--544}, booktitle={Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts}, year={2018}, month={jul}, date={17-21}, address={Ljubljana, Slovenia}, editor={Jaka Čibej and Vojko Gorjanc and Iztok Kosem and Simon Krek}, publisher={Ljubljana University Press, Faculty of Arts}, isbn={978-961-06-0097-8} }
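
The comparison step described in the abstract (new crawl data checked against reference corpora, regional word lists and earlier crawls) can be illustrated with a minimal sketch. This is not the project's actual pipeline: the function names `tokenize` and `neologism_candidates`, the `min_freq` parameter, the simple regex tokenizer and the toy word lists are all assumptions made for illustration, and the example sentence is invented.

```python
import re
from collections import Counter

def tokenize(text):
    """Naive tokenizer (an assumption): runs of letters, optionally joined by hyphens."""
    return re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text)

def neologism_candidates(new_text, known_word_forms, min_freq=1):
    """Return word forms of new_text that are absent from known_word_forms.

    known_word_forms stands in for the union of reference corpora,
    regional word lists and previously crawled data.
    """
    counts = Counter(tokenize(new_text))
    return {
        form: freq
        for form, freq in counts.items()
        if form not in known_word_forms and freq >= min_freq
    }

# Toy stand-ins for the real resources (hypothetical data).
reference_corpus_forms = {"Die", "Gemeinde", "plant", "einen", "neuen"}
regional_word_list = {"Bozen"}
earlier_crawl_forms = {"Radweg"}
known = reference_corpus_forms | regional_word_list | earlier_crawl_forms

# Invented example sentence standing in for freshly crawled text.
crawl = "Die Gemeinde Bozen plant einen neuen Radwegpass"
candidates = neologism_candidates(crawl, known)
print(candidates)
```

Everything the filter lets through is only a candidate: as the abstract notes, the extraction is semi-automatic, so the remaining word forms would still need manual vetting and classification.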
