Trendi – a monitor corpus of Slovene

Iztok Kosem

By Iztok Kosem, September 7, 2022 (Euralex 2022, Publications)

Page	230-239
Author	Iztok Kosem
Title	Trendi – a monitor corpus of Slovene
Abstract	In this paper we present Trendi, a monitor corpus of written Slovene, which has been compiled recently as part of the SLED (Monitor corpus and related resources) project. The methodology and the contents of the corpus are presented, as well as the findings of the survey that aimed to identify the needs of potential users related to topical language use. The Trendi corpus currently contains news articles and other web content from 110 different sources, with the texts being collected and linguistically annotated on a daily basis. The corpus complements Gigafida 2.0, a 1.13-billion-word reference corpus of standard written Slovene. Also discussed are the ways in which the corpus will be integrated into various lexicographic projects, helping not only in the identification of neologisms but also in monitoring changes in already identified language phenomena.
Session	Talk
Keywords	Monitor corpus, language use, trends, Slovene, neologisms, lexicography, newsfeed
BibTex	@inproceedings{euralex_mannheim_trendi_2022, address = {Mannheim}, title = {Trendi - a {Monitor} {Corpus} of {Slovene}}, isbn = {978-3-937241-87-6}, shorttitle = {Euralex (2022)}, url = {}, language = {eng}, booktitle = {Dictionaries and {Society}. {Proceedings} of the {XX} {EURALEX} {International} {Congress}}, publisher = {IDS-Verlag}, author = {Kosem, Iztok}, editor = {Klosa-Kückelhaus, Annette and Engelberg, Stefan and Möhrs, Christine and Storjohann, Petra}, year = {2022}, pages = {230--239}, }