RIDIRE. Corpus and Tools for the Acquisition of Italian L2

Alessandro Panunzi; Emanuela Cresti; Lorenzo Gregori

By admyn | November 17, 2016 | Euralex 2014, Publications

Page	447-462
Author	Alessandro Panunzi, Emanuela Cresti, Lorenzo Gregori
Title	RIDIRE. Corpus and Tools for the Acquisition of Italian L2
Abstract	This paper introduces the RIDIRE corpus, built by means of an open-source tool (RIDIRE-CPI) for creating specifically designed web corpora through a targeted crawling strategy. The RIDIRE-CPI architecture combines existing open-source tools with specifically developed modules, comprising a robust crawler, a user-friendly web interface, several conversion and cleaning tools, an anti-duplicate filter, a language guesser, and a PoS-tagger. The RIDIRE corpus is a balanced Italian web corpus (1.5 billion tokens) designed to enhance the study of Italian as a second language, while also being exploitable for lexicographic purposes. The targeted crawling was performed through content selection, metadata assignment, and validation procedures. These features allowed the construction of a large corpus with a specific design, covering a variety of language usage domains (News, Business, Administration and Legislation, Literature, Fiction, Design, Cookery, Sport, Tourism, Religion, Fine Arts, Cinema, Music). The RIDIRE query system allows research to be carried out on the whole corpus or on its sub-corpora. Specifically, the available queries cover all the functions usually exploited in corpus-based lexicography: frequency lists, concordances and patterns, collocations, Sketches, and Sketch Differences.
Session	Lexicography and Corpus Linguistics
Keywords	Corpus linguistics; Terminology; Collocations
BibTex	@InProceedings{ELX2014-033,
  author    = {Alessandro Panunzi and Emanuela Cresti and Lorenzo Gregori},
  title     = {RIDIRE. Corpus and Tools for the Acquisition of Italian L2},
  pages     = {447-462},
  booktitle = {Proceedings of the 16th EURALEX International Congress},
  year      = {2014},
  month     = {jul},
  date      = {15-19},
  address   = {Bolzano, Italy},
  editor    = {Abel, Andrea and Vettori, Chiara and Ralli, Natascia},
  publisher = {EURAC research},
  isbn      = {978-88-88906-97-3}
}
