The Czech National Corpus

By November 17, 2016,
Pages: 127-132
Authors: Jan Koček, Marie Schmiedtová, Vera Kopřivová
Title: The Czech National Corpus
Abstract: The paper deals with the history of the Czech National Corpus (CNC) project. It reports on the present stage of the project's development and describes the type of corpus, the text processing methods, and the morphological annotation used in its compilation. It also briefly discusses the software used in the CNC. The Bank of Czech (BoC) now contains 330 million word forms. It is the basis of a representative corpus (SYN2000, 100 million word forms), which was created in spring 2000 and is intended as a source of material for future dictionaries. The lexical saturation of the material is currently being tested.
Session: PART 3 - Corpora, Tools and NLP Dictionaries
BibTex
@InProceedings{ELX00-015,
author = {Jan Koček and Marie Schmiedtová and Vera Kopřivová},
title = {The Czech National Corpus},
pages = {127--132},
booktitle = {Proceedings of the 9th EURALEX International Congress},
year = {2000},
month = {aug},
date = {8-12},
address = {Stuttgart, Germany},
editor = {Ulrich Heid and Stefan Evert and Egbert Lehmann and Christian Rohrer},
publisher = {Institut für Maschinelle Sprachverarbeitung},
isbn = {3-00-006574-1},
}