The Czech National Corpus

By admyn, November 17, 2016. Euralex 2000, Publications

Page	127-132
Author	Jan Koček, Marie Schmiedtová, Vera Kopřivová
Title	The Czech National Corpus
Abstract	The paper outlines the history of the Czech National Corpus (CNC) project. It reports on the present stage of its development and describes the type of corpus, the text processing methods, and the morphological annotation used in its compilation. It also briefly discusses the software used in the CNC. The Bank of Czech (BoC) now contains 330 million word forms. It is the basis of a representative corpus (SYN2000, 100 million word forms), which was created in spring 2000 and is intended as a material source for future dictionaries. The lexical saturation of the material is currently being tested.
Session	PART 3 - Corpora, Tools and NLP Dictionaries
Keywords
BibTex	@InProceedings{ELX00-015, author = {Jan Koček and Marie Schmiedtová and Vera Kopřivová}, title = {The Czech National Corpus}, pages = {127--132}, booktitle = {Proceedings of the 9th EURALEX International Congress}, year = {2000}, month = {aug}, date = {8-12}, address = {Stuttgart, Germany}, editor = {Ulrich Heid and Stefan Evert and Egbert Lehmann and Christian Rohrer}, publisher = {Institut für Maschinelle Sprachverarbeitung}, isbn = {3-00-006574-1}, }