The Czech National Corpus

November 17, 2016
Pages 127-132
Authors Jan Koček, Marie Schmiedtová, Vera Kopřivová
Title The Czech National Corpus
Abstract The paper deals with the history of the Czech National Corpus (CNC) project. It reports on the project's current stage of development, describes the type of corpus the CNC is, and outlines the text processing methods and morphological annotation used in its compilation. It also briefly discusses the software used in the CNC. The Bank of Czech (BoC) currently contains 330 million word forms. It forms the basis of the representative corpus SYN2000 (100 million word forms), which was created in spring 2000 and is intended as a source of material for future dictionaries. The lexical saturation of this material is currently being tested.
Session Part 3 - Corpora, Tools and NLP Dictionaries
