Abstract |
We present the COR.SEM lexicon, an open-source semantic lexicon for general AI purposes funded by the Danish Agency for Digitisation as part of an AI initiative embarked upon by the Danish Government in 2020. COR.SEM describes the core senses of 34,000 Danish lemmas with formal semantic information, e.g., ontological type, hypernym, semantic frame, regular polysemy pattern, and polarity value; features which are in essence drawn and simplified from other existing resources. Lexical information from The Danish Dictionary DDO and the Danish Thesaurus DDB is also integrated, e.g., user examples, domain label, synonyms, and near synonyms. It provides direct links to synsets in the Danish WordNet DanNet, as well as to the morphological lemma information in COR, the Central WordRegister, which is based on the Danish Orthographical Dictionary and DDO. The register’s common numerical index at both lemma and sense level makes it more straightforward to merge mono- as well as bilingual dictionaries with COR.SEM and thereby inherit the formal semantic information. At the website corsem.dsl.dk it is possible to browse the lexical entries and to download tailored extracts of data of your choice. We give examples of the use of COR.SEM in linguistic studies, in NLP tasks, and in lexicographic projects. |
BibTeX |
@inproceedings{euralex_2024_paper_11, address = {Cavtat}, title = {COR.SEM, a New Formal Semantic Lexicon for Danish}, isbn = {978-953-7967-77-2}, shorttitle = {Euralex 2024}, url = {}, language = {eng}, booktitle = {Lexicography and Semantics. Proceedings of the XXI EURALEX International Congress}, publisher = {Institut za hrvatski jezik}, author = {Nimb, Sanni and Flörke, Ida and Olsen, Sussi and Pedersen, Bolette S. and Sørensen, Nathalie C. H.}, editor = {Despot, Kristina Š. and Ostroški Anić, Ana and Brač, Ivana}, year = {2024}, pages = {141--155} } |