Linking Historical Corpus Data and Annotations Using Wikibase

David Lindemann; Mikel Alonso

By Iztok Kosem | December 19, 2024 | Euralex 2024, Publications

Page	785-791
Author	David Lindemann, Mikel Alonso
Title	Linking Historical Corpus Data and Annotations Using Wikibase
Abstract	This software demonstration presents a data model and a first use case for representing text corpus data on a Wikibase instance, including morphosyntactic, semantic and philological annotations as well as links to dictionary entries. Wikibase, an extension of MediaWiki, is the software that underlies Wikidata (Vrandečić & Krötzsch, 2014), an exceptionally large, crowdsourced, queryable knowledge graph. Wikidata includes nodes for ontological concepts on the one hand and for lexemes, lexeme senses and lexeme forms on the other, together with annotations on them and relations between them. We argue that the proposed model and the chosen software solutions for representing corpus and dictionary data, all free and open source, meet the requirements of provenance transparency, open access and re-use, and support collaborative work on the data. We also present our own scripts, wrapped in a web application, that shortcut several workflow steps in a first use case: a 1737 Basque manuscript, transcribed on Wikisource and represented as an annotated dataset on our Wikibase instance.
Session	Software Demonstration
Keywords	Basque; historical corpus; Wikibase; corpus annotations; Linked Data
BibTeX	@inproceedings{euralex_2024_paper_64,
  address = {Cavtat},
  title = {Linking Historical Corpus Data and Annotations Using Wikibase},
  isbn = {978-953-7967-77-2},
  shorttitle = {Euralex 2024},
  url = {},
  language = {eng},
  booktitle = {Lexicography and Semantics. Proceedings of the XXI EURALEX International Congress},
  publisher = {Institut za hrvatski jezik},
  author = {Lindemann, David and Alonso, Mikel},
  editor = {Despot, Kristina Š. and Ostroški Anić, Ana and Brač, Ivana},
  year = {2024},
  pages = {785-791}
}