Towards the Automatic Generation of a Pattern-Based Dictionary of Spanish Verbs

December 19, 2024
Pages 367-383
Authors Irene Renau, Rogelio Nazar, Daniel Mora Melanchthon
Title Towards the Automatic Generation of a Pattern-Based Dictionary of Spanish Verbs
Abstract Corpus Pattern Analysis (CPA; Hanks, 2004a; 2004b; 2013) is a technique for identifying the local semantic and syntactic context of a word and mapping it to the word's meanings. For verbs, a pattern consists essentially of the argument structure, with each argument labelled with a semantic type. CPA is used in several dictionary projects and allows systematic corpus analysis; however, it is extremely time-consuming. In this paper, we present a method for automatically identifying the patterns of Spanish verbs in corpora. We used a syntactic parser for dependency analysis (Stanza), applied a named entity recognition (NER) tagger from the Flair NLP framework and, for common nouns, implemented a semantic tagger and a word sense disambiguation method, both created for the task. All resources were combined to extract CPA verb patterns. The method performs better than previous attempts and can contribute to more efficient pattern-based lexicography.
Session Talk
Keywords argument structure; Corpus Pattern Analysis; pattern-based lexicography; semantic tagging; word sense disambiguation
BibTex
@inproceedings{euralex_2024_paper_29,
address = {Cavtat},
title = {Towards the Automatic Generation of a Pattern-Based Dictionary of Spanish Verbs},
isbn = {978-953-7967-77-2},
shorttitle = {Euralex 2024},
url = {},
language = {eng},
booktitle = {Lexicography and Semantics. Proceedings of the XXI EURALEX International Congress},
publisher = {Institut za hrvatski jezik},
author = {Renau, Irene and Nazar, Rogelio and Melanchthon, Daniel Mora},
editor = {Despot, Kristina Š. and Ostroški Anić, Ana and Brač, Ivana},
year = {2024},
pages = {367--383}
}