Abstract |
This study presents an innovative approach to crafting and enhancing Japanese lexical networks by incorporating large language models (LLMs), especially GPT-4o, utilizing data from Matsushita’s (2011) Vocabulary Database for Reading Japanese to accommodate various proficiency levels. Through this process, we extracted a total of 137,870 synonym relations and 54,324 antonym relations, forming a network comprising 104,427 nodes. A portion of the dataset underwent manual evaluation to determine the accuracy of the extracted synonym relationships, yielding an average evaluation score of 4.08 out of 5. Our findings demonstrate that almost 20% of extracted nouns are (near) synonyms, while the rest stand in other types of relation to the source word, such as hyponymy, hypernymy, meronymy, and class membership. The study emphasizes the synergy between AI-driven data generation and traditional lexicographic expertise, offering a scalable and adaptable framework for diverse linguistic applications, with implications for computational linguistics and NLP technologies. |
BibTeX |
@inproceedings{euralex_2024_paper_23, address = {Cavtat}, title = {Enhancing Japanese Lexical Networks Using Large Language Models – Extracting Synonyms and Antonyms with GPT-4o}, isbn = {978-953-7967-77-2}, shorttitle = {Euralex 2024}, url = {}, language = {eng}, booktitle = {Lexicography and Semantics. Proceedings of the XXI EURALEX International Congress}, publisher = {Institut za hrvatski jezik}, author = {Špica, Dragana and Perak, Benedikt}, editor = {Despot, Kristina Š. and Ostroški Anić, Ana and Brač, Ivana}, year = {2024}, pages = {283--303} } |