Can We Substitute Field Experts with Customized Large Language Model in Processing Specialized Languages? – A Case Study

December 19, 2024
Pages 725–739
Author Bruno Nahod
Title Can We Substitute Field Experts with Customized Large Language Model in Processing Specialized Languages? – A Case Study
Abstract After the Croatian national termbase Struna ceased to receive funding in 2019, we began developing a novel model for compiling terminological collections that does not rely on field experts to provide initial terminological information. A potential solution to the problem of finding a practical and dependable source of information in the initial stages of terminology processing (i.e., the ‘raw definitions’) across multiple domains could be GPT-4, the publicly available AI language model developed by OpenAI. GPT is a large language model that offers a range of capabilities, including answering queries, generating text, and executing tasks such as translation and summarization. A custom GPT, TermAI, is currently being devised as an aid module that delivers unprocessed information for terminological units to be processed in Struna. The initial training phase involved manually providing guidelines for best practices in terminology management, designed on the basis of the well-established and successful methodology we previously used to train field experts. The second phase involves feeding TermAI with modified data exported from Struna. In this paper, we present the results of a comparative analysis of terminological units generated by TermAI and by field experts in the domain of forensic sciences.
Session Talk
Keywords artificial intelligence; GPT-4; terminology management; Struna; definition generation
BibTeX
@inproceedings{euralex_2024_paper_60,
address = {Cavtat},
title = {Can We Substitute Field Experts with Customized Large Language Model in Processing Specialized Languages? – A Case Study},
isbn = {978-953-7967-77-2},
shorttitle = {Euralex 2024},
url = {},
language = {eng},
booktitle = {Lexicography and Semantics. Proceedings of the XXI EURALEX International Congress},
publisher = {Institut za hrvatski jezik},
author = {Nahod, Bruno},
editor = {Despot, Kristina Š. and Ostroški Anić, Ana and Brač, Ivana},
year = {2024},
pages = {725--739}
}