Abstract |
This paper presents the current state of digital lexicographic research in Romania and argues for the importance of creating an Essential Romanian Lexicographic Corpus. In recent years, measures have been taken to create the electronic tools and resources needed to support the Romanian language and culture at a transnational level, in the general context of the computerization of fundamental academic research. Romanian academic specialists in linguistics, applied informatics, and computational linguistics have initiated research projects that aim to make use of non-digitized resources by acquiring them in electronic format, and to create new resources and tools for automatic language processing. The project presented in this paper builds on certain results of the complex eDTLR project by using the Thesaurus Dictionary in electronic format as the reference text for alignment and by creating a Romanian lexicographic corpus. The project's aims are: to build a scanned corpus of the reference dictionaries of the DLR (taking into account current copyright legislation); to scan and process these dictionaries (conversion from image to text by optical character recognition, OCR, followed by parsing of the text at entry level); and to build an online interface for validating and correcting the parsing (that is, the automatic identification of entries in the previously scanned and converted dictionaries), as well as for validating the alignment between the text of the Romanian Language Thesaurus Dictionary (in electronic format, from the eDTLR project) and the reference dictionaries from the DLR Bibliography.
The final database will include a substantial number of essential Romanian-language dictionaries (100 dictionaries from the 16th century to the present day), aligned at entry level, which will offer Romanian specialists an excellent working tool and lay the groundwork for future research. |
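As an illustration only, alignment at entry level can be pictured as matching normalized headwords across dictionaries. The sketch below is not the project's actual method: the function names `normalize_headword` and `align_entries` and the input layout are hypothetical, and matching on exact normalized headwords is an assumption made for the example. It folds case and maps legacy cedilla letters (ş, ţ) to the comma-below forms (ș, ț), since both encodings occur in digitized Romanian text.

```python
import unicodedata
from collections import defaultdict

# Map legacy cedilla letters to the standard Romanian comma-below letters.
_CEDILLA_TO_COMMA = str.maketrans({
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u015e": "\u0218",  # S with cedilla -> S with comma below
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
    "\u0162": "\u021a",  # T with cedilla -> T with comma below
})


def normalize_headword(headword: str) -> str:
    """Return a comparison key for a headword (hypothetical helper)."""
    text = unicodedata.normalize("NFC", headword.strip())
    text = text.translate(_CEDILLA_TO_COMMA)
    return text.casefold()


def align_entries(reference_headwords, dictionaries):
    """Align dictionary entries to reference headwords by normalized key.

    reference_headwords: iterable of headwords from the reference text.
    dictionaries: mapping of dictionary name -> list of (headword, entry_text).
    Returns a mapping of reference headword -> {dictionary name: [entry_text, ...]}.
    """
    index = defaultdict(dict)
    for name, entries in dictionaries.items():
        for headword, entry_text in entries:
            key = normalize_headword(headword)
            index[key].setdefault(name, []).append(entry_text)
    return {
        headword: dict(index.get(normalize_headword(headword), {}))
        for headword in reference_headwords
    }
```

A real pipeline would also have to handle OCR errors, spelling variants across centuries, and homographs, which is presumably why the project includes an online interface for manual validation and correction.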
BibTeX |
@InProceedings{ELX12-105,
  author    = {Elena Tamba Dănilă and Marius Radu Clim and Mădălin Pătraşcu and Ana Catană Spenchiu},
  title     = {The Evolution of the Romanian Digitalized Lexicography. The Essential Romanian Lexicographic Corpus},
  pages     = {1014--1017},
  booktitle = {Proceedings of the 15th EURALEX International Congress},
  year      = {2012},
  month     = {aug},
  date      = {7-11},
  address   = {Oslo, Norway},
  editor    = {Ruth Vatvedt Fjeld and Julie Matilde Torjusen},
  publisher = {Department of Linguistics and Scandinavian Studies, University of Oslo},
  isbn      = {978-82-303-2228-4},
} |