Abstract |
This paper presents a project to build a multilingual dictionary of discourse markers (DMs). In its first stage, which consisted of compiling the list of headwords, we used a parallel corpus to automatically extract units from texts written in Spanish, Catalan, English, French and German. We also applied a method for creating a taxonomy that automatically organises the markers into clusters. The result is an extensive, corpus-driven list of headwords. We present a prototype of the dictionary's microstructure in the form of a standard XML database and describe the procedure used to automatically fill in most of its fields (e.g., the type of DM and the equivalents in other languages) before human intervention. |
BibTeX |
@inproceedings{euralex_mannheim_towards_2022,
  address = {Mannheim},
  title = {Towards a {Multilingual} {Dictionary} of {Discourse} {Markers}. {Automatic} {Extraction} of {Units} from {Parallel} {Corpus}},
  isbn = {978-3-937241-87-6},
  shorttitle = {Euralex (2022)},
  url = {},
  language = {eng},
  booktitle = {Dictionaries and {Society}. {Proceedings} of the {XX} {EURALEX} {International} {Congress}},
  publisher = {IDS-Verlag},
  author = {Renau, Irene and Nazar, Rogelio},
  editor = {Klosa-Kückelhaus, Annette and Engelberg, Stefan and Möhrs, Christine and Storjohann, Petra},
  year = {2022},
  pages = {262--272},
} |