Abstract
In this paper, we report on three ways of exploiting the information available in the largest defining dictionary of modern Swedish and its corresponding lexical database. First, we perform domain classification of unknown texts, based on the domain labels in the dictionary/database. Second, we perform partial disambiguation, based on the assumption that, given the domain classification, an ambiguous word that may belong to the relevant domain probably does belong to it. Third, assuming that words in the text that cannot be analysed and are not names may belong to the relevant domain, we investigate the possibility of expanding the dictionary automatically in this way. The experiments described in the paper tend to support the hypotheses underlying points 2 and 3, although some problems remain. While points 1 and 2 have been discussed in several papers on computational linguistics, point 3 seems to have received less attention, both in lexicographical contexts and in language technology.
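The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a tiny, invented lexicon that maps surface forms to senses with optional domain labels (the real database and its label set are not reproduced here), a simple majority vote over domain labels for step 1, and a crude capitalisation test as the "is a name" check for step 3. A real system would also need a morphological analyser to decide which words are analysable; here "analysable" simply means "present in the toy lexicon".

```python
from collections import Counter
from typing import Optional

# Hypothetical toy lexicon: surface form -> list of (sense_id, domain_label).
# A label of None means the sense carries no domain label. These entries are
# invented for illustration and are NOT taken from the database in the paper.
Lexicon = dict[str, list[tuple[str, Optional[str]]]]

TOY_LEXICON: Lexicon = {
    "gjorde": [("göra-1", None)],
    "mål": [("mål-1", "sport"), ("mål-2", "law"), ("mål-3", "food")],
    "på": [("på-1", None)],
    "och": [("och-1", None)],
    "domaren": [("domare-1", "sport"), ("domare-2", "law")],
    "dömde": [("döma-1", "sport"), ("döma-2", "law")],
    "straff": [("straff-1", "sport"), ("straff-2", "law")],
    "i": [("i-1", None)],
    "matchen": [("match-1", "sport")],
}

PUNCT = ".,;:!?\"'()"


def tokenize(text: str) -> list[str]:
    """Naive whitespace tokenizer that strips surrounding punctuation."""
    return [t.strip(PUNCT) for t in text.split() if t.strip(PUNCT)]


def classify_domain(tokens: list[str], lexicon: Lexicon) -> Optional[str]:
    """Step 1 (assumed majority vote): each known word votes once for every
    distinct domain label among its senses; the most frequent label wins."""
    votes = Counter()
    for tok in tokens:
        senses = lexicon.get(tok.lower(), [])
        votes.update({d for _, d in senses if d is not None})
    return votes.most_common(1)[0][0] if votes else None


def disambiguate(tokens: list[str], lexicon: Lexicon, domain: str) -> dict[str, list[str]]:
    """Step 2: for ambiguous words with at least one sense in `domain`,
    keep only those senses; all other words are left untouched."""
    kept: dict[str, list[str]] = {}
    for tok in tokens:
        senses = lexicon.get(tok.lower(), [])
        if len(senses) > 1:
            in_domain = [sid for sid, d in senses if d == domain]
            if in_domain:
                kept[tok.lower()] = in_domain
    return kept


def expansion_candidates(tokens: list[str], lexicon: Lexicon, domain: str) -> list[tuple[str, str]]:
    """Step 3: unknown words that do not look like names are proposed
    as new entries labelled with `domain`."""
    unknown: list[str] = []
    for tok in tokens:
        # Skip known words and capitalised tokens (crude name heuristic,
        # which also wrongly skips sentence-initial common words).
        if tok.lower() in lexicon or tok[:1].isupper():
            continue
        if tok not in unknown:
            unknown.append(tok)
    return [(w, domain) for w in unknown]


if __name__ == "__main__":
    text = "Zlatan gjorde mål på hörna, och domaren dömde straff i matchen."
    toks = tokenize(text)
    dom = classify_domain(toks, TOY_LEXICON)
    print(dom)  # sport
    print(disambiguate(toks, TOY_LEXICON, dom))
    print(expansion_candidates(toks, TOY_LEXICON, dom))  # [('hörna', 'sport')]
```

In this toy run, "sport" receives 5 votes against 4 for "law", so the ambiguous words mål, domaren, dömde and straff are reduced to their sport senses, while the unknown lowercase word hörna is proposed as a new sport entry and the capitalised name Zlatan is ignored. Note that the unambiguous word matchen is what tips the vote; with only ambiguous words the classification would be a tie.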
BibTeX
@InProceedings{ELX04-099,
  author    = {Sven-Göran Malmgren and Christian Sjögreen},
  title     = {Using a lexical database for domain determination, partial disambiguation and dictionary expansion},
  pages     = {897--903},
  booktitle = {Proceedings of the 11th EURALEX International Congress},
  year      = {2004},
  month     = jul,
  eventdate = {2004-07-06/2004-07-10},
  address   = {Lorient, France},
  editor    = {Geoffrey Williams and Sandra Vessier},
  publisher = {Université de Bretagne-Sud, Faculté des lettres et des sciences humaines},
  isbn      = {29-52245-70-3},
}