Abstract
This paper presents a methodology for retrieving collocations from a digital text corpus for practical lexicographical purposes. The methodology has been tested on the Danish PAROLE Corpus. It is argued that each lexical target item has a unique collocational profile, and that precision is enhanced markedly when the search for collocations is tailored to fit the individual profiles, even under sparse data conditions. The disclosure of the profiles is based on a notion of positional weight relative to the target item, i.e. some contextual positions carry a heavier collocational load than others. The actual weights are estimated by comparing the original corpus with a twin corpus with random word order. As word order is decisive for the bulk of collocations, this comparison reveals salient collocational positions for each target item of the original corpus. The retrieved collocations are marked up according to the SGML standard, thus facilitating easy overview and dynamic presentation.
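The comparison between a corpus and its randomly ordered twin can be illustrated with a small Python sketch. It is not the paper's implementation: the abstract does not give the weighting formula, so the measure used here (the share of the most frequent word at each offset, in the original corpus minus the same share in a shuffled copy) is an assumption chosen only for illustration. The function names `position_counts`, `concentration` and `positional_weights`, the `window` and `seed` parameters, and the toy English corpus are all hypothetical.

```python
import random
from collections import Counter


def position_counts(tokens, target, window):
    """Count which words occur at each offset from the target (offset 0 excluded)."""
    counts = {off: Counter() for off in range(-window, window + 1) if off != 0}
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        for off in counts:
            j = i + off
            if 0 <= j < len(tokens):
                counts[off][tokens[j]] += 1
    return counts


def concentration(counter):
    """Share of the single most frequent word at a position (0.0 if nothing was seen)."""
    total = sum(counter.values())
    return max(counter.values()) / total if total else 0.0


def positional_weights(tokens, target, window=2, seed=0):
    """Assumed weight per offset: concentration in the original corpus
    minus concentration in a twin corpus with random word order."""
    twin = tokens[:]                    # same words, same frequencies
    random.Random(seed).shuffle(twin)   # word order destroyed
    real = position_counts(tokens, target, window)
    rand = position_counts(twin, target, window)
    return {off: concentration(real[off]) - concentration(rand[off]) for off in real}


# Toy corpus, purely illustrative (the paper itself uses the Danish PAROLE Corpus).
tokens = "strong tea and strong coffee and strong tea with weak tea".split()
weights = positional_weights(tokens, "tea", window=2)
for offset in sorted(weights):
    print(offset, round(weights[offset], 2))
```

Offsets with a clearly positive value are those where the original word order concentrates particular words around the target more than chance ordering does. On a corpus this small the values are noisy, so the sketch only shows the shape of the computation.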
BibTeX
@InProceedings{ELX02-056,
  author    = {Dorthe Duncker},
  title     = {Collecting Collocations},
  pages     = {521--531},
  booktitle = {Proceedings of the 10th EURALEX International Congress},
  year      = {2002},
  month     = {aug},
  date      = {13-17},
  address   = {København, Denmark},
  editor    = {Anna Braasch and Claus Povlsen},
  publisher = {Center for Sprogteknologi},
  isbn      = {87-90708-09-1},
}