Abstract |
This paper presents a method for automatically extracting subcorpora that isolate different subcategorization frames for nouns, adjectives, and verbs in the 100-million-word British National Corpus (BNC). The tool is being used in the FrameNet project, an NSF-funded project that is producing a database and tools for dictionary-building based on the principles of Frame Semantics. The subcorpora are used (1) to facilitate the selection of corpus lines illustrating the full range of semantic and syntactic combinatory possibilities of a given lemma, and (2) to determine the relative frequencies of the different syntactic contexts of each lemma in the database. The resulting database, which will be both human- and computer-readable, will be a rich resource for lexicographers, as well as for researchers in lexicology and natural language processing. |
BibTeX |
@InProceedings{ELX98_2-018, author = {Susanne Gahl}, title = {Automatic Extraction of Subcategorization Frames for Corpus-based Dictionary-building}, pages = {445--452}, booktitle = {Proceedings of the 8th EURALEX International Congress}, year = {1998}, month = aug, date = {4-8}, address = {Liège, Belgium}, editor = {Thierry Fontenelle and Philippe Hiligsmann and Archibald Michiels and André Moulin and Siegfried Theissen}, publisher = {Euralex}, isbn = {2-87233-091-7}, } |