Abstract |
For natural language processing and other applications, it has long seemed desirable to group words together according to their essential semantic type ([[Human]], [[Animate]], [[Artefact]], [[Physical Object]], [[Event]], etc.) and to arrange them into a hierarchy. Vast lexical and conceptual ontologies such as WordNet and BSO have been built on this foundation. Examples such as fire a [[Human]] (= dismiss from employment) vs. fire a [[Weapon]] (= cause to discharge a projectile) have led to the expectation that semantic types such as [[Weapon]] and [[Human]] can be used systematically for word sense disambiguation. Unfortunately, this expectation is often unwarranted. For example, one attends an [[Event]] (a meeting, a lecture, a funeral, a coronation, etc.), but there are many events (e.g. a thunderstorm, a suicide) that people do not attend, while some of the things that people do attend (e.g. a school, a church, a clinic) are not [[Event]]s, but rather [[Location]]s where specific events take place. The sense of attend is much the same in all these examples, unaffected by differences in the semantic type of the direct object. Nevertheless, the pattern [[Human]] attend [[Event]] is well established and intuitively canonical. The CPA (Corpus Pattern Analysis) project at Masaryk University, Brno, provides two steps for dealing with this kind of inconvenient linguistic phenomenon: (1) non-canonical lexical items are coerced into "honorary" membership of a lexical set in particular contexts, e.g. school, church, clinic are coerced into membership of the [[Event]] set in the context of attend, but not, for example, in the context of arrange; (2) the ontology is not a rigid yes/no structure, but a statistically based structure of shimmering lexical sets. Thus, each canonical member of a lexical set is recorded with statistical contextual information, like this: [[Event]]: ... meeting. 
Thus, the semantic ontology is a shimmering hierarchy populated with words which come in and drop out according to context, and whose relative frequency in those contexts is measured. A shimmering ontology of this kind preserves, albeit in a weakened form, the predictive benefits of hierarchical conceptual organization, while maintaining the empirical validity of natural-language description. |
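The idea of a lexical set whose membership depends on context can be sketched as a small data structure. The following Python sketch is only an illustration under stated assumptions: the class name `ShimmeringLexicalSet`, its methods, and every count in it are hypothetical and invented for the example; they are not taken from the CPA project or its data.

```python
from collections import Counter


class ShimmeringLexicalSet:
    """Hypothetical model: words of one semantic type, counted per verb context."""

    def __init__(self, semantic_type):
        self.semantic_type = semantic_type
        self._counts = {}  # verb context -> Counter mapping word -> frequency

    def observe(self, context, word, count=1):
        """Record that `word` filled this type's slot in `context`."""
        self._counts.setdefault(context, Counter())[word] += count

    def relative_frequency(self, context, word):
        """Share of the context's observations accounted for by `word`."""
        counts = self._counts.get(context)
        if not counts:
            return 0.0
        return counts[word] / sum(counts.values())

    def is_member(self, context, word):
        """Membership is contextual: a word belongs only where it was observed."""
        return self.relative_frequency(context, word) > 0.0

    def members(self, context):
        """Words observed in `context`, most frequent first."""
        return [w for w, _ in self._counts.get(context, Counter()).most_common()]


# Invented counts, for illustration only.
event = ShimmeringLexicalSet("Event")
event.observe("attend", "meeting", 6)   # canonical member
event.observe("attend", "school", 2)    # coerced, "honorary" member
event.observe("arrange", "meeting", 4)  # school is never observed with arrange

print(event.members("attend"))
print(event.is_member("attend", "school"), event.is_member("arrange", "school"))
print(event.relative_frequency("attend", "school"))
```

In this sketch, school counts as a member of the [[Event]] set in the context of attend but not in the context of arrange, and the relative frequency stands in for the statistical contextual information that the abstract says is recorded for each member.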
BibTeX |
@InProceedings{ELX08-022, author = {Patrick Hanks and Elisabetta Ježek}, title = {Shimmering Lexical Sets}, pages = {391--402}, booktitle = {Proceedings of the 13th EURALEX International Congress}, year = {2008}, month = {jul}, date = {15-19}, address = {Barcelona, Spain}, editor = {Elisenda Bernal and Janet DeCesaris}, publisher = {Institut Universitari de Linguistica Aplicada, Universitat Pompeu Fabra}, isbn = {978-84-96742-67-3}, } |