Using Proper Names to Cluster Documents

Dan Winchester and Mark Lee

Proper names occur frequently in all types of natural language text, yet their treatment remains under-researched in Natural Language Processing. One particular problem is how to link information about the same entity when it is referred to by possibly different proper names in several documents. In this paper we describe a prototype system that first pre-processes individual documents using a simple name-conflation algorithm and then uses an adaptation of Schütze's context-group discrimination algorithm to cluster documents that are judged to contain references to the same named entity. We use this system to assess the potential utility of different contextual cues for this task.
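To make the two-stage pipeline concrete, the following is a minimal illustrative sketch, not the system described in the paper. It assumes that name conflation maps each name variant to the longest name whose tokens contain it, that context is a bag of words within a fixed window around each mention, and that clustering is a greedy single pass over cosine similarities. Schütze's context-group discrimination uses second-order co-occurrence vectors, so the clustering step here is a deliberately simplified stand-in. All function names, the window size, and the threshold are hypothetical.

```python
import math
import re
from collections import Counter


def conflate_names(names):
    """Map each name to the longest listed name containing all of its tokens.

    Assumption for illustration only: "Clinton" conflates to "Bill Clinton"
    when both appear in the same document.
    """
    canonical = {}
    for name in names:
        tokens = set(name.lower().split())
        candidates = [n for n in names if tokens <= set(n.lower().split())]
        # Prefer the name with the most tokens; break ties by string order.
        canonical[name] = max(candidates, key=lambda n: (len(n.split()), n))
    return canonical


def context_vector(text, name, window=5):
    """Count the words within `window` tokens on either side of each mention."""
    tokens = re.findall(r"\w+", text.lower())
    target = name.lower().split()
    n = len(target)
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + n:i + n + window])
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_documents(vectors, threshold=0.3):
    """Greedy single-pass clustering: join the most similar cluster or start a new one."""
    clusters = []   # list of lists of document ids
    centroids = []  # summed context vectors, one per cluster
    for doc_id, vec in vectors.items():
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            clusters.append([doc_id])
            centroids.append(Counter(vec))
        else:
            clusters[best].append(doc_id)
            centroids[best].update(vec)
    return clusters


# Hypothetical example documents that all mention the surname "Clinton".
docs = {
    "d1": "President Bill Clinton signed the budget bill in Washington",
    "d2": "Bill Clinton signed a new budget in Washington today",
    "d3": "Guitarist George Clinton played funk music on stage",
}
vectors = {doc_id: context_vector(text, "Clinton") for doc_id, text in docs.items()}
clusters = cluster_documents(vectors)
```

In this toy setting the context words around each mention are the only cue used; a fuller treatment would compare several kinds of contextual cue, which is the question the paper sets out to assess.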

This page is copyrighted by AAAI. All rights reserved.