Uncertainty in Artificial Intelligence
First Name   Last Name   Password   Forgot Password   Log in!
    Proceedings   Proceeding details   Article details         Authors         Search    
A Hierarchical Graphical Model for Record Linkage
Pradeep Ravikumar, William Cohen
Abstract:
The task of matching co-referent records is known among other names as rocord linkage. For large record-linkage problems, often there is little or no labeled data available, but unlabeled data shows a reasonable clear structure. For such problems, unsupervised or semi-supervised methods are preferable to supervised methods. In this paper, we describe a hierarchical graphical model framework for the linakge-problem in an unsupervised setting. In addition to proposing new methods, we also cast existing unsupervised probabilistic record-linkage methods in this framework. Some of the techniques we propose to minimize overfitting in the above model are of interest in the general graphical model setting. We describe a method for incorporating monotinicity constraints in a graphical model. We also outline a bootstrapping approach of using "single-field" classifiers to noisily label latent variables in a hierarchical model. Experimental results show that our proposed unsupervised methods perform quite competitively even with fully supervised record-linkage methods.
Keywords: null
Pages: 454-461
PS Link:
PDF Link: /papers/04/p454-ravikumar.pdf
BibTex:
@INPROCEEDINGS{Ravikumar04,
AUTHOR = "Pradeep Ravikumar and William Cohen",
TITLE = "A Hierarchical Graphical Model for Record Linkage",
BOOKTITLE = "Proceedings of the Twentieth Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI-04)",
PUBLISHER = "AUAI Press",
ADDRESS = "Arlington, Virginia",
YEAR = "2004",
PAGES = "454--461"
}


hosted by DSL   •   site info   •   help