Empirical Analysis of Predictive Algorithms for Collaborative Filtering
John Breese, David Heckerman, Carl Kadie
Collaborative filtering or recommender systems use a database about user preferences to predict additional topics or products a new user might like. In this paper we describe several algorithms designed for this task, including techniques based on correlation coefficients, vector-based similarity calculations, and statistical Bayesian methods. We compare the predictive accuracy of the various methods in a set of representative problem domains. We use two basic classes of evaluation metrics. The first characterizes accuracy over a set of individual predictions in terms of average absolute deviation. The second estimates the utility of a ranked list of suggested items. This metric uses an estimate of the probability that a user will see a recommendation in an ordered list. Experiments were run for datasets associated with 3 application areas, 4 experimental protocols, and the 2 evaluation metrics for the various algorithms. Results indicate that for a wide range of conditions, Bayesian networks with decision trees at each node and correlation methods outperform Bayesian-clustering and vector-similarity methods. Between correlation and Bayesian networks, the preferred method depends on the nature of the dataset, nature of the application (ranked versus one-by-one presentation), and the availability of votes with which to make predictions. Other considerations include the size of database, speed of predictions, and learning time.
Keywords: Collaborative filtering, decision trees, Bayesian networks, correlation.
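The two classes of evaluation metric named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so the details below are assumptions: the function names, the `neutral_vote` and `half_life` parameters, and the exponential-decay form of the viewing probability (an item at rank `half_life` is assumed to have a 50% chance of being seen) are illustrative choices, not necessarily the exact definitions used in the paper.

```python
def mean_absolute_deviation(predicted, actual):
    """Average absolute deviation between predicted and observed votes."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def ranked_list_utility(votes_in_rank_order, neutral_vote=0.0, half_life=5.0):
    """Expected utility of a ranked list of recommendations.

    votes_in_rank_order: the user's actual votes for the recommended items,
    listed in the order the recommender ranked them (best first).
    Assumption: the probability that the user views the item at a given rank
    decays exponentially, halving every (half_life - 1) positions.
    """
    if half_life <= 1:
        raise ValueError("half_life must be greater than 1")
    total = 0.0
    for rank, vote in enumerate(votes_in_rank_order, start=1):
        view_probability = 2.0 ** (-(rank - 1) / (half_life - 1))
        # Only votes above the neutral level contribute utility.
        total += max(vote - neutral_vote, 0.0) * view_probability
    return total


if __name__ == "__main__":
    print(mean_absolute_deviation([4, 2, 5], [5, 2, 3]))
    print(ranked_list_utility([1, 0, 1], neutral_vote=0.0, half_life=2.0))
```

The first function scores predictions one at a time, which suits one-by-one presentation. The second rewards placing well-liked items near the top of the list, which suits ranked presentation.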
PS Link: http://www.research.microsoft.com/users/breese/algsweb.PS
PDF Link: /papers/98/p43-breese.pdf
AUTHOR = "John Breese and David Heckerman and Carl Kadie",
TITLE = "Empirical Analysis of Predictive Algorithms for Collaborative Filtering",
BOOKTITLE = "Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-98)",
PUBLISHER = "Morgan Kaufmann",
ADDRESS = "San Francisco, CA",
YEAR = "1998",
PAGES = "43--52"