Comparing Approaches to Uncertain Reasoning: Discussion System Condemnation Pays Off
This exciting session has focused on what I believe is a crucial and often neglected aspect of system design research: system condemnation. The goal of system condemnation research is to be able to say persuasively "This system is no damn good; junk it." System condemnation studies should not be expected or allowed to return rave reviews of the systems they evaluate. System designers can be trusted to provide at least as many rave reviews of their systems as the systems deserve; the task of severe criticism of systems is more important and more difficult.

The papers we have heard among them cover the four major approaches to system condemnation. One, covered by Dr. Lehner, is the religious approach: sacrilegious systems should be condemned. A second, also covered by Dr. Lehner, might be called the NSF study approach: systems that can be condemned on the basis of simplistic arguments (often but not always a priori) should be. These two approaches to system condemnation share a virtue: they are able to condemn a proposed system at least as well as they can condemn a system already in existence. Effectively used, they can therefore prevent systems from coming into existence, and thus can save us the cost of more empirical approaches to system condemnation.

The third approach is experimental, and was covered in the paper by Drs. Vaughan, Perrin, Yadrick, and Holden. It consists of designing simplified versions of one or more systems, in the context of a simplified task for them to perform, and then examining how well they perform it. The version of this approach that we heard today compared several system design strategies. The difficulty with this approach to system condemnation is that condemnation of all system design strategies studied in an experiment is unlikely.

The final approach, presented in the paper by Drs.
Moninger, Flueck, Lusk, and Roberts, is to go ahead and design actual systems, bring them into contact with the actual tasks they are intended to perform, and run them in competition with one another. The problem with this approach to system condemnation, of course, is that the actual systems are likely to be non-comparable. For this reason, scientists using this approach will often declare ahead of time that they do not intend to use the results to condemn any of the systems under study. Fortunately, they usually don't mean it.

Omitted from this session, and from this conference, is the final and most effective form of system condemnation: condemnation in the marketplace. Many of us have experienced one version or another of this kind of condemnation. You have probably detected that my advocacy of system condemnation as a goal and my approval of studies having that effect is only partly facetious. The reason, of course, is that unless we, the system design community, are effective at system condemnation, we are asking the marketplace to do that task for us. It will--but system condemnation in the marketplace is a blunt, undiscriminating process, easily able to kill good ideas for bad reasons. If a system will be condemned in the marketplace, I believe we should try to prevent it from getting there.

I would like to examine a bit more carefully the four approaches to system condemnation that we have heard about at this meeting, as represented in this session's papers. The religious approach starts from first principles. Its Bayesian form, for example, says "More than 300 years of scientific experience combine with compelling formal arguments to establish probability as the appropriate rule for making decisions under uncertainty. Alternative views are heretical and should be condemned." Dr.
Lehner examined this argument and disagreed with it because he could invent a scenario in which the appropriate inferential behavior consisted of ignoring some relevant information. As a devout Bayesian, I did not find this argument persuasive, because it evaluated inferences rather than decisions. I do not know how to think about condemning systems intended to produce only inferences. Every system that I have ever been interested in has decisions as outputs.

As I thought about how to design a decision making system in the context of Dr. Lehner's two reliable information sources having differing sensitivities, I did not find the problem hard. After all, this is a standard situation in interpretation of data from technical sensors. Lack of knowledge about information input overlap between the two sensors invites studying the correlation among outputs. Such study will quickly permit correction of an incorrect prior assumption of conditional independence. Similarly, I would be reluctant to commit myself to a mode of using multiple experts that tries to represent only what they believe in common; this goal seems to give up the intuition that different experts know different things. Again, this issue becomes poignant when one thinks of the output of the system as a decision rather than a diagnosis.

I guess my religious quibble with Dr. Lehner has to do with the goal of global reliability--an extreme form of maximization of expected scores obtained from a proper scoring rule. If the decision context is one in which that goal makes sense, fine. If it is not (and I think Dr. Lehner has been able to construct instances in which that goal may not make sense), then why not junk it?

Next I turn to the ingenious experiment by Drs. Vaughan, Perrin, Yadrick, and Holden. They taught their subjects a fairly straightforward task and then asked them to design an expert system to perform it using six approaches to system design.
The finding was that EMYCIN and PROSPECTOR approaches were less effective at dealing with inconsistent inputs to a conjunctive rule than were other, more structured approaches. I would have liked to know whether the same finding would have emerged for a disjunctive combination rule--but one characteristic of every successful experiment is that one would like to look at data from conditions other than those used by the experimenters.

Is this a good approach to system condemnation? I think so, and I speculate that some may disagree. The argument against this approach, of course, is that it is extremely special to the particular circumstances of the experiment. That is inevitably true. But, as in this case, experimenters are often quite clever about picking highly diagnostic tasks on which to base experiments on human or system behavior. Those who feel that EMYCIN and PROSPECTOR got a bum rap can design other experiments to show their virtues. While this death-of-a-thousand-cuts approach to system condemnation is very tedious compared to the sweeping condemnations made possible by the religious approach, it has the enormous advantage of piling up knowledge as it goes. For example, Dr. Vaughan and his colleagues report that each approach was capable of leading to a highly accurate model. An obvious question is: why do some subjects within an approach do better than others, and how can one get all to do well? Subsequent experiments will surely look into this question, and thus will lead to more precise definitions of the system design approaches under study. Meanwhile, the data should give some pause to those who want to use EMYCIN shells in contexts that involve conjunctive cues.

Next I turn to Dr. Lehner's second paper, written in collaboration with Drs. Mullin and Cohen, in which the authors show that a decision maker should either routinely accept or else routinely ignore advice from a fallible decision aid more competent than the decision maker.
The assumptions needed to establish this principle are rather stringent; in particular, the conclusion does not seem to apply if the decision aid can offer evaluative information about the quality of its recommendations. Most decision aids, I think, know enough to provide exactly such information (e.g., measures of uncertainty or of utility difference between a recommended action and its next best competitor). I therefore prefer to interpret this argument as meaning that such information should be a part of the aid's output to the decision maker. I am agreeing with the paper's conclusion that it is important that the user of the aid be able to identify instances in which the advice given by the aid is likely to be incorrect.

Finally I turn to the Weather Bureau Shootout project. Eight AI programs, most of them rule-based, are compared in a real context in which ground-truth information is available. I very much admire this study, and am optimistic about its ability to serve the goal of system condemnation. The difficulty with studies of this kind, including this one, is that the different systems have different inputs and different outputs, to some degree comparable but not by any means identical. The authors say "We have agreed that our analyses will not attempt to declare any system an overall winner." While there will be no overall winner, there may well be overall losers. In particular, the adventurousness of including a model based only on use of multiple regression techniques for policy capturing in competition with elaborate rule-based systems may pay off. The study may provide some basis for concluding either that rule-based systems in this application domain are unnecessarily ponderous or that the simplicity of policy capturing as an approach to expert system design is bought at a price in system performance. That kind of conclusion, it seems to me, is exactly what the most effective kinds of system condemnations should be based on.
It can easily lead to broad generalizations of the kind that lie behind the NSF panel approach to system condemnation. Such generalizations, when based on good evidence, are as near to system wisdom as we are likely to get; I consider a priori system condemnations based on such evidence as entirely appropriate, as well as common. I am holding my breath in eager anticipation of this particular verdict.
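As a postscript, the sensor argument in my quibble with Dr. Lehner can be made concrete. The following is a minimal sketch of my own (it appears in none of the papers under discussion, and all names, error rates, and the simulation setup are hypothetical): when two sources of differing sensitivity are wrongly assumed conditionally independent given the true state, the correlation of their outputs within each true-state class exposes the shared input.

```python
# A minimal sketch (my illustration, not from any of the papers under
# discussion) of checking a conditional-independence assumption between
# two sensors by studying the correlation among their outputs within
# each true-state class. All names and error rates are hypothetical.
import random

random.seed(0)

def simulate(shared_noise_prob, n=20000):
    """Two binary sensors of differing sensitivity observing a binary
    state h; with probability shared_noise_prob both are corrupted by
    one shared disturbance, breaking conditional independence given h."""
    data = []
    for _ in range(n):
        h = random.random() < 0.5
        shared = random.random() < shared_noise_prob
        a = h != (shared or random.random() < 0.1)  # the sharper sensor
        b = h != (shared or random.random() < 0.2)  # the duller sensor
        data.append((h, a, b))
    return data

def correlation_given_state(data, state):
    """Pearson (phi) correlation of the two sensor outputs within the
    subset of cases whose true state equals `state`."""
    pairs = [(a, b) for h, a, b in data if h == state]
    n = len(pairs)
    pa = sum(a for a, _ in pairs) / n
    pb = sum(b for _, b in pairs) / n
    pab = sum(a and b for a, b in pairs) / n
    denom = (pa * (1 - pa) * pb * (1 - pb)) ** 0.5
    return (pab - pa * pb) / denom if denom else 0.0

# Truly conditionally independent sensors: within-class correlation
# stays near zero.
print(correlation_given_state(simulate(0.0), True))
# A shared disturbance produces a clearly positive within-class
# correlation, signaling that the independence assumption in the
# prior model needs correction.
print(correlation_given_state(simulate(0.3), True))
```

With real sensors one would condition on cases whose ground truth is eventually learned; the point is only that conditional independence is not an article of faith but an assumption checkable from data, which is why I did not find the two-source problem hard.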
PDF Link: /papers/89/p423-edwards.pdf
AUTHOR = "Ward Edwards",
TITLE = "Comparing Approaches to Uncertain Reasoning: Discussion System Condemnation Pays Off",
BOOKTITLE = "Uncertainty in Artificial Intelligence 5 Annual Conference on Uncertainty in Artificial Intelligence (UAI-89)",
PUBLISHER = "Elsevier Science",
ADDRESS = "Amsterdam, NL",
YEAR = "1989",
PAGES = "423--426"