The new Kahneman et al. book, NOISE, has generated considerable, well, noise, as evidenced by the Amazon reviews so far, which are all over the place. Andrew Gelman, on his Statistical Modeling, Causal Inference, & Social Science blog, has also taken up an observation by Rachael Meager to question the authors’ statistical chops; his post has drawn many comments (and counting), even by the standards of his blog. On Facebook, in the Behavioral Economist group, a contributor pointed out the questionable evidence that Kahneman paraded in his previous best-seller Thinking, Fast and Slow (TFS), as well as Kahneman’s failure to correct the record and either retract the book or revise it, and wondered why she should waste her time reading his new book. Fair question, that, no?
To answer my own question: yes, I think the contributor’s question is fair. As Schimmack and colleagues pointed out in their deconstruction of the (in)famous chapter on priming in TFS, Tversky and Kahneman (1971) wrote that “we refuse to believe that a serious investigator will knowingly accept a .50 risk of failing to confirm a valid research hypothesis.” Yet here we are (and we are not even talking about the equally questionable track record of Sunstein).
As Kristal and her colleagues have noted recently, it is scientists’ responsibility to fess up when they are wrong and know it. Tall poppies such as Kahneman and Sunstein have a particular responsibility to get the science right. Their output also deserves increased scrutiny, not least because they are so influential.
There is a certain irony in the deplorable state of affairs sketched above: the noise produced by tall poppies not getting the science right, and/or failing to correct what they know they got wrong, is very much what NOISE (the book) is about. It is the NoiseSquared component in the Overall Error (MSE) equation that is “the intellectual foundation of this book” (p. 66).
Contrary to the claim made repeatedly in the book, none of the material here is really all that new. As so often (e.g., in his decades-long battle with Gigerenzer, or here in places such as the section on simple rules starting on p. 127), Kahneman fails to give credit where surely he should. Brighton & Gigerenzer (2015), for example, have written about the Overall Error equation and its NoiseSquared summand (and provided many references to related work in marketing and finance). It is hard to believe that Kahneman has not come across this work, as “a few minutes of research would reveal [it]” (p. 167). So what to make of this omission? There are two possible explanations: Kahneman either did not know, or he did and chose to ignore this previous work. Both explanations are disappointing, and the second would amount to intellectual dishonesty, right up there with the failure to correct the record on which TFS is based when he had about a decade to do so. This kind of behavior injects unnecessary noise into the presumably common project of science and undermines science’s self-correction mechanism. So much for noise audits and decision hygiene. (More about them later.)
So what, then, do we learn from the book? NOISE, we learn, is different from Bias, BiasSquared being the second summand of the Overall Error equation. Bias, of course, is the major theme of TFS and an alleged property of our thinking. (I say alleged because this claim remains a major bone of contention between adherents of the Heuristics & Biases program, with the “long list of psychological biases” (p. 218) it routinely enumerates, and those questioning their reality: researchers such as Vernon L. Smith, Gerd Gigerenzer, and many other believers in what can broadly be described as the Ecological Rationality program. Disclosure: that latter group includes me.)
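Written out explicitly, the Overall Error equation is the standard statistical decomposition of mean squared error. The notation below is mine, matching the book’s verbal description of its two summands:

```latex
% Overall Error decomposition ("the intellectual foundation of this book", p. 66):
% for a set of judgments of a quantity with a common true value,
\underbrace{\mathrm{MSE}}_{\text{Overall Error}}
  \;=\; \underbrace{(\text{mean judgment} - \text{true value})^2}_{\text{Bias}^2}
  \;+\; \underbrace{\operatorname{Var}(\text{judgments})}_{\text{Noise}^2}
% Bias is the average (systematic) error; Noise is the standard
% deviation of the judgments around their own mean.
```

Bias thus measures how far the judgments are off on average, while Noise measures how much they scatter, regardless of whether their average is right.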
Kahneman and his collaborators conceptualize NOISE as a statistical concept (pp. 219–20): a flaw in human judgment, to be sure, but a less visible one, since it is undesirable variance in situations where it would be very desirable for different experts and/or decision makers to come to the same conclusion. That kind of noise typically requires “noise audits” to assess the degree of variance that exists.
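To make the statistical idea concrete, here is a minimal sketch of what a noise audit computes. The data are hypothetical, and the assumption that a benchmark “true” value is known is mine, made only so the bias term can be computed:

```python
# Minimal "noise audit" sketch (hypothetical data, not from the book):
# several underwriters quote the same case; we measure how much they disagree.
from statistics import mean, pstdev

quotes = [9500.0, 12000.0, 16000.0, 8000.0, 14500.0]  # hypothetical judgments
benchmark = 12000.0  # assumed 'true' value, known here only for illustration

bias = mean(quotes) - benchmark        # systematic error of the group
noise = pstdev(quotes)                 # variability across the judges
mse = mean((q - benchmark) ** 2 for q in quotes)

# The Overall Error identity: MSE = Bias^2 + Noise^2
print(f"bias={bias:.1f}  noise={noise:.1f}  mse={mse:.1f}")
assert abs(mse - (bias ** 2 + noise ** 2)) < 1e-6
```

Note that in this example the group is unbiased (the quotes average out to the benchmark), yet the overall error is large: it is driven entirely by the noise term, which is exactly the situation the book argues goes unnoticed.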
Take the example of judicial decisions, a recurrent theme in NOISE. To maintain trust in a justice system, one would want it to deal with similar crimes in about the same way. It turns out judges are human, too, and their tempers can be triggered by many things, including, negatively, their local football team losing a game on the weekend, or, positively, birthdays, if not their own then that of the defendant (p. 17).
Early in the book, Kahneman and his colleagues introduce us to several studies of judicial systems (see also chapter 6, where system noise gets decomposed further into level noise and pattern noise) which seem to exhibit shocking amounts of noise: variance in judgments that has no rationale in the objective facts. On pages 13–21 we meet Marvin Frankel, who identified the problem in the early seventies and ultimately succeeded in persuading a prominent senator to introduce sentencing reform legislation. That legislation led to the US Sentencing Commission, which was tasked with harmonizing criminal sentences through guidelines and thus taking some of the noisiness out of the punishments that judges doled out. We also learn that three years after Frankel’s death in 2002, the Supreme Court struck down the mandatory guidelines, to the applause of the overwhelming majority of federal judges, reintroducing considerable system noise with undesirable consequences that manifest themselves again and again. “In many situations, noise should be a scandal. People live with it, but they should not have to do that.” (p. 359) Alas, when all is said and done, some animals are more equal than others. I think it is called politics or something.
NOISE (the book) “can be understood as an effort to generalize Frankel’s arguments and to offer an understanding of their psychological foundations.” (p. 325) It is easy to see how similar problems can arise in other contexts, such as insurance companies (e.g., chapter 2, about an audit of a company “that had engaged the consulting firm with which two of us were affiliated” (p. 23)), or other matters of (professional) judgment, which the authors interpret as measurements in which the instrument is the human mind (p. 39), that deeply flawed beast that they, or at least the consulting firm they are associated with, aspire to help us tame. In this sense even individual decisions, whether taken repeatedly or only once, can be afflicted by noise (chapter 4). This theme is taken up in chapter 7, where occasion noise is introduced and the wisdom of crowds, in the aggregate and within ourselves, is explored.
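The noise-reduction logic behind that wisdom-of-crowds aggregation can be sketched in a few lines. The numbers below are illustrative assumptions of mine, not the book’s: the point is only that averaging n independent, unbiased judgments shrinks the noise component by roughly a factor of the square root of n.

```python
# Illustrative sketch (my numbers, not the book's): averaging n independent,
# unbiased judgments shrinks occasion noise by roughly a factor of sqrt(n).
import random
from statistics import pstdev

random.seed(1)
TRUE_VALUE = 100.0
NOISE_SD = 20.0  # spread of a single judgment, assumed for illustration

def judgment():
    """One noisy but unbiased judgment."""
    return random.gauss(TRUE_VALUE, NOISE_SD)

def crowd_estimate(n):
    """Average of n independent judgments (the 'wisdom of crowds')."""
    return sum(judgment() for _ in range(n)) / n

solo = [crowd_estimate(1) for _ in range(2000)]
crowd = [crowd_estimate(16) for _ in range(2000)]

# Spread of the 16-judge average is roughly NOISE_SD / sqrt(16) = 5
print(f"noise of one judge:    {pstdev(solo):.1f}")   # roughly 20
print(f"noise of 16 averaged:  {pstdev(crowd):.1f}")  # roughly 5
```

The “crowd within” idea works the same way, only with repeated judgments by the same person standing in for the independent judges, and correspondingly weaker independence.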
And so it goes. There are parts of the book (e.g., the parts on various audits, presumably among the “compelling cases” that this commentator had in mind) that are entertaining, although, given the authors’ track records, doubts linger about the quality of the studies on which they draw. I wonder in particular what Schimmack’s audit of the replicability of the studies paraded in NOISE will find. Doubts linger also because the book often feels like a zettelkasten, touching on one topic here and another there, and often lacks a clear narrative, chapter 7 being a good example.
Or chapter 8, for that matter, in which the authors attempt to enlighten us on how groups amplify noise. In passing we read ruminations about how easily musical preferences can be influenced, how pervasive informational cascades are (really?), and, relatedly, how pervasive group polarization is.
Chapters 9–12 touch on noise in predictive judgments and we learn about the advantages of simple mechanical prediction models. This theme has been belabored for decades by the ABC and ARC centres in Berlin, and other related centres, but at this point you can probably predict how their work fares here. (Hint: Use the simplest heuristic you can find.)
Chapters 14–17 touch on how noise, and its components, arise. In passing the authors revisit essential workhorses of the Heuristics & Biases approach (such as the Linda problem, which reappears here in the guise of the Bill problem), which have attracted controversy and would be fun if there were not so much interesting work out there on them (see here and here and here and here). We also get introduced to yet another form of noise, pattern noise, and its components, with chapter 17 offering a combined graphical representation of all components of noise (p. 211) and repeating the wrong claim that “the topic is almost entirely absent from public awareness and from discussions of judgment and error.” (p. 217) If that were so, then surely those who have done their fair share to propagate the bias bias ought to take the blame they deserve.
Chapters 18–25 touch on ways to overcome unwanted noise. Case studies in five different domains illustrate the idea of “decision hygiene” (see the second part of chapter 19 for a definition and analogy), the approach the authors recommend for reducing noise in human judgments, and five strategies related to it. We learn about “respect-experts”, a term eminently useful in the current context and so warranting a quotation: “The confidence we have in these experts’ judgments [albeit possibly not verifiable] is entirely based on the respect they enjoy from their peers. … The fact that some experts are not subject to an evaluation of the accuracy of their judgments is not criticism; it is a fact of life in many domains. Many professors, scholars, and management consultants are respect-experts.” (p. 226) Which is good because, presumably, it keeps consulting businesses going.
We also learn that respect-experts have the ability to make and explain their judgments with confidence, although they often do not agree with each other (p. 228). We learn that guidelines are a good thing and that the medical profession in particular needs more of them (p. 286). Which poses the interesting question of why it does not have them, and reminds us of the fate of the sentencing guidelines discussed earlier.
And so it goes. Most of these chapters, too, have a definite zettelkasten feel to them, with many interesting stories and results thrown into the mix. Structured interviews are preferable to informal interviews, as are independent assessments. The mediating assessments protocol is highly recommended, especially for recurring decisions.
All that said, eliminating noise is not always possible, or even desirable, which motivates the idea of “optimal noise” discussed in the last part (chapters 26–28). There are costs to trying to reduce noise, and maybe those costs are too high in some circumstances. When? Well, that depends. (Best to hire a consulting firm.)
Hence, in their review and conclusion starting on p. 361, the admonition that noise ought to be taken seriously because there is too much unwanted variability in judgments. How much noise can be tolerated? The authors do not present on a silver platter an algorithm for determining the optimal level of noise, and, truth be told, that would be asking for much. Helpfully, they offer in Appendix A a guide on “How to Conduct a Noise Audit” and in Appendix B “A Checklist for a Decision Observer”.
In sum, then, should you read the book? Well, I did (and I actually finished it, unlike TFS, which I quit reading about a quarter of the way in, if my memory, that fickle beast, serves me well). NOISE would have been a better book had the authors done their homework and acknowledged the literature that is out there on noise. As it stands, the book is way too long for its message, which is relatively well known, at least to those who have worked on noise for a long time. A shorter book would have been better.