Standard deviations : flawed assumptions, tortured data, and other ways to lie with statistics
Smith, Gary, 1945- , author.
First edition.
Publication Information:
New York : Overlook Duckworth, [2014]
Physical Description:
326 pages : illustrations ; 24 cm
Did you know that baseball players whose names begin with the letter "D" are more likely to die young? Or that Asian Americans are most susceptible to heart attacks on the fourth day of the month? Or that drinking a full pot of coffee every morning will add years to your life, but one cup a day increases the risk of pancreatic cancer? All of these "facts" have been argued with a straight face by credentialed researchers and backed up with reams of data and convincing statistics.
As Nobel Prize-winning economist Ronald Coase once cynically observed, "If you torture data long enough, it will confess." Lying with statistics is a time-honored con. In Standard Deviations, economics professor Gary Smith walks us through the various tricks and traps that people use to back up their own crackpot theories. Sometimes, the unscrupulous deliberately try to mislead us. Other times, the well-intentioned are blissfully unaware of the mischief they are committing. Today, data is so plentiful that researchers spend precious little time distinguishing between good, meaningful indicators and total rubbish. Not only do others use data to fool us, we fool ourselves.
With the breakout success of Nate Silver's The Signal and the Noise, the once humdrum subject of statistics has never been hotter. Drawing on breakthrough research in behavioral economics by luminaries like Daniel Kahneman and Dan Ariely and taking to task some of the conclusions of Freakonomics author Steven D. Levitt, Standard Deviations demystifies the science behind statistics and makes it easy to spot the fraud all around.
Patterns, patterns, patterns -- Garbage in, gospel out -- Apples and prunes -- Oops! -- Graphical gaffes -- Common nonsense -- Confound it! -- When you're hot, you're not -- Regression -- Even Steven -- Texas sharpshooter -- Ultimate procrastination -- Serious omissions -- Flimsy theories and rotten data -- Don't confuse me with facts -- Data without theory -- Betting the bank -- Theory without data -- When to be persuaded and when to be skeptical.


Call Number | Material Type | Home Location
QA279 .S638 2014 | Adult Non-Fiction | Open Shelf
QA279 .S638 2014 | Adult Non-Fiction | Non-Fiction Area
QA279 .S638 2014 | Adult Non-Fiction | Open Shelf
QA279 .S638 2014 | Adult Non-Fiction | Open Shelf





Author Notes

Gary Smith is the Fletcher Jones Professor of Economics at Pomona College in Claremont, California. He received his Ph.D. in Economics from Yale University and taught there as Assistant Professor for seven years. He has won two teaching awards and authored more than seventy academic papers, nine textbooks, and seven educational software programs. This is his first trade book.

Reviews

Library Journal Review

In this age of "big data," it is not unusual to read about findings that seem startling, counterintuitive, and at odds with what was previously "known" to be true. The average person, when confronted with such claims, is frequently left not knowing what to believe. Smith (economics, Pomona Coll., CA), the author of numerous academic papers and textbooks, offers a guide to the perplexed in his first book for a general audience. Using examples from various fields, he demonstrates how statistics and probabilistic reasoning have been misused through ignorance, wishful thinking, and outright fraud to arrive at strange, and, therefore, publishable conclusions. More important, the author posits how major decisions that have affected many lives have been based on these fallacies. Of particular note is the effect on international economic policy of the controversial Reinhart-Rogoff tipping point theory, which Smith scrutinizes. He is palpably irritated at the large-scale uncritical media coverage of researchers and practitioners in fields as diverse as sports, medicine, and economics, as they hoodwink the public. VERDICT This well-written and convincing book will make readers think twice before accepting uncritically claims based on statistical arguments. --Harold D. Shane, Professor of Mathematics Emeritus, Baruch Coll. Lib., CUNY
(c) Copyright 2014. Library Journals LLC, a wholly owned subsidiary of Media Source, Inc. No redistribution permitted.

Choice Review

Statistics are indispensable in the era of big data, but the pitfalls of using statistics are unavoidable. The misuse of statistics is frequently observed and causes many issues in statistical applications. In this book, Smith (economics, Pomona College) outlines, in an understandable manner, the importance of being aware of the abuse of statistics, ranging from a simple mistake to a deliberate falsification. The author extensively collected well-known, substantial examples from multiple disciplines from economics to medicine. He addresses the pitfalls of flawed, ignorant cases on data usage and statistical assumptions and also demystifies plausible stories that would hoax general readers with limited statistical knowledge. The specific topics discussed include the flawed design of experiments, data collection with selection bias and omission, graphical distortion, misunderstanding of probabilities, confusion between correlation and causation, ignorance of confounding factors, regression to the mean, and exaggeration of a random phenomenon. Smith provides a considerable number of references related to the examples. This book can be adopted as a reference book for case studies in statistics, data science, and research ethics courses. Summing Up: Recommended. Lower-division undergraduates through graduate students; general readers. --Seong-Tae Kim, North Carolina A&T State University



Did you know that building a quarry in your backyard can increase your property value? Or that America's unemployment rate is actually zero? Or that drinking a full pot of coffee every morning adds years to your life, but drinking two cups a day increases the risk of cancer? If you believe the above, then Professor Gary Smith has a World Cup-predicting octopus he'd like to show you. The sad truth is that "facts" like these are routinely presented with a straight face by credentialed academics and backed up with reams of raw data.
In Standard Deviations, Smith skillfully unpacks the various ways we are duped by data every day. He deftly demonstrates how a straightforward set of findings can be teased and manipulated to reflect whatever the researcher wants to see. Lying with statistics is a time-honored con, and in this age of Big Data even the most accredited findings can be suspect. Blending the keen statistical eye of Nate Silver with the probing insights of Daniel Kahneman and Dan Ariely, Smith demystifies the math behind the dismal science, making it easy to spot flaws all around and find the truth hidden in plain sight.
MORE ADVANCE PRAISE FOR STANDARD DEVIATIONS
"Gary Smith's Standard Deviations is both a statement of principles for doing statistical inference correctly and a practical guide for interpreting the (supposedly) data-based inferences other people have drawn. Cleverly written and engaging to read, the book is full of concrete examples that make clear not just what Smith is saying but why it matters. Readers will discover that lots of what they thought they'd learned is wrong, and they'll understand why." --BENJAMIN M. FRIEDMAN, William Joseph Maier Professor of Political Economy, Harvard University
"Standard Deviations shows in compelling fashion why humans are so susceptible to the misuse of statistical evidence and why this matters.
I know of no other book that explains important concepts such as selection bias in such an entertaining and memorable manner." --RICHARD J. MURNANE, Thompson Professor of Education and Society, Harvard Graduate School of Education
"We all learn in school that there are three kinds of lies: lies, damn lies, and statistics. Gary Smith's new book imparts true substance to this point by setting forth myriad examples of how and why statistics and data-crunching at large are susceptible to corruption. The great risk today is that the young will forget that deductive logic is vastly more powerful than inductive logic." --HORACE "WOODY" BROCK, President, Strategic Economic Decisions, Inc.
"Statistical reasoning is the most used and abused form of rhetoric in the field of finance. Standard Deviations is an approachable and effective means to arm oneself against the onslaught of statistical hyperbole in our modern age. Professor Smith has done us all a tremendous service." --BRYAN WHITE, Managing Director, BlackRock, Inc.
"It's entertaining, it's gossipy, it's insightful--and it's destined to be a classic. Based on a lifetime of experience unraveling the methodical blunders that remain all too frequent, this book communicates Gary Smith's wisdom about how not to do a data analysis. Smith's engaging rendering of countless painful mistakes will help readers avoid the pitfalls far better than merely mastering theorems." --EDWARD E. LEAMER, Distinguished Professor and Chauncey J. Medberry Chair in Management, UCLA
"Standard Deviations will teach you how not to be deceived by lies masquerading as statistics. Written in an entertaining style with contemporary examples, this book should appeal to everyone, whether interested in marriages or mortgages, the wealth of your family, or the health of the economy. This should be required reading for everyone living in this age of (too much?) information."
--ARTHUR BENJAMIN, Professor of Mathematics, Harvey Mudd College, and author of Secrets of Mental Math
"One of those rare books that make people better for having read it." --JAY CORDES, Senior Manager,
"Most of the authoritative, sciencey-sounding claims we're fed by the media are polluted by distortions, biases, and plain old errors. In Standard Deviations, Gary Smith sets the record straight." --DAVID H. FREEDMAN, Author of Wrong: Why Experts Keep Failing Us--and How to Know When Not to Trust Them
To my wife Margaret and my children Josh, Jo, Chaska, Cory, Cameron, and Claire
INTRODUCTION
We live in the age of Big Data. The potent combination of fast computers and worldwide connectivity is continually praised--even worshipped. Over and over, we are told that government, business, finance, medicine, law, and our daily lives are being revolutionized by a newfound ability to sift through reams of data and discover the truth. We can make wise decisions because powerful computers have looked at the data and seen the light.
Maybe. Or maybe not. Sometimes these omnipresent data and magnificent computers lead to some pretty outlandish discoveries. Case in point, serious people have seriously claimed that:
• Messy rooms make people racist.
• Unborn chicken embryos can influence computer random-event generators.
• When the ratio of government debt to GDP goes above 90 percent, nations nearly always slip into recession.
• As much as 50 percent of the drop in the crime rate in the United States over the past twenty years is because of legalized abortion.
• Drinking two cups of coffee a day substantially increases the risk of pancreatic cancer.
• The most successful companies tend to become less successful, while the least successful companies tend to become more successful, so that soon all will be mediocre.
• Athletes who appear on the cover of Sports Illustrated or Madden NFL are jinxed in that they are likely to be less successful or injured.
• Living near power lines causes cancer in children.
• Humans have the power to postpone death until after important ceremonial occasions.
• Asian Americans are more susceptible to heart attacks on the fourth day of the month.
• People live three to five years longer if they have positive initials, like ACE.
• Baseball players whose first names began with the letter D die, on average, two years younger than players whose first names began with the letters E through Z.
• The terminally ill can be cured by positive mental energy sent from thousands of miles away.
• When an NFC team wins the Super Bowl, the stock market almost always goes up.
• You can beat the stock market by buying the Dow Jones stock with the highest dividend yield and the second-lowest price per share.
These claims--and hundreds more like them--appear in newspapers and magazines every day even though they are surely false. In today's Information Age, our beliefs and actions are guided by torrents of meaningless data. It is not hard to see why we repeatedly draw false inferences and make bad decisions. Even if we are reasonably well informed, we are not always alert to the ways in which data are biased or irrelevant, or to the ways in which scientific research is flawed or misleading. We tend to assume that computers are infallible--that no matter what kind of garbage we put in, computers will spit out gospel.
It happens not just to laymen in their daily lives, but also in serious research by diligent professionals. We see it in the popular press, on television, on the Internet, in political campaigns, in academic journals, in business meetings, in courtrooms, and, of course, in government hearings.
Decades ago, when data were scarce and computers nonexistent, researchers worked hard to gather good data and thought carefully before spending hours, even days, on painstaking calculations.
Now, with data so plentiful, researchers often spend too little time distinguishing between good data and rubbish, between sound analysis and junk science. And, worst of all, we are too quick to assume that churning through mountains of data can't ever go wrong. We rush to make decisions based on the balderdash these machines dish out--to increase taxes in the midst of a recession, to trust our life savings to financial quants who impress us because we don't understand them, to base business decisions on the latest management fad, to endanger our health with medical quackery, and--worst of all--to give up coffee.
Ronald Coase cynically observed, "If you torture the data long enough, it will confess." Standard Deviations is an exploration of dozens of examples of tortuous assertions that, with even a moment's reflection, don't pass the smell test. Sometimes, the unscrupulous deliberately try to mislead us. Other times, the well-intentioned are blissfully unaware of the mischief they are committing.
My intention in writing this book is to help protect us from errors--both external and self-inflicted. You will learn simple guidelines for recognizing bull when you see it--or say it. Not only do others use data to fool us, we often fool ourselves.
1 PATTERNS, PATTERNS, PATTERNS
Youth soccer is a very big deal where I live in Southern California. It's a fun, inexpensive sport that can be played by boys and girls of all sizes and shapes. I initially didn't know anything about soccer. All I knew was that, every weekend, the city parks and school grounds were filled with kids in brightly colored uniforms chasing soccer balls while their parents cheered. When my son was old enough, we were in.
By the time the 2010 World Cup came around, my son was playing on one of the top soccer teams in Southern California. I was the manager and a fanatic about soccer, so naturally he and I watched every World Cup match we could.
The opponents in the 2010 championship game were Netherlands and Spain, two extraordinarily talented teams from underachieving nations that often disappointed their supporters. Which country would finally win the World Cup? I loved the Dutch, who had won all six of their World Cup games, scoring twelve goals while allowing only five, and had knocked out the mighty Brazil and Uruguay. But then I heard about Paul the octopus, who had correctly predicted the winners of seven World Cup games by choosing food from plastic boxes with the nations' flags on them. Paul the Oracle had picked Spain, and the world now seemed certain of a Spanish victory.
What the heck was going on? How could a slimy, pea-brained invertebrate know more about soccer than I did? I laughed and waited for Paul the Omniscient to get his comeuppance.
Except he didn't. The Dutch did not play with their usual creativity and flair. In a brutal, cynical match, with fourteen yellow cards--nine given to the dirty Dutchmen--Spain scored the winning goal with four minutes left in the game.
How could an octopus living in a tank have predicted any of this? Had Paul ever seen a soccer game? Did Paul even have a brain? It turns out that octopuses are among the most intelligent invertebrates, but that isn't saying much--sort of like being the world's tallest midget.
Still, Paul made eight World Cup predictions and got every single one right. Not only that, Paul made six predictions during the 2008 European Football Championships and got four right. Overall, that's twelve out of fourteen correct, which in the eyes of many would be considered statistical proof of Paul's psychic abilities. But were there really enough data? If a fair coin is flipped fourteen times, the chances of twelve or more heads are less than 1 percent.
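The coin-flip arithmetic can be checked directly. The short Python sketch below uses only the standard library; the helper name `prob_at_least` is our own illustration, not something from the book. Like the text, it assumes each prediction is an independent 50-50 guess, and it adds up the chances of exactly 12, 13, and 14 correct picks out of 14: (91 + 14 + 1) / 2^14 = 106/16,384, or about 0.65 percent.

```python
from fractions import Fraction
from math import comb


def prob_at_least(k: int, n: int, p: Fraction = Fraction(1, 2)) -> Fraction:
    """Exact probability of k or more successes in n independent trials,
    each succeeding with probability p (the upper tail of a binomial)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption (as in the text): each of Paul's picks is a fair coin flip.
overall = prob_at_least(12, 14)  # 12 or more right out of 14
world_cup = prob_at_least(8, 8)  # all 8 World Cup picks right

print(overall, float(overall))      # 53/8192 0.0064697265625
print(world_cup, float(world_cup))  # 1/256 0.00390625
```

Exact fractions make the result easy to compare with a hand count of outcomes. The number only measures how surprising the record would be for a pure guesser; it cannot tell us why the picks came out the way they did.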
In the same way, if Paul were just a hapless guesser with a 50 percent chance of making a correct prediction, the probability that he would make so many correct predictions is less than 1 percent, a probability so low that it is considered "statistically significant." The chances of Paul being correct so many times are so small that, logically, we can rule out luck as an explanation. With his consistency, Paul had demonstrated that he was not merely a lucky guesser. He was truly Paul the Psychic Octopus!
And yet, something didn't seem quite right. Is it really possible for an octopus to predict the future? Paul's performance raises several issues that are endemic in statistical studies. Paul was not a psychic (surprise, surprise), but he is a warning of things to watch out for the next time you hear some fanciful claim.
CONFOUNDING EFFECTS
First, let's look at how Paul made his predictions. At feeding time, he was shown two clear plastic boxes with the national flags of the opposing teams glued to the front of the boxes. The boxes contained identical yummy treats, such as a mussel or an oyster. Whichever box Paul opened first was the predicted winner.
Octopuses don't know much about soccer, but they do have excellent eyesight and good memories. One time, an octopus at the New England Aquarium decided he didn't like a volunteer and shot salt water at her whenever he saw her. She left the aquarium to go to college, but when she returned months later, the octopus remembered her and immediately drenched her with salt water again. In an experiment at a Seattle aquarium, one volunteer fed the octopuses while another, wearing identical clothes, irritated the octopuses with a stick. After a week of this, most of the octopuses could tell who was who. When they saw the good person, they moved closer; when they saw the bad person, they moved away (and sometimes shot water at him for good measure).
Paul the Psychic Octopus happened to be living in an aquarium in Germany and, except for the Spain-Netherlands World Cup final, Paul predicted only games involving Germany. In eleven of the thirteen games involving Germany, Paul picked Germany--and Germany won nine of these eleven games. Was Paul picking Germany because he had analyzed their opponents carefully or because he had an affinity for the German flag?
Paul was almost certainly color blind, but experiments have shown that octopuses recognize brightness and are attracted to horizontal shapes. Germany's flag has three vivid horizontal stripes, as do the flags of Serbia and Spain, the only other countries Paul selected. Indeed, the Spanish and German flags are pretty similar, which may explain why Paul picked Spain over Germany in one of the two matches they played and picked Spain over the Netherlands in the World Cup final. The only game in which Paul did not choose the German or Spanish flag was a match between Serbia and Germany.
The flag was apparently a confounding factor in that Paul wasn't picking the best soccer team. He was choosing his favorite flag. Paul the Omniscient was just a pea-brained octopus after all.
Figure 1.1: Paul's Favorite Flags: Germany (eleven times), Spain (twice), Serbia (once)
SELECTIVE REPORTING AND MISREPORTING
Another explanation for Paul's success is that too many people with too much time on their hands try stupid pet tricks, using animals to predict sports, lottery, and stock market winners. Some will inevitably succeed, just as, among thousands of people flipping coins, some will inevitably flip heads ten times in a row. Who do you think gets reported, the octopus who picked winners or the ostrich who didn't?
Several years ago, a sports columnist for The Dallas Morning News had a particularly bad week picking the winners of National Football League (NFL) football games--he got one right and twelve wrong, with one tie.
He wrote, "Theoretically, a baboon at the Dallas Zoo can look at a schedule of 14 NFL games, point to one team for each game and come out with at least seven winners." The next week, Kanda the Great, a gorilla at the Dallas Zoo, made his predictions by selecting pieces of paper from his trainer. Kanda got nine right and four wrong, better than all six Morning News sportswriters. The media descended on the story like hungry wolves, but would Kanda's performance have been reported if he had gotten, say, six right and seven wrong?
Not to be outdone, officials at the Minnesota Zoo in Apple Valley, Minnesota, reported that a dolphin named Mindy successfully predicted the outcomes of NFL games by choosing among pieces of Plexiglas, each bearing a different team's name. The opponents' Plexiglas sheets were dropped into Mindy's pool, and the one she brought back to her handler was considered to be her "prediction." The handlers reported that Mindy had gotten thirty-two of fifty-three games correct. If so, that's 60 percent, enough to make a profit betting on football games.
How many other birds, bees, and beasts tried to predict NFL games and went unreported because they failed? We don't know, and that's precisely the point. If hundreds of pets are forced to make pointless predictions, we will be misled by the successful ones that get reported because we don't take into account the hundreds of unsuccessful pets that were not reported.
This doesn't just happen in football games. A Minneapolis stockbroker once boasted that he selected stocks by spreading The Wall Street Journal on the floor and buying the stock touched by the first nail on the right paw of his golden retriever. The fact that he thought this would attract investors says something about him--and perhaps his customers.
Another factor is that people seeking fifteen minutes of fame are tempted to fudge the data to attract attention.
Was there an impartial observer monitoring the Minneapolis stockbroker and his dog each morning?
Back when bridge was the most popular card game in America, a mathematically inclined bridge player estimated that far too many people were reporting to their local paper that they had been dealt a hand with thirteen cards of the same suit. Given the chances of being dealt such a hand, there were not nearly enough games being played to yield so many wacky hands. Tellingly, the suit reported was usually spades. People were evidently embellishing their experiences in order to get their names in the paper.
After Paul the octopus received worldwide attention, a previously obscure Singapore fortune teller reported that his assistant, Mani the parakeet, had correctly predicted all four winners of the World Cup quarterfinal matches. Mani was given worldwide publicity and then predicted that Uruguay would beat Netherlands and that Spain would beat Germany in the semifinals, with Spain defeating Uruguay in the championship game. After Netherlands defeated Uruguay, Mani changed his finals prediction, choosing Netherlands, which turned out to be incorrect. Nonetheless, the number of customers visiting this fortune teller's shop increased from ten a day to ten an hour--which makes you wonder whether the owner's motives were purely sporting and whether his initial reports of Mani's quarterfinal predictions were accurate.
Why did Paul and Mani become celebrities who were taken seriously by soccer fans celebrating and cursing their predictions? Why didn't they stay unnoticed in the obscurity they deserved? It's not them, it's us.
HARDWIRED TO BE DECEIVED
More than a century ago, Sherlock Holmes pleaded to his long-suffering friend Watson, "Data! Data! Data! I can't make bricks without clay." Today, Holmes's wish has been granted in spades. Powerful computers sift through data, data, and more data.
The problem is not that we don't have enough data, but that we are misled by what we have in front of us. It is not entirely our fault. You can blame it on our ancestors.
The evolution of certain traits is relatively simple. Living things with inheritable traits that help them survive and reproduce are more likely to pass these traits on to future generations than are otherwise similar beings that do not have these traits. Generation after generation, these valuable inherited traits become dominant.
The well-known history of the peppered moth is a simple, straightforward example. These moths are generally light-colored and spend most of their days on trees, where they are camouflaged from the birds that prey on them. The first dark-colored peppered moths were reported in England in 1848, and by 1895, 98 percent of the peppered moths in Manchester were dark-colored. In the 1950s, the pendulum started swinging back. Dark-colored moths are now so rare that they may soon be extinct.
The evolutionary explanation is that the rise of dark-colored moths coincided with the pollution caused by the Industrial Revolution. The blackening of trees from soot and smog gave dark-colored moths the advantage of being better camouflaged and less likely to be noticed by predators. Because dark-colored moths were more likely to survive long enough to reproduce, they came to dominate the gene pool. England's clean-air laws reversed the situation, as light-colored moths are camouflaged better on pollution-free trees. Their survival advantage now allows them to flourish.
Other examples of natural selection are more subtle. Studies have consistently found that men and women are more attracted to people with symmetrical faces and bodies. This isn't just cultural--it is true across different societies, true of babies, and even found in other animals. In one experiment, researchers clipped the tail feathers of some male barn swallows to make them asymmetrical.
Other males kept their symmetrical tail feathers. When female swallows were let loose in this mating pool, they favored the males with symmetrical feathers.
This preference for symmetry is not just a superficial behavior. Symmetry evidently indicates an absence of genetic defects that might hamper a potential mate's strength, health, and fertility. Those who prefer symmetry eventually dominate the gene pool because those who don't are less likely to have offspring that are strong, healthy, and fertile.
Believe it or not, evolution is also the reason why many people took Paul and Mani seriously. Our ingrained preference for symmetry is an example of how recognizing patterns helped our human ancestors survive and reproduce in an unforgiving world. Dark clouds often bring rain. A sound in the brush may be a predator. Hair quality is a sign of fertility. Those distant ancestors who recognized patterns that helped them find food and water, warned them of danger, and attracted them to fertile mates passed this aptitude on to future generations. Those who were less adept at recognizing patterns that would help them survive and reproduce had less chance of passing on their genes. Through countless generations of natural selection, we have become hardwired to look for patterns and to think of explanations for the patterns we find. Storm clouds bring rain. Predators make noise. Fertile adults have nice hair.
Unfortunately, the pattern-recognition skills that were valuable for our long-ago ancestors are ill-suited for our modern lives, where the data we encounter are complex and not easily interpreted. Our inherited desire to explain what we see fuels two kinds of cognitive errors. First, we are too easily seduced by patterns and by the theories that explain them. Second, we latch onto data that support our theories and discount contradicting evidence.
We believe stories simply because they are consistent with the patterns we observe and, once we have a story, we are reluctant to let it go.
When you keep rolling sevens at the craps table, you believe you are on a hot streak because you want to keep winning. When you keep throwing snake eyes, you believe you are due for a win because you want to start winning. We don't think hard enough about the fact that dice do not remember the past and do not care about the future. They are inanimate; the only meaning they carry is what we hopeful humans ascribe to them. If the hot streak continues or the cold streak ends, we are even more convinced that our fanciful theory is correct. If it doesn't, we invent excuses so that we can cling to our nonsensical story.
We see the same behavior when athletes wear unwashed lucky socks, when investors buy hot stocks, or when people throw good money after bad, confident that things must take a turn for the better. We yearn to make an uncertain world more certain, to gain control over things that we do not control, to predict the unpredictable. If we did well wearing these socks, then it must be that these socks help us do well. If other people made money buying this stock, then we can make money buying this stock. If we have had bad luck, our luck has to change, right? Order is more comforting than chaos.
These cognitive errors make us susceptible to all sorts of statistical deceptions. We are too quick to assume that meaningless patterns are meaningful when they are presented as evidence of the consequences of a government policy, the power of a marketing plan, the success of an investment strategy, or the benefits of a food supplement. Our vulnerability comes from a deep desire to make sense of the world, and it's notoriously hard to shake off.
PUBLISH OR PERISH
Even highly educated and presumably dispassionate scientists are susceptible to being seduced by patterns.
In the cutthroat world of academic research, brilliant and competitive scientists perpetually seek fame and funding to sustain their careers. This necessary support, in turn, depends on the publication of interesting results in peer-reviewed journals. "Publish or perish" is a brutal fact of university life. Sometimes, the pressure is so intense that researchers will even lie and cheat to advance their careers. Needing publishable results to survive, frustrated that their results are not turning out the way they want, and fearful that others will publish similar results first, researchers sometimes take the shortcut of manufacturing data. After all, if you are certain that your theory is true, what harm is there in making up data to prove it?
One serious example of this kind of deception is the vaccine scare created by the British doctor Andrew Wakefield. His 1998 coauthored paper in the prestigious British medical journal The Lancet claimed that twelve normal children had become autistic after being given the measles, mumps, and rubella (MMR) vaccine. Even before the paper was published, Wakefield held a press conference announcing his findings and calling for the suspension of the MMR vaccine. Many parents saw the news reports and thought twice about what was previously a de rigueur procedure. The possibility of making their children autistic seemed more worrisome than the minute chances of contracting diseases that had been virtually eradicated from Britain. More than a million parents refused to allow their children to be given the MMR vaccine.
I live in the United States, but my wife and I read the news stories and we worried, too. We had sons born in 1998, 2000, and 2003, and a daughter born in 2006, so we had to make a decision about their vaccinations. We did our homework and talked to doctors, all of whom were skeptical of Wakefield's study.
They pointed out that there is no evidence that autism has become more commonplace, only that the definition of autism has broadened in recent years and that doctors and parents have become more aware of its symptoms. On the other hand, measles, mumps, and rubella are highly contagious diseases that had been effectively eliminated in many countries precisely because of routine immunization programs. Leaving our children unvaccinated would put not only them but also other children at risk. In addition, the study's small size (only twelve children) and the author's apparent eagerness for publicity were big red flags. In the end, we decided to give our children the MMR vaccine.
The doctors we talked to weren't the only skeptics. Several attempts to replicate Wakefield's findings found no relationship at all between autism and the MMR vaccine. Even worse, a 2004 investigation by a London Sunday Times reporter named Brian Deer uncovered some suspicious irregularities in the study. It seemed that Wakefield's research had been funded by a group of lawyers envisioning lucrative personal-injury lawsuits against doctors and pharmaceutical companies. Even more alarmingly, Wakefield himself was evidently planning to market an alternative vaccine that he could claim as safe. Were Wakefield's conclusions tainted by these conflicts of interest?
Wakefield claimed no wrongdoing, but Deer kept digging. What he found was even more damning: the data in Wakefield's paper did not match the official National Health Service medical records. Of the nine children whom Wakefield reported to have regressive autism, only one had actually been diagnosed as such, and three had no autism at all. Wakefield reported that the twelve children were "previously normal" before the MMR vaccine, but five of them had documented developmental problems. Most of Wakefield's coauthors quickly disassociated themselves from the paper.
The Lancet retracted the article in 2010, with an editorial comment: "It was utterly clear, without any ambiguity at all, that the statements in the paper were utterly false." The British Medical Journal called the Wakefield study "an elaborate fraud," and the UK General Medical Council barred Wakefield from practicing medicine in the UK. Unfortunately, the damage was done. Hundreds of unvaccinated children have died from measles, mumps, and rubella to date, and thousands more are at risk. In 2011, Deer received a British Press Award that commended his investigation of Wakefield as a "tremendous righting of a wrong." We can only hope that the debunking of Wakefield will receive as much press coverage as his false alarms, and that parents will once again allow their children to be vaccinated.
Vaccines--by definition, the injection of pathogens into the body--are a logical thing to fear, particularly when our children's safety is at stake. But what about the illogical? Can manufactured data persuade us to believe the patently absurd?
Diederik Stapel, an extraordinarily productive and successful Dutch social psychologist, was known for being very thorough and conscientious in designing surveys, often with graduate students or colleagues. Oddly enough for a senior researcher, he administered the surveys himself, presumably to schools that he alone had access to. Another oddity was that Stapel would often learn of a colleague's research interest and claim that he had already collected the data the colleague needed; Stapel supplied the data in return for being listed as a coauthor. Stapel was the author or coauthor of hundreds of papers and received a Career Trajectory Award from the Society of Experimental Social Psychology in 2009. He became dean of the Tilburg School of Social and Behavioral Sciences in 2010. Many of Stapel's papers were provocative but plausible. Others pushed the boundaries of plausibility. In one paper, he claimed that messy rooms make people racist.
In another, he reported that eating meat--indeed, simply thinking about eating meat--makes people selfish. (No, I am not making this up!) Some of Stapel's graduate students were skeptical of how strongly the data supported his half-baked theories and frustrated by Stapel's refusal to show them the actual survey data. They reported their suspicions to the chair of the psychology department, and Stapel soon confessed that many of his survey results were either manipulated or completely fabricated. He explained, "I wanted too much too fast."
Stapel was suspended and then fired by Tilburg University in 2011. In 2013, Stapel gave up his PhD and retracted more than 50 papers in which he had falsified data. He also agreed to do 120 hours of community service and forfeit benefits worth 18 months' salary. In return, Dutch prosecutors agreed not to pursue criminal charges against him for the misuse of public research funds, reasoning that the government grants had been used mainly to pay the salaries of graduate students who did nothing wrong. Meanwhile, the rest of us can feel a little less guilty about eating meat and having messy rooms.
Another example of falsified data involved tests for extrasensory perception (ESP). Early ESP experiments used a pack of cards designed by Duke psychologist Karl Zener. The twenty-five-card pack features five symbols: circle, cross, wavy lines, square, and star. After the cards are shuffled, the "sender" looks at the cards one by one and the "receiver" guesses the symbols.
Figure 1.2: The five Zener cards
Some skeptics suggested that receivers could obtain high scores by peeking at the cards or by detecting subtle clues from the sender's behavior, such as a quick glance, a smile, or a raised eyebrow. Walter J. Levy, the director of the Institute for Parapsychology established by ESP pioneer J. B. Rhine, tried to defuse such criticism by conducting experiments involving computers and nonhuman subjects.
In one experiment, eggs containing chicken embryos were placed in an incubator that was heated by a light turned on and off by a computer random-event generator. The random-event generator had a 50 percent chance of turning the light on, but Levy reported that the embryos were able to influence the computer: the light was turned on more than half the time. Some of Levy's colleagues were skeptical of these telepathic chicks (I would hope so!) and puzzled by Levy's fussing with the equipment during the experiments. They modified the computer to generate a secret record of the results and observed the experiment from a secret hiding place. Their fears were confirmed. The secret record showed the light going on 50 percent of the time, and they witnessed Levy tampering with the equipment to push the reported light frequency above 50 percent. When confronted, Levy confessed and resigned, later explaining that he was under tremendous pressure to publish.
CHASING STATISTICAL SIGNIFICANCE
The examples we're most interested in, though, do not involve fraudulent data. They involve practices more subtle and widespread. Many concern statistical significance, an odd religion that researchers worship almost blindly.
Suppose we want to test whether daily doses of aspirin reduce the risk of a heart attack. Ideally, we compare two random samples of healthy individuals. One sample takes aspirin daily; the second takes a placebo--an inert substance that looks, feels, and tastes like aspirin. The test should be double-blind so that neither the subjects nor the doctors know who is in each group. Otherwise, the patients might be more likely to report (and the doctors more likely to hear) the "right results." When the study is finished, the statisticians move in. The statistical issue is the probability that, by chance alone, the difference between the two groups would be as large as that actually observed.
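That probability can be made concrete with a small calculation. The Python sketch below is illustrative and is not the study's own analysis. It assumes the aspirin and placebo groups are the same size, so that if aspirin had no effect, each heart attack would be equally likely to fall in either group, like a fair coin toss. Under that assumption, the chance of a split at least as lopsided as the one observed is an exact one-sided binomial tail. The function name is illustrative. The counts are the ones this chapter reports for the aspirin study: 18 versus 5 fatal heart attacks and 171 versus 99 nonfatal ones.

```python
from math import comb


def one_sided_split_probability(fewer: int, more: int) -> float:
    """Chance that, if treatment made no difference, the group with fewer
    events would end up with `fewer` events or even fewer.

    Assumes two equal-sized groups, so each event is a fair coin flip
    between them (an exact one-sided binomial test).
    """
    total = fewer + more
    # Count the coin-flip outcomes that are at least as lopsided as observed.
    lopsided_outcomes = sum(comb(total, k) for k in range(fewer + 1))
    return lopsided_outcomes / 2 ** total


# Aspirin-group vs. placebo-group counts as reported for the aspirin study.
fatal = one_sided_split_probability(fewer=5, more=18)
nonfatal = one_sided_split_probability(fewer=99, more=171)

for label, p in (("fatal", fatal), ("nonfatal", nonfatal)):
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label:>8} heart attacks: p = {p:.2g} ({verdict} at the 0.05 level)")
```

Both probabilities fall well below the 0.05 threshold. A two-sided version, which also counts equally lopsided splits in the other direction, would double each figure. The trial itself may have used a different method.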
Most researchers consider a probability less than 0.05 to be "statistically significant." Patterns in the data are considered statistically persuasive if they have less than a 1-in-20 chance of occurring by luck alone. Paul the Octopus's record was statistically significant because he had less than a 1 percent chance of being that lucky.
In the first 5 years of an aspirin study involving 22,000 male doctors, there were 18 fatal heart attacks in the placebo group, compared to 5 in the aspirin group. The probability of so large a disparity by chance alone is less than 1 percent. What about nonfatal heart attacks? Here, there were 171 in the placebo group and 99 in the aspirin group. The chance of so large a disparity by luck alone is about 1 in 100,000. These results were statistically significant, and the American Heart Association now recommends daily aspirin for those with a high risk of suffering a heart attack.
On the other hand, not finding a statistically significant result is sometimes more interesting than finding one. In 1887 Albert Michelson and Edward Morley measured the speed of light both parallel and perpendicular to the earth's motion, expecting to find a difference that would confirm a theory that was popular at the time. But instead they found no statistically significant difference at all. Their research laid the groundwork for the development and the acceptance of Einstein's special theory of relativity. Their "failed" study helped revolutionize physics.
Closer to home, later in this book we will discuss arthroscopic surgery, a routine procedure performed hundreds of thousands of times each year for knee osteoarthritis. Recent studies have found no statistically significant benefits, a conclusion that could save millions of dollars each year in unnecessary surgery, not to mention the inconvenience and risk of complications from the surgical procedure.
Not finding statistical significance for this widespread procedure was undoubtedly more valuable than the many studies that have found statistical significance for the treatment of uncommon ailments. Nonetheless, a study of psychology journals found that 97 percent of all published test results were statistically significant. Surely it is not the case that 97 percent of all the tests that were conducted yielded statistically significant results. Rather, editors generally believe that tests are not worth reporting unless the results are statistically significant. The same is true outside academia. A business or government researcher trying to demonstrate the value of a certain strategy, plan, or policy feels compelled to present statistically significant empirical evidence.
Everywhere, researchers chase statistical significance, and it is by no means an elusive prey. With fast computers and plentiful data, finding statistical significance is trivial. If you look hard enough, it can even be found in tables of random numbers. One way to find it is to test many theories but only report the results that are statistically significant. Even if only worthless theories are considered, on average one out of every twenty tests will be statistically significant. With mountains of data, powerful computers, and incredible pressure to produce publishable results, untold numbers of worthless theories get tested. Hundreds of researchers test thousands of theories, write up the statistically significant results, and discard the rest.
The problem for society is that we only see the tip of this statistical iceberg. We see the statistically significant result, but not the tests that didn't work out. Suppose we knew that hundreds of unreported tests lay behind the reported test, and we remembered that, on average, one out of every twenty tests of worthless theories will be statistically significant. We would surely view what does get reported with more skepticism.
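A short simulation shows the one-in-twenty arithmetic at work. In this Python sketch, each "worthless theory" is a hypothetical test that compares two groups drawn from the same standard normal distribution, so any difference between them is pure chance. The test uses the standard rule that |z| > 1.96 corresponds to p < 0.05 (two-sided) when the variance is known. The group size of 50, the 2,000 trials, and the seed are arbitrary choices, not data from any study. The second function shows why reporting only the winners misleads. With 20 independent tests of worthless theories, the chance that at least one comes out "significant" is 1 - 0.95^20, about 64 percent.

```python
import random
from statistics import fmean


def worthless_test_is_significant(rng: random.Random, n: int = 50) -> bool:
    """Test a 'worthless theory': both groups come from the SAME distribution.

    Scores are standard normal, so the difference in group means has a known
    standard deviation of sqrt(2 / n); |z| > 1.96 means p < 0.05 (two-sided).
    """
    group_a = [rng.gauss(0, 1) for _ in range(n)]
    group_b = [rng.gauss(0, 1) for _ in range(n)]
    z = (fmean(group_a) - fmean(group_b)) / (2 / n) ** 0.5
    return abs(z) > 1.96


def chance_of_at_least_one_hit(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent worthless tests is 'significant'."""
    return 1 - (1 - alpha) ** k


rng = random.Random(2014)  # fixed seed so the run is reproducible
tests = 2000
hits = sum(worthless_test_is_significant(rng) for _ in range(tests))
print(f"{hits} of {tests} worthless tests were 'significant' ({hits / tests:.1%})")

for k in (1, 20, 100):
    print(f"{k:>3} worthless tests: {chance_of_at_least_one_hit(k):.0%} chance of at least one 'discovery'")
```

The simulated share of "significant" worthless tests should land near 5 percent. If a researcher runs enough such tests, a publishable result becomes close to certain.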
Pharmaceutical companies, for example, test thousands of experimental drugs and, even in well-designed, unbiased studies, we can expect hundreds of worthless drugs to show statistically significant benefits--which can in turn generate immense profits. Drugmakers have a powerful incentive to test, test, and test some more. There is a much smaller incentive to retest an approved treatment to see whether the initial results were just a fluke--one of the one-in-twenty worthless treatments that turn out to be statistically significant. When approved treatments do get retested, it is not at all surprising that the results are often disappointing.
John Ioannidis holds positions at the University of Ioannina in Greece, the Tufts University School of Medicine in Massachusetts, and the Stanford University School of Medicine in California. (Imagine the frequent flier miles! Imagine the lack of sleep!) Ioannidis has devoted his career to warning doctors and the public about naively accepting medical test results that have not been convincingly replicated. In one study, he looked at 45 of the most widely respected medical studies published from 1990 through 2003 that claimed to have demonstrated effective treatments for various ailments. In only 34 cases were attempts made to replicate the original test results with larger samples. The initial results were confirmed in 20 of these 34 cases (59 percent). For seven treatments, the benefits were much smaller than initially estimated; for the other seven treatments, there were no benefits at all. Overall, only 20 of the 45 studies had been replicated, and these were among the most highly respected studies! In the same year that Ioannidis published these unsettling findings, he wrote another paper with the damning title, "Why Most Published Research Findings Are False."
Another way to secure statistical significance is to use the data to discover a theory.
Statistical tests assume that the researcher starts with a theory, collects data to test the theory, and reports the results--whether statistically significant or not. Many people work in the other direction, scrutinizing the data until they find a pattern and then making up a theory that fits the pattern. Ransacking data for patterns is fun and exciting--like playing Sudoku or solving a murder mystery. Examine the data from every angle. Separate the data into categories based on gender, age, and race. Discard data that muddle patterns. Look for something--anything--that is interesting. After a pattern is discovered, start thinking about reasons.
As researchers sift through the data, looking for patterns, they are explicitly or implicitly doing hundreds of tests. Imagine yourself in their shoes. First, you look at the data as a whole. Then you look at males and females separately. Then you differentiate between children and adults; then between children, teenagers, and adults; then between children, teenagers, adults, and seniors. Then you try different age cutoffs. You let the senior category be 65+, and when that doesn't work, you try 55+, or 60+, or 70+, or 75+. Eventually, something clicks. Even if researchers don't do formal statistical tests with every permutation of the data, they are still doing casual tests by looking for arrangements of the data that appear to be statistically significant. If we knew that the researcher obtained the published results by looking at the data in a hundred different ways, we would surely view the results with suspicion.
These practices--selective reporting and data pillaging--are known as data grubbing. The discovery of statistical significance by data grubbing shows little other than the researcher's endurance. We cannot tell whether a data-grubbing marathon demonstrates the validity of a useful theory or the perseverance of a determined researcher until independent tests confirm or refute the finding.
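The age-cutoff search described above can be simulated. In this Python sketch every name and number is a hypothetical choice. Each simulated survey has 400 people with random ages and sexes, and a score that is pure noise, unrelated to either. The "researcher" tries the whole group, then males, then females, each with senior cutoffs of 55, 60, 65, 70, and 75. It keeps whichever comparison looks most significant. It then checks that same comparison on a fresh survey, which plays the role of an independent test.

```python
import random
from statistics import fmean

SEXES = ("all", "M", "F")
CUTOFFS = (55, 60, 65, 70, 75)  # senior age cutoffs tried one after another


def make_survey(rng: random.Random, people: int = 400) -> list[dict]:
    """Hypothetical survey in which the score is pure noise."""
    return [
        {"age": rng.randint(18, 90), "sex": rng.choice("MF"), "score": rng.gauss(0, 1)}
        for _ in range(people)
    ]


def seniors_vs_rest_z(rows: list[dict], sex: str, cutoff: int) -> float:
    """z-statistic for seniors' mean score minus everyone else's (variance known to be 1)."""
    group = rows if sex == "all" else [r for r in rows if r["sex"] == sex]
    seniors = [r["score"] for r in group if r["age"] >= cutoff]
    others = [r["score"] for r in group if r["age"] < cutoff]
    if not seniors or not others:
        return 0.0
    standard_error = (1 / len(seniors) + 1 / len(others)) ** 0.5
    return (fmean(seniors) - fmean(others)) / standard_error


def grub(rows: list[dict]) -> tuple[tuple[str, int], float]:
    """Ransack the data: try every slice and keep the largest |z|."""
    best_slice, best_z = (SEXES[0], CUTOFFS[0]), 0.0
    for sex in SEXES:
        for cutoff in CUTOFFS:
            z = seniors_vs_rest_z(rows, sex, cutoff)
            if abs(z) > abs(best_z):
                best_slice, best_z = (sex, cutoff), z
    return best_slice, best_z


rng = random.Random(1945)
trials, found, confirmed = 200, 0, 0
for _ in range(trials):
    (sex, cutoff), z = grub(make_survey(rng))
    if abs(z) > 1.96:  # the ransacked data yield a "significant" pattern
        found += 1
        fresh_z = seniors_vs_rest_z(make_survey(rng), sex, cutoff)
        # Confirmation requires significance in the same direction on fresh data.
        if abs(fresh_z) > 1.96 and fresh_z * z > 0:
            confirmed += 1

print(f"'significant' slice found in {found} of {trials} pure-noise surveys")
print(f"confirmed on fresh data in {confirmed} of those {found}")
```

Because the researcher picks the best of fifteen overlapping comparisons, a "significant" slice should turn up noticeably more often than the 5 percent a single honest test would produce. The same slice should rarely hold up on the fresh survey.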
But more often than not, the tests stop there. After all, you won't become a star by confirming other people's research, so why not spend your time discovering new theories? The data-grubbed theory consequently sits out there, untested and unchallenged.
Many important scientific theories started out as efforts to explain unearthed patterns. For example, during the 1800s, most biologists believed that parental characteristics were averaged together to determine the characteristics of their offspring. Under this blending theory, a child's height is an average of the parents' heights, modified perhaps by environmental influences. Gregor Mendel, an Augustinian monk, conducted meticulous studies of tens of thousands of pea plants over an eight-year period. He looked at several different traits and concluded that the blending theory didn't work. When he cross-pollinated green-seeded plants with yellow-seeded plants, the offspring's seeds were either green or yellow, not yellowish-green. When he cross-pollinated smooth-seeded plants with wrinkly-seeded plants, the offspring's seeds were either smooth or wrinkled, not something in between. To explain the results of his experiments, he proposed what are now known as Mendel's Laws of Inheritance, an elegant probabilistic model of how traits pass from one generation to the next and sometimes skip generations. He conceived a theory to fit his data and thereby laid the foundation for modern genetics.
However, data grubbing has also been the source of thousands of quack theories. How can we tell the difference between a good theory and quackery? There are two effective antidotes: common sense and fresh data. If it is a ridiculous theory, we shouldn't be persuaded by anything less than overwhelming evidence, and even then we should be skeptical. Extraordinary claims require extraordinary evidence. Unfortunately, common sense is an uncommon commodity these days, and many silly theories have been seriously promoted by honest researchers.
Have you heard the one about a baseball player's life expectancy falling by five years once he is elected to the Hall of Fame? Or the Chinese people who die of heart disease because they were born in a "fire year"? You will later in this book.
The second antidote is fresh data. It is not sensible to test a theory with the very data that were ransacked to concoct the theory. If a theory was made up to fit the data, then of course the data support the theory! Theories should be tested with new data that have not been contaminated by data grubbing. When data-grubbed theories are tested with fresh data, the results are usually disappointing, which is not at all surprising. Using the data that inspired a theory to test it is surely misleading, and it is surely unsurprising that the theory does not fit new data nearly as well as it fit the original data.
As a case in point, I just flicked a quarter off my desk with the baby finger on my left hand, and the quarter landed tails. After seeing the quarter land tails, my theory is that if I flick a quarter off my desk with the baby finger on my left hand, it will always land tails. After all, my data support it. This theory is obviously stupid and useless, but no more so than some theories we will examine in detail in later chapters. Those theories are harder to see through, even though they are derived in essentially the same way as my quarter-flicking theory. If children who died of cancer lived near power lines, then electromagnetic fields (EMFs) from power lines must cause cancer, right?
If a theory sort of makes sense and you don't know that the theory came after looking at the data--after the quarter landed on the floor--it is tempting to believe that a theory that fits the data must be correct. After all, the data confirm the theory! This is one of those temptations that should be resisted. Fortunately, we can resist.
We can overcome the predilections we inherited from our distant ancestors as they struggled to survive and reproduce. We don't have to be duped by data.
Don't Be Fooled:
We are genetically predisposed to look for patterns and to believe that the patterns we observe are meaningful. If a baseball player plays a good game after wearing new socks, he shouldn't change socks. If the stock market does well after NFC teams win the Super Bowl, watch the game before investing. If a basketball player makes four shots in a row, he is hot and is very likely to make the next shot. If a heart-attack victim recovers after being sent healing thoughts from a thousand miles away, distant healing works. If a customer satisfaction survey finds that people living in homes with three bathrooms are more enthusiastic than are people living in homes with two bathrooms, that is the target market. If a country has a recession when federal debt is high, then government debt causes recessions. Throughout this book, we will debunk dozens of such examples. Don't be fooled into thinking that a pattern is proof. We need a logical, persuasive explanation, and we need to test the explanation with fresh data.
2 GARBAGE IN, GOSPEL OUT
Charles Babbage was born in London on December 26, 1791, a time of great change in technology and social mobility. He was keenly interested in mathematics, but frustrated by mistakes he found in mathematical and astronomical tables that were based on human calculations. The mistakes were not only intellectually frustrating but also had serious consequences, including causing captains to sail their ships into rocks and other hazards.
It was considered unpatriotic for an honorable Englishman to pay attention to French mathematicians. Nonetheless, Babbage did. He discovered that the French government had produced several mathematical tables using an automated human system.
Senior mathematicians determined the formulas needed to fill in a table, and junior mathematicians respecified these formulas so that the calculations could be done by simple addition and subtraction. For example, to calculate 4 times 8, we can add 8 + 8 + 8 + 8 = 32. The menial work of adding and subtracting was done by specialists who were called "computers." Babbage realized that, in theory, machines could be designed that would add and subtract with 100 percent accuracy, thereby eliminating human error.
Babbage also knew about the calculating machines designed by two Germans (Wilhelm Schickard and Gottfried Wilhelm Leibniz) and the great French mathematician Blaise Pascal. As a teenager, Pascal had invented a mechanical calculator called the Aritmatique (or Pascaline) to help his father, a French tax collector. The Aritmatique was a box with visible dials connected to wheels hidden inside the box. Each dial had ten digits labeled 0 through 9. When the dial for the 1s column moved from 9 to 0, the dial for the 10s column moved up one notch; when the dial for the 10s column moved from 9 to 0, the dial for the 100s column moved up one notch; and so on. The Aritmatique could do addition and subtraction, but the dials had to be turned by hand.
Babbage put together these two ideas (converting complex formulas into simple calculations and automating the simple calculations) and designed a mechanical computer that could do the calculations perfectly every time. Babbage's first design, called the Difference Engine, was a steam-powered behemoth made of brass and iron that stood eight feet tall, weighed fifteen tons, and contained twenty-five thousand distinct parts. The Difference Engine could make calculations up to twenty decimals long and could print formatted tables of the results. After a decade of tinkering with the design, Babbage began working on plans for a more powerful calculator he called the Analytical Engine.
This design had more than fifty thousand components, used perforated cards to input instructions and data, and could store up to one thousand fifty-digit numbers. The Analytical Engine had a cylindrical "mill" fifteen feet tall and six feet in diameter that executed instructions sent from a twenty-five-foot-long "store." The store was like a modern computer's memory, with the mill the CPU.
Babbage's core principles were sound and similar to how modern computers work. However, given the technology of his time, his proposed machines were mechanical beasts, and he was continually frustrated by limited financial resources and an inability to secure the precision components he needed. Nonetheless, his vision was so grand and his attention to detail so astonishing that his brain--the brain that invented the computer--is preserved to this day and displayed at the English Royal College of Surgeons.
On the 200th anniversary of his birth, in 1991, the London Science Museum built several of Babbage's computers from his original plans, including the Second Difference Engine, which worked as accurately as he intended and made calculations to 31 digits. In 2011, a private nonprofit project called Plan 28 was launched to build Babbage's Analytical Engine so that we can be inspired by Babbage's vision, which was literally a hundred years ahead of its time. The goal is to have it built by 2021, the 150th anniversary of Babbage's death.
Because Babbage was a century ahead of his time, it is not surprising that many people were mystified by his vision. In his autobiography, he recounted:
On two occasions I have been asked [by members of parliament], "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
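Babbage combined two ideas: break a formula into additions, then mechanize the additions. Both can be sketched in a few lines of Python. This is an illustrative model, not a reconstruction of either machine. `turn_dial` is a hypothetical model of Pascal-style dials with carrying, and it is used here to work the 4 times 8 = 8 + 8 + 8 + 8 example from the text. `tabulate_by_differences` uses the method of finite differences, the principle behind the Difference Engine. Once the starting differences are known, it fills in a table of a polynomial with nothing but addition. The function names, the four-dial machine, and the example polynomial x*x + x + 41 are arbitrary choices.

```python
def turn_dial(wheels: list[int], column: int, notches: int) -> None:
    """Advance one dial by `notches`, carrying like Pascal's calculator.

    wheels[0] is the 1s dial, wheels[1] the 10s dial, and so on. Whenever a
    dial moves from 9 to 0, the next dial moves up one notch. A carry out of
    the last dial is simply lost, since there is no further dial to receive it.
    """
    for _ in range(notches):
        col = column
        while col < len(wheels):
            wheels[col] = (wheels[col] + 1) % 10
            if wheels[col] != 0:
                break  # no carry needed
            col += 1  # this dial went from 9 to 0: nudge the next dial


def dial_reading(wheels: list[int]) -> int:
    """Read the dials as an ordinary number (highest column first)."""
    return int("".join(str(d) for d in reversed(wheels)))


def starting_differences(f, x0: int, degree: int) -> list[int]:
    """Set-up work: f(x0) followed by its finite differences, found by subtraction."""
    values = [f(x0 + k) for k in range(degree + 1)]
    diffs = []
    while values:
        diffs.append(values[0])
        values = [b - a for a, b in zip(values, values[1:])]
    return diffs


def tabulate_by_differences(diffs: list[int], steps: int) -> list[int]:
    """Tabulate a polynomial using nothing but addition.

    For a polynomial of degree d the d-th difference is constant, so each new
    table entry needs only d additions. Each column adds in the column to its
    right, working left to right so every addition uses the previous value.
    """
    registers = list(diffs)
    table = [registers[0]]
    for _ in range(steps):
        for i in range(len(registers) - 1):
            registers[i] += registers[i + 1]
        table.append(registers[0])
    return table


# 4 times 8 as repeated addition on four dials: 8 + 8 + 8 + 8.
wheels = [0, 0, 0, 0]
for _ in range(4):
    turn_dial(wheels, column=0, notches=8)
print(dial_reading(wheels))  # 32

# A table of x*x + x + 41 for x = 0..5, built from its starting differences.
print(tabulate_by_differences(starting_differences(lambda x: x * x + x + 41, 0, 2), 5))
# [41, 43, 47, 53, 61, 71]
```

The subtraction in `starting_differences` plays the role of the senior mathematicians' set-up work. Everything after that is the kind of repetitive addition that the human "computers" performed and that Babbage wanted a machine to do.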
Even today, when computers are commonplace, many well-meaning people still cling to the misperception that because computers do not make arithmetic mistakes, they are infallible. A 2014 article in Harvard's alumni magazine claimed, "Whenever sufficient information can be quantified, modern statistical methods will outperform an individual or small group of people every time." That statement is either so hopelessly circular as to be meaningless or it is flat-out wrong. The reality is that if we ask a computer to do something stupid, it will faithfully do it.
Garbage in, garbage out is a snappy reminder of the fact that, no matter how powerful the computer, the value of the output depends on the quality of the input. A variation on this saying is garbage in, gospel out, referring to the tendency of people to put excessive faith in computer-generated output without thinking carefully about the input. If a computer's calculations are based on bad data, the output is not gospel, but garbage. There are, unfortunately, far too many examples of people worshipping calculations based on misleading data. Here are a few.
GO TO THE BEST SCHOOL
David Leonhardt, Washington Bureau chief of The New York Times, has won several awards, including the Pulitzer Prize, for his writing on economic topics. In 2009 he wrote a Times column about Crossing the Finish Line, a book by William Bowen and Michael McPherson (two former college presidents) and a doctoral candidate who presumably did the heavy lifting by analyzing data for two hundred thousand students at sixty-eight colleges. The book's core argument is that the United States does a great job persuading students to go to college, but a lousy job getting students to graduate from college. Half of those who go to college don't graduate. The first culprit they identify is under-matching: students who could go to colleges with high graduation rates choose instead to go to colleges with low graduation rates.
Professor Bowen told Leonhardt, "I was really astonished by the degree to which presumptively well-qualified students from poor families under-matched." Overall, about half the low-income college-bound students with GPAs above 3.5 and SAT scores above 1200 could have gone to better colleges, but chose not to. For example, 90 percent of the students at the University of Michigan graduate within six years, compared to only 40 percent at Eastern Michigan, yet many students with good enough grades to go to Michigan choose Eastern Michigan. An economic solution to under-matching would be to make Eastern Michigan more expensive or Michigan less expensive so that students would have an incentive to choose the school with the higher graduation rate.
If only it were that easy. These data are garbage, and the conclusion is not gospel. Getting these so-called under-matched students to go to Michigan might actually lower their chances of graduating. The researchers assumed, in effect, that the students were randomly assigned to Michigan or Eastern Michigan, much like the doctors who were randomly given aspirin or a placebo. College decisions are not a science experiment.
Self-selection bias occurs when people choose to be in the data--for example, when people choose to go to college, marry, or have children. When this happens, comparisons to people who make different choices are treacherous. For example, we are often told that college graduates earn more than high school graduates, suggesting that the observed difference in incomes measures the financial return from going to college. However, part of the reason college graduates earn more may be that they are brighter and more ambitious than those who choose not to go to college. People who make different choices may in fact be different.
Excerpted from Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics by Gary Smith
All rights reserved by the original copyright owners.
Excerpts are provided for display purposes only and may not be reproduced, reprinted or distributed without the written permission of the publisher.

Table of Contents

Introduction
1 Patterns, Patterns, Patterns
2 Garbage In, Gospel Out
3 Apples and Prunes
4 Oops!
5 Graphical Gaffes
6 Common Nonsense
7 Confound It!
8 When You're Hot, You're Not
9 Regression
10 Even Steven
11 The Texas Sharpshooter
12 The Ultimate Procrastination
13 Serious Omissions
14 Flimsy Theories and Rotten Data
15 Don't Confuse Me With Facts
16 Data Without Theory
17 Betting the Bank
18 Theory Without Data
19 When to Be Persuaded and When to Be Skeptical
Sources
Index