## Saturday, January 25, 2020

### How to lie with statistics

I recently re-read Darrell Huff's classic text from 1954, 'How to lie with statistics'. In case you haven't read it, the book takes a number of deceitful statistical tricks of the trade and explains how they work and how to defend yourself from being hoodwinked. My overwhelming thought was 'plus ça change'; the more things change, the more they remain the same. The statistical tricks people used to mislead 50 years ago are still being used today.

(Image credit: Wikipedia)

Huff discusses surveys and how very common methodology flaws can produce completely misleading results. His discussion of sampling methodologies and the problems with them are clear and unfortunately, still relevant. Making your sample representative is a perennial problem as the polling for the 2016 Presidential election showed. Years ago, I was a market researcher conducting interviews on the street and Huff's bias comments rang very true with me - I faced these problems on a daily basis. In my experience, even people with a very good statistical education aren't aware of survey flaws and sources of bias.

The chapter on averages still holds up. Huff shows how the mean can be distorted and why the median might be a better choice. I've interviewed people with Master's degrees in statistics who couldn't explain why the median might be a better choice of average than the mean, so I guess there's still a need for the lesson.

One area where I think things have moved in the right direction is the decreasing use of some types of misleading charts. Huff discusses the use of images to convey quantitative information. He shows a chart where steel production was represented by images of a blast furnace (see below). The increase in production was 50%, but because the height and width were both increased, the area consumed by the images increases by 150%, giving the overall impression of a 150% increase in production1. I used to see a lot of these types of image-based charts, but their use has declined over the years. It would be nice to think Huff had some effect.

(Image credit: How to lie with statistics)

Staying with charts, his discussion about selecting axis ranges to mislead still holds true and there are numerous examples of people using this technique to mislead every day. I might write a blog post about this at some point.

He has chapters on the post hoc fallacy (confusing correlation and causation) and has a nice explanation of how percentages are regularly mishandled. His discussion of general statistical deceitfulness is clear and still relevant.

Unfortunately, the book hasn't aged very well in other aspects. 2020 readers will find his language sexist, the jokey drawings of a smoking baby are jarring, and his roundabout discussion of the Kinsey Reports feels odd. Even the writing style is out of date.

Huff himself is tainted; he was funded by the tobacco industry to speak out against smoking as a cause of cancer. He even wrote a follow-up book, How to lie with smoking statistics to debunk anti-smoking data. Unfortunately, his source of authority was the widespread success of How to lie with statistics. How to lie with smoking statistics isn't available commercially anymore, but you can read about it on Alex Reinhart's page.

Despite all its flaws, I recommend you read this book. It's a quick read and it'll give you a grounding in many of the problems of statistical analysis. If you're a business person, I strongly recommend it - its lessons about cautiously interpreting analysis still hold.

This is a flawed book by a flawed author but it still has a lot of value. I couldn't help thinking that the time is probably right for a new popular book on how people are lying and misleading you using charts and statistics.

Correction

[1] Colin Warwick pointed out an error in my original text. My original text stated the height and width of the second chart increased by 50%. That's not quite what Huff said. I've corrected my post.