Monday, August 2, 2021

Poleaxed opinion polls: the ongoing 2020 disaster

Why the polls failed in the US Presidential Election of 2020

In the wake of the widespread failure of opinion polls to accurately predict the outcome of the 2020 US Presidential election, the American Association for Public Opinion Research (AAPOR) commissioned a study to investigate the causes and make recommendations. Their findings were recently released.

(Image caption: the key question for 2020 opinion pollsters - the answer is yes, but they don't know why. Image source: Wikimedia)

Summary of AAPOR's findings

I've read the report and I've dug through the findings. Here's my summary:

  1. The polls overstated support for Democratic candidates.
  2. We don't really know why.
  3. Er... that's it.

Yes, I'm being harsh, but I'm underwhelmed by the report and I find some of the statements in it unconvincing. I'll present some of their main findings and talk through them. I encourage you to read the report for yourself and reach your own conclusions.

(We don't know why we didn't get the results right.)

Factors they ruled out for 2020

  • Late-breaking changes in favor of Republican candidates. This happened in 2016 but didn't happen in 2020. The polls were directionally consistent throughout the campaign.
  • Weighting for education. In 2016, most polls didn't weight for education, and education did seem to be a factor in the polling error. In 2020, most polls did weight for education, so educational weighting wasn't a factor (there's a sketch of how this kind of weighting works after this list).
  • Pollsters got the demographics wrong. Pollsters don't use simple random sampling; they often use stratified sampling based on demographics. There's no evidence that errors in the demographics led to widespread polling errors in 2020.
  • People were afraid to say they voted for Trump. In races not involving Trump, the opinion polls were still wrong and still favored Democratic candidates. Trump wasn't the cause.
  • Intention to vote vs. actually voting. The error can't be explained by respondents who said they were going to vote but didn't. For example, if Democratic voters had told pollsters they would vote Democratic and then stayed home, that would explain the overstatement - but it didn't happen.
  • Proportion of early voters vs. election day voters. The mix of early and election-day voting didn't make a difference to the polling error.
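
To make the education-weighting point concrete, here's a minimal sketch in Python of how a pollster might reweight respondents so the sample's education mix matches an assumed electorate. Every number below is invented for illustration; real pollsters weight on several variables at once.

    # Hypothetical example: post-stratification weighting by education.
    # All numbers are invented for illustration.
    sample_share = {"college": 0.55, "non_college": 0.45}   # education mix among respondents
    target_share = {"college": 0.40, "non_college": 0.60}   # assumed education mix of the electorate
    dem_support  = {"college": 0.60, "non_college": 0.44}   # Democratic share within each group

    # Unweighted estimate: average over the raw (skewed) respondent mix.
    unweighted = sum(sample_share[g] * dem_support[g] for g in sample_share)

    # Weighted estimate: each group counts in proportion to the target electorate.
    weighted = sum(target_share[g] * dem_support[g] for g in target_share)

    print(f"Unweighted Democratic share: {unweighted:.1%}")   # 52.8%
    print(f"Weighted Democratic share:   {weighted:.1%}")     # 50.4%

The weighting only helps if the target mix is right and if the people who respond within each group look like the people who don't - which is exactly what the report goes on to question.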

Factors they couldn't rule out

  • Republican voters chose not to take part in surveys at a higher rate than Democratic voters (see the sketch after this list).
  • The weighting model used to adjust sampling may have been wrong. Pollsters use models of the electorate to adjust their results. If these models are wrong, the results will be biased.
  • Many more people voted in 2020 than in 2016 ("new voters" in the report) - maybe pollsters couldn't model these new voters very well.
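
To see how the first two factors could combine, here's another small Python sketch, again with invented numbers. It shows how a gap in response rates between Republican and Democratic voters skews who ends up in the poll - a skew that demographic weighting can't fix if it occurs within every demographic cell.

    # Hypothetical example: differential nonresponse by party.
    # All numbers are invented for illustration.
    true_share = {"dem": 0.50, "rep": 0.50}          # assumed true electorate
    response_rate = {"dem": 0.010, "rep": 0.007}     # Republicans assumed less likely to respond

    # Completed interviews contributed by each group.
    responses = {g: true_share[g] * response_rate[g] for g in true_share}
    total = sum(responses.values())
    observed_dem_share = responses["dem"] / total

    # The poll "sees" an electorate that is more Democratic than reality.
    # If this happens within every demographic cell, weighting on age,
    # education, and so on cannot correct for it.
    print(f"Democratic share of respondents: {observed_dem_share:.1%}")   # ~58.8%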

Here's a paragraph from the report:

"Unfortunately, the ability to determine the cause or causes of polling error in 2020 is limited by the available data. Unless the composition of the overall electorate is known, looking only at who responded says nothing about who did not respond. Not knowing if the Republicans (or unaffiliated voters, or new voters) who responded to polls were more supportive of Biden than those who did not respond, for example, it is impossible to identify the primary source of polling error."

Let me put that paragraph another way: we don't have enough data to investigate the problem so we can't say what went wrong.

Rinse and repeat - or just don't

I'm going to quote some sentences from the report's conclusions and comments:

  • "Considering that the average margin of error among the state-level presidential polls in 2020 was 3.9 points, that means candidate margins smaller than 7.8 points would be difficult to statistically distinguish from zero using conventional levels of statistical significance. Furthermore, accounting for uncertainty of statistical adjustments and other factors, the total survey error would be even larger."
  • "Most pre-election polls lack the precision necessary to predict the outcome of semi-close contests."
  • "Our investigation reveals a systemic overstatement of the Democratic-Republican margin in nearly every contest, regardless of mode or proximity to the election. This overstatement is largest in states with more Republican supporters"

Some of the report's statements are extraordinary if you stop and think for a moment. I want you to ponder the key question: "What use are polls?"

The people paying for polls are mostly (but not exclusively) political campaigns and the media. The media want to report an accurate snapshot of where the race stands now and make an assessment of who will win. Political campaigns largely want the same thing.

In places like Alaska or Hawaii, polls aren't very useful because voters vote overwhelmingly for one party. The same is true of Wyoming, an overwhelmingly Republican stronghold, and Washington D.C., a Democratic one. My forecast for 2024 is simple: Wyoming will vote Republican and Washington D.C. Democratic.

Polls are useful where the race is close, or, in the words of the report, "semi-close". But, according to the report, polls in semi-close states don't have sufficient accuracy to predict the result.

So, if polls aren't useful in strongly Democratic or Republican states, and they lack predictive power in "semi-close" races, what use are they? Why should anyone pay for them?

There's an even deadlier issue for polling organizations. Election polls are the rare product whose accuracy can be judged very clearly: there's an actual result to check them against, and in 2020 they failed that check. But polling companies run all kinds of polls on all kinds of topics, not just elections. How accurate are they in areas where their success is much harder to assess?

Where to next?

The polling industry has an existential credibility crisis. It can't continue to sell a product that doesn't work. It's extraordinary that an industry that's been around for nearly 100 years doesn't have the data to diagnose its failures. The industry needs to come together to fix its problems as soon as possible - or face irrelevancy in the near future.
