Why we need to investigate fraud

In July 2016, Fox News' Sean Hannity reported that Mitt Romney received no votes at all in 59 Philadelphia voting precincts in the 2012 Presidential Election. He claimed that this was evidence of vote-rigging - something that received a lot of commentary and on-air discussion at the time. On the face of it, this does sound like outright electoral fraud; in a fair election, how is it possible for a candidate to receive no votes at all? Since then, there have been other allegations of fraud and high-profile actual incidents of fraud. In this blog post, I’m going to talk about how a citizen-analyst might find electoral fraud. But I warn you, you might not like what I’m going to say.

Election organization - the smallest electoral units

In almost every country, the election process is organized in the same way; the electorate is split into geographical blocks small enough to be managed by a team on election day. The blocks might contain one or many polling stations and may have a few hundred to a few thousand voters. These blocks are called different things in different places, for example, districts, divisions, or precincts. Because precinct seems to be the most commonly used word, that's what I'm going to use here. The results from the precincts are aggregated to give results for the ward, county, city, state, or country. The precinct boundaries are set by different authorities in different places, but they're known.

How to look for fraud

A good place to look for electoral shenanigans is at the precinct level, but what should we look for? There are several easy checks:

A large and unexplained increase or decrease in the number of voters compared to previous elections and compared to other nearby precincts.
An unexpected change in voting behavior compared to previous elections/nearby precincts. For example, a precinct that ‘normally’ votes heavily for party Y suddenly voting for party X.
Changes in voting patterns for absentee voters, e.g. significantly more or fewer absentee votes, or absentee voting patterns that are very different from in-person voting patterns.
Results that seem inconsistent with the party affiliation of registered voters in the precinct.
A result that seems unlikely given the demographics of the precinct.
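
To make the first two checks concrete, here's a minimal Python sketch, not a definitive method. It assumes I already have, for each precinct in an area, the total votes and one party's vote share for the previous and current elections. The dictionary keys, the function names, and the 3.5 cutoff are my own assumptions for illustration. It scores each precinct against its neighbors using the median and the median absolute deviation, so a single odd precinct doesn't distort the baseline.

```python
from statistics import median


def robust_z_scores(values):
    """Score each value by its distance from the median, in scaled MAD units."""
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread at all: nothing can be called an outlier.
        return [0.0 for _ in values]
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    return [(v - med) / (1.4826 * mad) for v in values]


def flag_precincts(precincts, threshold=3.5):
    """Return the names of precincts with unusual turnout change or swing.

    Each precinct is a dict with hypothetical keys: name, votes_prev,
    votes_now, share_prev, share_now. The share_* values are one party's
    share of the vote (0 to 1); votes_prev must be greater than zero.
    """
    turnout_change = [
        (p["votes_now"] - p["votes_prev"]) / p["votes_prev"] for p in precincts
    ]
    swing = [p["share_now"] - p["share_prev"] for p in precincts]

    flagged = []
    scores = zip(precincts, robust_z_scores(turnout_change), robust_z_scores(swing))
    for precinct, z_turnout, z_swing in scores:
        if abs(z_turnout) > threshold or abs(z_swing) > threshold:
            flagged.append(precinct["name"])
    return flagged
```

A flagged precinct is only a candidate for a closer look; a new housing development could explain a jump in turnout just as well as fraud could.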

Of course, none of these checks is a smoking gun, either individually or collectively, but they might point to precincts that should be investigated. Let’s start with the Philadelphia case and go from there.

Electoral fraud - imagined and real

It’s true that some divisions (precincts) in Philadelphia voted overwhelmingly for Obama in 2012. These divisions were small (averaging about 600 voters) and almost exclusively (95%+) African-American. Obama was hugely popular with the African-American community in Philadelphia, polling 93%+. The same divisions also have a history of voting overwhelmingly Democratic. Given these facts, it’s not at all surprising to see no or very few votes for Mitt Romney. Similar arguments hold for allegations of electoral fraud in Cleveland, Ohio in 2012.

In fact, there were some unbalanced results the other way too; in some Utah precincts, Obama received no votes at all - again not surprising given the voter population and voter history.

Although on the face of it these lopsided results seem to strongly indicate fraud, the allegations don't stand up to analytical scrutiny.

Let’s look at another alleged case of electoral fraud, this time in 2018 in North Carolina. The congressional election was fiercely contested and appeared to be narrowly decided in favor of Mark Harris. However, investigators found irregularities in absentee ballots, specifically, missing ballots from predominantly African-American areas. The allegations were serious enough that the election was held again, and criminal charges have been brought against a political operative in Mark Harris’ campaign. The allegation is ‘ballot harvesting’, where operatives persuade voters who might vote for their opposition to vote via an absentee ballot and subsequently make these ballots disappear.

My sources of information here are newspaper reports and analysis, but what if I wanted to do my own detective work and find areas where the results looked odd? How might I get the data? This is where things get hard.

Democracy’s data - official sources

To get the demographics of a precinct, I can try going to the US Census Bureau. The Census Bureau defines small geographic areas, called tracts, that they can supply data on. Tract data include income levels, population, racial makeup, etc. Sometimes, these tracts line up with voting districts (the Census term for precincts), but sometimes they don’t. If tracts don’t line up with voting districts, then automated analysis becomes much harder. In my experience, it takes a great investment of time to get any useful data from the Census Bureau; the data’s there, it’s just really hard finding out how to get it. In practice then, it’s extremely difficult for a citizen-analyst to link census data to electoral data.
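
To show why misaligned boundaries make things harder, here's a rough Python sketch of one common workaround: apportioning tract counts to precincts by overlap. It assumes I've already used mapping software to work out what fraction of each tract falls inside each precinct, and it assumes people are spread evenly across a tract, which is a crude assumption. The function name and the data layout are hypothetical.

```python
from collections import defaultdict


def apportion_to_precincts(tract_population, overlaps):
    """Estimate precinct populations from census tract populations.

    tract_population: dict mapping tract id to population.
    overlaps: list of (tract_id, precinct_id, fraction) tuples, where
        fraction is the share of the tract lying inside the precinct.
        These fractions are assumed to come from a separate mapping step.
    """
    totals = defaultdict(float)
    for tract_id, precinct_id, fraction in overlaps:
        # Give the precinct its proportional share of the tract's population.
        totals[precinct_id] += tract_population[tract_id] * fraction
    return dict(totals)
```

Even this simple version needs an overlap table for every tract and precinct pair, and that table is exactly what's hard to build.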

What about voting results? Surely it’s easy to get electoral result data? As it turns out, this is surprisingly hard too. You might think the Federal Election Commission (FEC) will have detailed data, but it doesn’t. The data available from the FEC for the 2016 Presidential Election is less detailed than the 2016 Presidential Election Wikipedia page. The reason is, Presidential Elections are run by the states, so there are 51 (including Washington DC) separate authorities maintaining electoral results, which means 51 different ways of getting data, 51 different places to get it, and 51 different levels of detail available. The FEC sources its data from the states, so it's not surprising its reports are summary reports.

If we need more detailed data, we need to go to the states themselves.

Let's take Massachusetts as an example. Its Presidential Election data is available for 2016 down to the ward level (as a CSV). For Utah, data is only available at the county level (as an Excel file). Pennsylvania also offers only county-level data, and only from a web page. Getting detail below the county level may take freedom of information requests, if the information is available at all.
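
Here's a small Python sketch of what dealing with these differences looks like in practice. It assumes each state's results have first been exported to CSV by hand, and the column names in the mapping are invented for illustration; the real files have their own layouts that have to be inspected one by one.

```python
import csv
import io

# Hypothetical column names for each state's file. Every new state needs
# its own hand-built entry, which is where the effort goes.
COLUMN_MAPS = {
    "MA": {"area": "Ward", "candidate": "Candidate", "votes": "Votes"},
    "UT": {"area": "County", "candidate": "Name", "votes": "Total"},
}


def normalize(state, csv_text):
    """Convert one state's CSV text into rows that share a common set of keys."""
    columns = COLUMN_MAPS[state]
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "state": state,
            "area": record[columns["area"]],
            "candidate": record[columns["candidate"]],
            # Vote counts are sometimes written with thousands separators.
            "votes": int(record[columns["votes"]].replace(",", "")),
        })
    return rows
```

Multiply this by 51 authorities, each free to change its format between elections, and the scale of the problem becomes clear.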

In effect, this puts precinct-level nationwide voting analysis from official sources beyond almost all citizen-analysts.

Democracy’s data - unofficial sources

In practice, voting data is hard to come by from official sources, but it is available from unofficial sources who've put in the work to get the data from the states and make it available to everyone.

Dave Leip offers election data down to detailed levels; the 2016 results by county will cost you $92 and results by Congressional District will cost you $249. High-level results are on his website and available for free. He's even been kind enough to list his sources and URLs if you want to spend the time to duplicate his work. Leip’s data is used by the media in their analysis, and probably by political campaigns too. He’s put in a great deal of work to gather the data and he’s asking for a return on his effort, which is fair enough.

The MIT Election Data and Science Lab (MEDSL) collects election data down to the precinct level, and the data is available for the most recent Presidential Election (2016 at the time of writing). As usual with this kind of data, there are all kinds of notes to read before using it. MIT has also been kind enough to make available both tools to analyze the data and their website scraping tools.

The MIT project isn't the only project providing data. Various other universities have collated electoral resources at various levels of detail:

Harvard Kennedy School
University of Michigan Library
Princeton University Library
The University of Florida has a project to provide US election data at the precinct level and has also made their precinct data available online.

Democracy’s data - electoral fraud

What about looking for cases of electoral fraud? There isn't a central repository of electoral fraud cases and there are multiple different court systems in the US (state and federal), each maintaining records in different ways. Fortunately, Google indexes a lot of cases, but often, court transcripts are only available for a fee, and of course, it's extremely time-consuming to trawl through cases.

The Heritage Foundation maintains a database of known electoral fraud cases. They don't claim their database is complete, but they have put a lot of effort into maintaining it and it's the most complete record I know of.

In 2018, there were elections for the House of Representatives, the Senate, state elections, and of course county and city elections. Across the US, there must have been thousands of different elections in 2018. How many cases of electoral fraud do you think there were? What level of electoral fraud would undermine your faith in the system? In 2018, there were 65 cases. From the Heritage Foundation data, here’s a chart of fraud cases per year for the United States as a whole.

(Electoral fraud cases by year from the Heritage Foundation electoral fraud database)

It does look like there's been an increase in electoral fraud cases up to about 2010, but bear in mind the dataset covers the period of computerization and the rise of the internet. We might expect a rise in recorded fraud cases simply because it's easier to find case records.

Based on this data, there really doesn’t seem to be large-scale electoral fraud in the United States. In fact, in reading the cases on their website, most of them are small-scale frauds concerning local elections (e.g. mayoral elections) - in a lot of cases, the frauds are frankly pathetic.

Realistic assessment of election data

Official data is either hard to come by or not available at the precinct level, which leaves us using unofficial data. Fortunately, unofficial data is high quality and from reputable sources. The problem is, data from unofficial sources isn't available immediately after an election; there may be a long delay between the election and the data's release. If one of the goals of electoral data analysis is finding fraud, then timely data is paramount.

Of course, the kind of analysis I'm talking about here won't find small-scale fraud, where a person votes more than once or impersonates someone. But small-scale fraud will only affect the outcome of the very tightest of races. Democracy is most threatened by fraud that might affect the results, which in most cases is larger-scale fraud like the North Carolina case. Statistical analysis might detect these kinds of fraud.
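
As one example of the kind of statistical check I have in mind, here's a hedged Python sketch that compares a candidate's share of absentee votes with their share of in-person votes in a single precinct, using a standard two-proportion z-test. The function and parameter names are my own, and the test is only a first filter; it says nothing about why the two groups differ.

```python
from math import sqrt


def two_proportion_z(x_absentee, n_absentee, x_in_person, n_in_person):
    """Z statistic for the gap between absentee and in-person vote shares.

    x_* is the number of votes for one candidate and n_* is the total
    number of votes cast by that method; both totals must be positive.
    """
    share_absentee = x_absentee / n_absentee
    share_in_person = x_in_person / n_in_person
    # Pooled share under the assumption that both groups vote the same way.
    pooled = (x_absentee + x_in_person) / (n_absentee + n_in_person)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_absentee + 1 / n_in_person))
    if std_error == 0:
        # Everyone (or no one) voted for the candidate: no gap to measure.
        return 0.0
    return (share_absentee - share_in_person) / std_error
```

Absentee voters often differ from in-person voters for perfectly legitimate reasons, so a large value on its own proves nothing. What would interest me is a precinct whose gap is far out of line with the gaps in comparable precincts.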

Sean Hannity's allegation of electoral fraud in Philadelphia didn't stand up to analysis, but it was worth investigating and is the kind of fraud we could detect using data - if only it were available in a timely way.

How things could be - a manifesto

Imagine groups of researchers sitting by their computers on election night. As election results at the precinct level are posted online, they analyze the results for oddities. By the next morning, they may have spotted oddities in absentee ballots, or unexplained changes in voting behavior, or unexpected changes in voter turnout - any of which will feed into the news cycle. Greater visibility of anomalies will enable election officials to find and act on fraud more quickly.

To do this will require consistency of reporting at the state level and a commitment to post precinct results as soon as they're counted and accepted. This may sound unlikely, but there are federal standards the states must follow in many other areas, including deodorants, teddy bears, and apple grades, as well as highway construction, the minimum drinking age, and the environment. Isn't the transparency of democracy at least as important as deodorants, teddy bears, and apples?

Engora Data Blog

Saturday, May 23, 2020

Finding electoral fraud - the democracy data deficit