Showing posts with label analytics. Show all posts
Showing posts with label analytics. Show all posts

Tuesday, September 8, 2020

Can you believe the polls?

Opinion polls have known sin

Polling companies have run into trouble over the years in ways that render some poll results doubtful at best. Here are just a few of the problems:

  • Fraud allegations.
  • Leading questions
  • Choosing not to publish results/picking methodologies so that polls agree.

Running reliable polls is hard work that takes a lot of expertise and commitment. Sadly, companies sometimes get it wrong for several reasons:

  • Ineptitude.
  • Lack of money. 
  • Telling people what they want to hear. 
  • Fakery.

In this blog post, I'm going to look at some high-profile cases of dodgy polling and I'm going to draw some lessons from what happened.

(Are some polls real or fake? Image source: Wikimedia Commons. Image credit: Basile Morin. License: Creative Commons.)

Allegations of fraud part 1 - Research 2000

Backstory

Research 2000 started operating around 1999 and gained some solid early clients. In 2008, The Daily Kos contracted with Research 2000 for polling during the upcoming US elections. In early 2010, Nate Silver at FiveThirtyEight rated Research 2000 as an F and stopped using their polls. As a direct result, The Daily Kos terminated their contract and later took legal action to reclaim fees, alleging fraud.

Nate Silver's and others' analysis

After the 2010 Senate elections, Nate Silver analyzed polling results for 'house effects' and found a bias towards the Democratic party for Research 2000. These kinds of biases appear all the time and vary from election to election. The Research 2000 bias was large (at 4.4%), but not crazy; the Rasmussen Republican bias was larger for example. Nonetheless, for many reasons, he graded Research 2000 an F and stopped using their polling data.

In June of 2010, The Daily Kos publicly dismissed Research 2000 as their pollster based on Nate Silver's ranking and more detailed discussions with him. Three weeks later, The Daily Kos sued Research 2000 for fraud. After the legal action was public, Nate Silver blogged some more details of his misgivings about Research 2000's results, which led to a cease and desist letter from Research 2000's lawyers. Subsequent to the cease-and-desist letter, Silver published yet more details of his misgivings. To summarize his results, he was seeing data inconsistent with real polling - the distribution of the numbers was wrong. As it turned out, Research 2000 was having financial trouble around the time of the polling allegations and was negotiating low-cost or free polling with The Daily Kos in exchange for accelerated payments. 

Others were onto Research 2000 too. Three statisticians analyzed some of the polling data and found patterns inconsistent with real polling - again, real polls tend to have results distributed in certain ways and some of the Research 2000 polls did not.

The result

The lawsuit progressed with strong evidence in favor of The Daily Kos. Perhaps unsurprisingly, the parties agreed a settlement, with Research 2000 agreeing to pay The Daily Kos a settlement fee. Research 2000 effectively shut down after the agreement.

Allegations of fraud part 2 - Strategic Vision, LLC

Backstory

This story requires some care in the telling. At the time of the story, there were two companies called Strategic Vision, one company is well-respected and wholly innocent, the other not so much. The innocent and well-respected company is Strategic Vision based in San Diego. They have nothing to do with this story. The other company is Strategic Vision, LLC based in Atlanta. When I talk about Strategic Vision, LLC from now on it will be solely about the Atlanta company.

To maintain trust in the polling industry, the American Association for Public Opinion Research (AAPOR) has guidelines and asks polling companies to disclose some details of their polling methodologies. They rarely censure companies, and their censures don't have the force of law, but public shaming is effective as we'll see. 

What happened

In 2008, the AAPOR asked 21 polling organizations for details of their 2008 pre-election polling, including polling for the New Hampshire Democratic primary. Their goal was to quality-check the state of polling in the industry.

One polling company didn't respond for a year, despite repeated requests to do so. As a result, in September 2009, the AAPOR published a public censure of Strategic Vision, LLC which you can read here

It's very unusual for the AAPOR to issue a censure, so the story was widely reported at the time, for example in the New York Times, The Hill, and The Wall Street Journal. Strategic Vision LLC's public response to the press coverage was that they were complying but didn't have time to submit their data. They denied any wrongdoing.

Subsequent to the censure, Nate Silver looked more closely at Strategic Vision LLC's results. Initially, he asked some very pointed and blunt questions. In a subsequent post, Nate Silver used Benford's Law to investigate Strategic Vision LLC's data, and based on his analysis he stated there was a suggestion of fraud - more specifically, that the data had been made up. In a post the following day, Nate Silver offered some more analysis and a great example of using Benford's Law in practice. Again, Strategic Vision LLC vigorously denied any wrongdoing.

One of the most entertaining parts of this story is a citizenship poll conducted by Strategic Vision, LLC among high school students in Oklahoma. The poll was commissioned by the Oklahoma Council on Public Affairs, a think tank. The poll asked eight various straightforward questions, for example:

  • who was the first US president? 
  • what are the two main political parties in the US?  

and so on. The results were dismal: only 23% of students answered George Washington and only 43% of students knew Democratic and Republican. Not one student in 1,000 got all questions correct - which is extraordinary. These types of polls are beloved of the press; there are easy headlines to be squeezed from students doing poorly, especially on issues around citizenship. Unfortunately, the poll results looked odd at best. Nate Silver analyzed the distribution of the results and concluded that something didn't seem right - the data was not distributed as you might expect. To their great credit, when the Oklahoma Council on Public Affairs became aware of problems with the poll, they removed it from their website and put up a page explaining what happened. They subsequently terminated their relationship with Strategic Vision, LLC.

In 2010, a University of Cincinnati professor awarded Strategic Vision LLC the ''Phantom of the Soap Opera" award on the Media Ethics site. This site has a little more back story on the odd story of Strategic Vision LLC's offices or lack of them.

The results

Strategic Vision, LLC continued to deny any wrongdoing. They never supplied their data to the AAPOR and they stopped publishing polls in late 2009. They've disappeared from the polling scene.

Other polling companies

Nate Silver rated other pollsters an F and stopped using them. Not all of the tales are as lurid as the ones I've described here, but there are accusations of fraud and fakery in some cases, and in other cases, there are methodology disputes and no suggestion of impropriety. Here's a list of pollsters Nate Silver rates an F.

Anarchy in the UK

It's time to cross the Atlantic and look at polling shenanigans in the UK. The UK hasn't seen the rise and fall of dodgy polling companies, but it has seen dodgy polling methodologies.

Herding

Let's imagine you commission a poll on who will win the UK general election. You get a result different from the other polls. Do you publish your result? Now imagine you're a polling analyst, you have a choice of methodologies for analyzing your results, do you do what everyone else does and get similar results, or do you do your own thing and maybe get different results from everyone else?

Sadly, there are many cases when contrarian polls weren't published and there is evidence that polling companies made very similar analysis choices to deliberately give similar results. This leads to the phenomenon called herding where published poll results tend to herd together. Sometimes, this is OK, but sometimes it can lead to multiple companies calling an election wrongly.

In 2015, the UK polls predicted a hung parliament, but the result was a working majority for the Conservative party. The subsequent industry poll analysis identified herding as one of the causes of the polling miss. 

This isn't the first time herding has been an issue with UK polling and it's occasionally happened in the US too.

Leading questions

The old British TV show 'Yes, Prime Minister' has a great piece of dialog neatly showing how leading questions work in surveys. 'Yes, Prime Minister' is a comedy, but UK polls have suffered from leading questions for a while.

The oldest example I've come across dates from the 1970's and the original European Economic Community membership referendum. Apparently, one poll asked the following questions to two different groups:

  • France, Germany, Italy, Holland, Belgium and Luxembourg approved their membership of the EEC by a vote of their national parliaments. Do you think Britain should do the same?
  • Ireland, Denmark and Norway are voting in a referendum to decide whether to join the EEC. Do you think Britain should do the same?

These questions are highly leading and unsurprisingly elicited the expected positive result in both (contradictory) cases.

Moving forward in time to 2012, leading questions or artful question wording, came up again. The background is press regulation. After a series of scandals where the press behaved shockingly badly, the UK government considered press regulation to curb abuses. Various parties were for or against various aspects of press regulation and they commissioned polls to support their viewpoints. 

The polling company YouGov published a poll, paid for by The Media Standards Trust, that showed 79% of people thought there should be an independent government-sanctioned regulator to investigate complaints against the press. Sounds comprehensive and definitive. 

But there was another poll at about the same time, this time paid for by The Sun newspaper,  that found that only 24% of the British public wanted a government regulator for the press - the polling company here was also YouGov! 

The difference between the 79% and 24% came through careful question wording - a nuance that was lost in the subsequent press reporting of the results. You can listen to the story on the BBC's More Or Less program that gives the wording of the question used.

What does all this mean?

The quality of the polling company is everything

The established, reputable companies got that way through high-quality reliable work over a period of years. They will make mistakes from time to time, but they learn from them. When you're considering whether or not to believe a poll,  you should ask who conducted the poll and consider the reputation of the company behind it.

With some exceptions, the press is unreliable

None of the cases of polling impropriety were caught by the press. In fact, the press has a perverse incentive to promote the wild and outlandish, which favors results from dodgy pollsters. Be aware that a newspaper that paid for a poll is not going to criticize its own paid-for product, especially when it's getting headlines out of it.

Most press coverage of polls focuses on discussing what the poll results mean, not how accurate they are and sources of bias. If these things are discussed, they're discussed in a partisan manner (disagreeing with a poll because the writer holds a different political view). I've never seen the kind of analysis Nate Silver does elsewhere - and this is to the great detriment of the press and their credibility.

Vested interests

A great way to get press coverage is by commissioning polls and publishing the results; especially if you can ask leading questions. Sometimes, the press gets very lazy and doesn't even report who commissioned a poll, even when there's plainly a vested interest.

Anytime you read a survey, ask who paid for it and what the exact questions were.

Outliers are outliers, not trends

Outlier poll results get more play than results in line with other pollsters. As I write this in early September 2020, Biden is about 7% ahead in the polls. Let's imagine two survey results coming in early September:

  • Biden ahead by 8%.
  • Trump ahead by 3%

Which do you think would get more space in the media? Probably the shocking result, even though the dull result may be more likely. Trump-supporting journalists might start writing articles on a campaign resurgence while Biden-supporting journalists might talk about his lead slipping and losing momentum. In reality, the 3% poll might be an anomaly and probably doesn't justify consideration until it's backed by other polls. 

Bottom line: outlier polls are probably outliers and you shouldn't set too much store by them.

There's only one Nate Silver

Nate Silver seems like a one-man army, routing out false polling and pollsters. He's stood up to various legal threats over the years. It's a good thing that he exists, but it's a bad thing that there's only one of him. It would be great if the press could take inspiration from him and take a more nuanced, skeptical, and statistical view of polls. 

Can you believe the polls?

Let me close by answering my own question: yes you can believe the polls, but within limits and depending on who the pollster is.

Reading more

This blog post is one of a series of blog posts about opinion polls. 

Monday, August 17, 2020

Poll-axed: disastrously wrong opinion polls

Getting it really, really wrong

On occasions, election opinion polls have got it very, very wrong. I'm going to talk about some of their biggest blunders and analyze why they messed up so very badly. There are lessons about methodology, hubris, and humility in forecasting.

 (Image credit: secretlondon123, Source: Wikimedia Commons, License: Creative Commons)

The Literary Digest - size isn't important

The biggest, badest, and boldest polling debacle happened in 1936, but it still has lessons for today. The Literary Digest was a mass-circulation US magazine published from 1890-1938. In 1920, it started printing presidential opinion polls, which over the years acquired a good reputation for accuracy [Squire], so much so that they boosted the magazine's subscriptions. Unfortunately, its 1936 opinion poll sank the ship.

(Source: Wikimedia Commons. License: Public Domain)

The 1936 presidential election was fought between Franklin D. Roosevelt (Democrat), running for re-election, and his challenger Alf Landon (Republican).  The backdrop was the ongoing Great Depression and the specter of war in Europe. 

The Literary Digest conducted the largest-ever poll up to that time, sending surveys to 10 million people and receiving 2.3 million responses; even today, this is orders of magnitude larger than typical opinion polls. Through the Fall of 1936, they published results as their respondents returned surveys; the magazine didn't interpret or weight the surveys in any way [Squire]. After 'digesting' the responses, the Literary Digest confidently predicted that Landon would easily beat Roosevelt. Their reasoning was, the poll was so big it couldn’t possibly be wrong, after all the statistical margin of error was tiny

Unfortunately for them, Roosevelt won handily. In reality, handily is putting it mildly, he won a landslide victory (523 electoral college votes to 8).

So what went wrong? The Literary Digest sampled its own readers, people who were on lists of car owners, and people who had telephones at home. In the Great Depression, this meant their sample was not representative of the US voting population; the people they sampled were much wealthier. The poll also suffered from non-response bias; the people in favor of Landon were enthusiastic and filled in the surveys and returned them, the Roosevelt supporters less so. Unfortunately for the Literary Digest, Roosevelt's supporters weren't so lethargic on election day and turned up in force for him [Lusinchi, Squire]. No matter what the size of the Literary Digest's sample, their methodology baked in bias, so it was never going to give an accurate forecast. 

Bottom line: survey size can't make up for sampling bias.

Sampling bias is an ongoing issue for pollsters. Factors that matter a great deal in one election might not matter in another, and pollsters have to estimate what will be important for voting so they know who to select. For example, having a car or a phone might not correlate with voting intention for most elections, until for one election they do correlate very strongly. The Literary Digest's sampling method was crude, but worked fine in previous elections. Unfortunately, in 1936 the flaws in their methodology made a big difference and they called the election wrongly as a result. Fast-forwarding to 2016, flaws in sampling methodology led to pollsters underestimating support for Donald Trump.

Sadly, the Literary Digest never recovered from this misstep and folded two years later. 

Dewey defeats Truman - or not

The spectacular implosion of the 1936 Literary Digest poll gave impetus to the more 'scientific' polling methods of George Gallup and others [Igo]. But even these scientific polls came undone in the 1948 US presidential election. 

The election was held not long after the end of World War II and was between the incumbent, Harry S. Truman (Democrat), and his main challenger, Thomas E. Dewey (Republican). At the start of the election campaign, Dewey was the favorite over the increasingly unpopular Truman. While Dewey ran a low-key campaign, Truman led a high-energy, high-intensity campaign.

The main opinion polling companies of the time, Gallup, Roper, and Crossley firmly predicted a Dewey victory. The Crossley Poll of 15 October 1948 put Dewey ahead in 27 states [Topping]. In fact, their results were so strongly in favor of Dewey that some polling organizations stopped polling altogether before the election. 

The election result? Truman won convincingly. 

A few newspapers were so convinced that Dewy had won that they went to press with a Dewey victory announcement, leading to one of the most famous election pictures of all time.

(Source: Truman Library)

What went wrong?

As far as I can tell, there were two main causes of the pollsters' errors:

  • Undecided voters breaking for Truman. Pollsters had assumed that undecided voters would split their votes evenly between the candidates, which wasn't true then, and probably isn't true today.
  • Voters changing their minds or deciding who to vote for later in the campaign. If you stop polling late in the campaign, you're not going to pick up last-minute electoral changes. 

Just as in 1936, there was a commercial fallout, for example, 30 newspapers threatened to cancel their contracts with Gallup. 

As a result of this fiasco, the polling industry regrouped and moved towards random sampling and polling late into the election campaign.

US presidential election 2016

For the general public, this is the best-known example of polls getting the result wrong. There's a lot to say about what happened in 2016, so much in fact, that I'm going to write a blog post on this topic alone. It's not the clear-cut case of wrongness it first appears to be.

(Imaged credit: Michael Vadon, Source: Wikimedia Commons, License: Creative Commons)

For now, I'll just give you some hints: like the Literary Digest example, sampling was one of the principal causes, exacerbated by late changes in the electorate's voting decisions. White voters without college degrees voted much more heavily for Donald Trump than Hilary Clinton and in 2016, opinion pollsters didn't control for education, leading them to underestimate Trump's support in key states. Polling organizations are learning from this mistake and changing their methodology for 2020. Back in 2016, a significant chunk of the electorate seemed to make up their minds in the last few weeks of the election which was missed by earlier polling.

It seems the more things change, the more they remain the same.

Anarchy in the UK?

There are several properties of the US electoral system that make it very well suited for opinion polling but other electoral systems don't have these properties. To understand why polling is harder in the UK than in the US, we have to understand the differences between a US presidential election and a UK general election.

  • The US is a national two-party system, the UK is a multi-party democracy with regional parties. In some constituencies, there are three or more parties that could win.
  • In the US, the president is elected and there are only two candidates, in the UK, the electorate vote for Members of Parliament (MPs) who select the prime minister. This means the candidates are different in each constituency and local factors can matter a great deal.
  • There are 50 states plus Washington DC, meaning 51 geographical areas. In the UK, there are currently 650 constituencies, meaning 650 geographies area to survey.

These factors make forecasting UK elections harder than US elections, so perhaps we should be a bit more forgiving. But before we forgive, let's have a look at some of the UK's greatest election polling misses.

General elections

The 1992 UK general election was a complete disaster for the opinion polling companies in the UK [Smith]. Every poll in the run-up to the election forecast either a hung parliament (meaning, no single party has a majority) or a slim majority for the Labour party. Even the exit polls forecast a hung parliament. Unfortunately for the pollsters, the Conservative party won a comfortable working majority of seats. Bob Worcester, the best-known UK pollster at the time, said the polls were more wrong "...than in any time since their invention 50 years ago" [Jowell].

Academics proposed several possible causes [Jowell, Smith]:
  • "Shy Tories". The idea here is that people were too ashamed to admit they intended to vote Conservative, so they lied or didn't respond at all. 
  • Don't knows/won't say. In any poll, some people are undecided or won't reveal their preference. To predict an election, you have to model how these people will vote, or at least have a reliable way of dealing with them, and that wasn't the case in 1992 [Lynn].
  • Voter turnout. Different groups of people actually turn out to vote at different proportions. The pollsters didn't handle differential turnout very well, leading them to overstate the proportion of Labour votes.
  • Quota sampling methods. Polling organizations use quota-based sampling to try and get a representative sample of the population. If the sampling is biased, then the results will be biased [Lynn, Smith]. 

As in the US in 1948, the pollsters re-grouped, licked their wounds and revised their methodologies.

After the disaster of 1992, surely the UK pollsters wouldn't get it wrong again? Moving forward 2015, the pollsters got it wrong again!

In the 2015 election, the Conservative party won a working majority. This was a complex, multi-party election with strong regional effects, all of which were well-known at the time. As in 1992, the pollsters predicted a hung parliament and their subsequent humiliation was very public. Once again, there were various inquiries into what went wrong [Sturgis]. Shockingly, the "official" post-mortem once again found that sampling was the cause of the problem. The polls over-represented Labour supporters and under-represented Conservative supporters, and the techniques used by pollsters to correct for sampling issues were inadequate [Sturgis]. The official finding was backed up by independent research which further suggested pollsters had under-represented non-voters and over-estimated support for the Liberal Democrats [Melon].

Once again, the industry had a rethink.

There was another election in 2019. This time, the pollsters got it almost exactly right.

It's nice to see the polling industry getting a big win, but part of me was hoping Lord Buckethead or Count Binface would sweep to victory in 2019.

(Count Binface. Source: https://www.countbinface.com/)

(Lord Buckethead. Source: https://twitter.com/LordBuckethead/status/1273601785094078464/photo/1. Not the hero we need, but the one we deserve.)

EU referendum

This was the other great electoral shock of 2016. The polls forecast a narrow 'Remain' victory, but the reality was a narrow 'Leave' win. Very little has been published on why the pollsters got it wrong in 2016, but what little that was published suggests that the survey method may have been important. The industry didn't initiate a broad inquiry, instead, individual polling companies were asked to investigate their own processes

Other countries

There have been a series of polling failures in other countries. Here are just a few:

Takeaways

In university classrooms around the world, students are taught probability theory and statistics. It's usually an antiseptic view of the world, and opinion poll examples are often presented as straightforward math problems, stripped of the complex realities of sampling. Unfortunately, this leaves students unprepared for the chaos and uncertainty of the real world.

Polling is a complex, messy issue. Sampling governs the success or failure of polls, but sampling is something of a dark art and it's hard to assess its accuracy during a campaign. In 2020, do you know the sampling methodologies used by the different polling companies? Do you know who's more accurate than who?

Every so often, the polling companies take a beating. They re-group, fix the issues, and survey again. They get more accurate, and after a while, the press forgets about the failures and talks in glowing terms about polling accuracy, and maybe even doing away with the expensive business of elections in favor of polls. Then another debacle happens. The reality is, the polls are both more accurate and less accurate than the press would have you believe. 

As Yogi Berra didn't say, "it's tough to make predictions, especially about the future".  

If you liked this post, you might like these ones

References

[Igo] '"A gold mine and a tool for democracy": George Gallup, Elmo Roper, and the business of scientific polling,1935-1955', Sarah Igo, J Hist Behav Sci. 2006;42(2):109-134

[Jowell] "The 1992 British Election: The Failure of the Polls", Roger Jowell, Barry Hedges, Peter Lynn, Graham Farrant and Anthony Heath, The Public Opinion Quarterly, Vol. 57, No. 2 (Summer, 1993), pp. 238-263

[Lusinchi] '“President” Landon and the 1936 Literary Digest Poll: Were Automobile and Telephone Owners to Blame?', Dominic Lusinchi, Social Science History 36:1 (Spring 2012)

[Lynn] "How Might Opinion Polls be Improved?: The Case for Probability Sampling", Peter Lynn and Roger Jowell, Journal of the Royal Statistical Society. Series A (Statistics in Society), Vol. 159, No. 1 (1996), pp. 21-28 

[Melon] "Missing Nonvoters and Misweighted Samples: Explaining the 2015 Great British Polling Miss", Jonathan Mellon, Christopher Prosser, Public Opinion Quarterly, Volume 81, Issue 3, Fall 2017, Pages 661–687

[Smith] "Public Opinion Polls: The UK General Election, 1992",  T. M. F. Smith, Journal of the Royal Statistical Society. Series A (Statistics in Society), Vol. 159, No. 3 (1996), pp. 535-545

[Squire] "Why the 1936 Literary Digest poll failed", Peverill Squire, Public Opinion Quarterly, 52, 125-133, 1988

[Sturgis] "Report of the Inquiry into the 2015 British general election opinion polls", Patrick Sturgis,  Nick Baker, Mario Callegaro, Stephen Fisher, Jane Green, Will Jennings, Jouni Kuha, Ben Lauderdale, Patten Smith

[Topping] '‘‘Never argue with the Gallup Poll’’: Thomas Dewey, Civil Rights and the Election of 1948', Simon Topping, Journal of American Studies, 38 (2004), 2, 179–198

Saturday, June 13, 2020

Death by rounding

Round, round, round, round, I get around

Rounding errors are one of those basic things that every technical person thinks they're on top of and won't happen to them, but the problem is, it can and does happen to good people, sometimes with horrendous consequences. In this blog post, I'm going to look at rounding errors, show you why they can creep in, and provide some guidelines you should follow to keep you and your employer safe. Let's start with some real-life cases of rounding problems.


(Rounding requires a lot of effort. Image credit: Wikimedia Commons. License: Public Domain)

Rounding errors in the real world

The wrong rounding method

In 1992, there was a state-level election in Schleswig-Holstein in Germany. The law stated that every party that received 5% or more of the vote got a seat, but there were no seats for parties with less than 5%. The software that calculated results rounded the results up (ceil) instead of rounding the results down (floor) as required by law. The Green party received 4.97% of the vote, which was rounded up to 5.0%, so it appeared the Green party had won a seat. The bug was discovered relatively quickly, and the seat was reallocated to the Social Democrats who gained a one-seat majority because of it [Link]. 

Cumulative rounding

The more serious issue is cumulative rounding errors in real-time systems. Here a very small error becomes very important when it's repeatedly or cumulatively added.

The Vancouver Stock Exchange set up a new index in January 1982, with a value set to 1,000. The index was updated with each trade, but the index was rounded down to three decimal places (truncated) instead of rounding to the nearest decimal place. The index was calculated thousands of times a day, so the error was cumulative. Over time, the error built up from something not noticeable to something very noticeable indeed. The exchange had to correct the error; on Friday November 25th, 1983 the exchange closed at 524.811, the rounding error was fixed, and when the exchange reopened, the index was 1098.892 - the difference being solely due to the rounding error bug fix [Link].

The most famous case of cumulative rounding errors is the Patriot missile problem in Dharan in 1991. A Patriot missile failed to intercept a Scud missile, which went on to kill 28 people and injured a further 98. The problem came from the effects of a cumulative rounding error. The Patriot system updated every 0.1s, but 0.1 can't be represented exactly in a fixed point system, there's rounding, which in this case was rounding down. The processors used by the Patriot system were old 24-bit systems that truncated the 0.1 decimal representation. Over time, the truncation error built up, resulting in the Patriot missile incorrectly responding to sensor data and missing the Scud missile [Link].  

Theoretical explanation of rounding errors

Cumulative errors

Fairly obviously, cumulative errors are a sum:


E = ∑e

where E is the cumulative error and e is the individual error. In the Vancouver Stock Exchange example, the mean individual rounding error when rounding to three decimal places was 0.0005. From Wikipedia, there were about 3,000 transactions per day, and the period from January 1st 1982 when the index started to November 25th, 1983 when the index was fixed was about 473 working days. This gives an expected cumulative error of about 710, which is in the ballpark of what actually happened. 

Of course, if the individual error can be positive or negative, this can make the problem better or worse. If the error is distributed evenly around zero, then the cumulative error should be zero, so things should be OK in the long run. But even a slight bias will eventually result in a significant cumulative error - regrettably, one that might take a long time to show up.

Although the formula above seems trivial, the point is, it is possible to calculate the cumulative effect of rounding errors.

Combining errors

When we combine numbers, errors can really hurt depending on what the combination is. Let's start with a simple example, if:


z = x - y 

and:
 
sis the standard error in z
sis the standard error in x
sis the standard error in y

then

sz  = [s2x + s2y]1/2

If x and y are numerically close to one another, errors can quickly become very significant. My first large project involved calculating quantum states, which included a formula like z = x - y. Fortunately, the rounding was correct and not truncated, but the combination of machine precision errors and the formulae above made it very difficult to get a reliable result. We needed the full precision of the computer system and we had to check the library code our algorithms used to make sure rounding errors were correctly dealt with. We were fortunate in that the results of rounding errors were obvious in our calculations, but you might not be so fortunate.


Ratios are more complex, let's define:

z = x/y

with the s values defined as before, then:

sz /z = [(sx/x)2 + (sy/y)2]0.5

This suffers from the same problem as before, under certain conditions, the error can become very significant very quickly. In a system like the Patriot missile, sensor readings are used in some very complex equations. Rounding errors can combine to become very important.

The takeaway is very easy to state: if you're combining numbers using a ratio or subtracting them, rounding (or other errors) can hurt you very badly very quickly.

Insidious rounding errors

Cumulative rounding errors and the wrong type of rounding are widely discussed on the internet, but I've seen two other forms of rounding that have caught people out. They're hard to spot but can be damaging.

Rounding in the wrong places - following general advice too closely

Many technical degrees include some training on how to present errors and significant digits. For example, a quantity like 12.34567890 ∓ 0.12345678 is usually written 12.3 ∓ 0.1. We're told not to include more significant digits than the error analysis warrants. Unfortunately, this advice can lead you astray if you apply it unthinkingly.

Let's say we're taking two measurements:


x = 5.26 ∓0.14
y = 1.04 ∓0.12

following the rules of representing significant digits, this gives us

x = 5.3 ∓0.1
y = 1.0 ∓0.1

If :

z = x/y

then with the pre-rounded numbers:

z = 5.1 ∓ 0.6

but with the rounded numbers we have:

z = 5.3 ∓ 0.5

Whoops! This is a big difference. The problem occurred because we applied the advice unthinkingly. We rounded the numbers prematurely; in calculations, we should have kept the full precision and only shown rounded numbers for display to users.

The advice is simple: preserve full precision in calculations and reserve rounding for numbers shown to users.

Spreadsheet data

Spreadsheets are incredible sources of errors and bugs. One of the insidious things spreadsheets do is round numbers, which can result in numbers appearing not to add up. 

Let's have a look at an example. The left of the table shows numbers before rounding. The right of the table shows numbers with rounding. The numbers on the right don't add up because of rounding (they should sum to 1206). 


  No round   Round
 Jan121.4   Jan  121
 Feb 251.4  Feb 251
 Mar 311.4  Mar 311
 Apr 291.4  Apr 291
 May 141.4  May 141
 Jun 91.4  Jun 91
 TOTAL 1208.4  TOTAL 1208

An insidious problem occurs rounded when numbers are copied from spreadsheets and used in calculations - which is a manifestation of the premature rounding problem I discussed earlier.

1.999... = 2, why 2 != 2, and machine precision

Although it's not strictly a rounding error, I do have to talk about the fact that 1.999... = 2. This result often surprises people, but it's an easy thing to prove. Unfortunately, on machines with finite precision, 1.9999... == 2 will give you False! Just because it's mathematically true, doesn't mean it's true on your system.  

I've seen a handful of cases when two numbers that ought to be the same fail an equality test, the equivalent of 2 == 2 evaluating to False. One of the numbers has been calculated through a repeated calculation and machine precision errors propagate, the other number has been calculated directly. Here's a fun example from Python 3:


1 == (1/7) + (1/7) + (1/7) + (1/7) + (1/7) + (1/7) + (1/7)

evaluates to False!

To get round this problem, I've seen programmers do True/False difference evaluations like this:

abs(a - b) <= machine_precision

The machine precision constant is usually called epsilon.

What to watch for

Cumulative errors in fixed-point systems

The Patriot missile case makes the point nicely: if you're using sensor data in a system using fixed-point arithmetic, or indeed in any computer system, be very careful how your system rounds its inputs. Bear in mind, the rounding might be done in an ADC (analog-to-digital converter) beyond your control - in which case, you need to know how it converts data. If you're doing the rounding, you might need to use some form of dithering.

Default rounding and rounding methods

There are several different rounding methods you can use; your choice should be a deliberate one and you should know their behavior. For example, in Python, you have:

  • floor
  • ceil
  • round - which uses banker's rounding not the school textbook form of rounding and was changed from Python 2 to Python 3.

You should be aware of the properties of each of these rounding methods. If you wanted to avoid the Vancouver Stock Exchange problem, what form of rounding would you choose and why? Are you sure?

A more subtle form of rounding can occur when you mix integers and floating-point numbers in calculations. Depending on your system and the language you use, 7.5/2 can give different answers. I've seen some very subtle bugs involving hidden type conversion, so be careful.

Premature rounding

You were taught to only present numbers to an appropriate numbers of decimal places, but that was only for presentation. For calculations, use the full precision available.

Spreadsheets

Be extremely careful copying numbers from spreadsheets, the numbers may have been rounded and you may need to look closer to get extra digits of precision.

Closing thoughts

Rounding seems like a simple problem that happens to other people, but it can happen to you and it can have serious consequences. Take some time to understand the properties of the system and be especially careful if you're doing cumulative calculations, mixed floating-point and integer calculation, or if you're using a rounding function.

Saturday, May 30, 2020

Inventory: your job may depend on how it's managed

Why should you care about inventory?

For your own job security, you need to understand the financial position of your employer. Their true financial position will govern your pay and promotion prospects. Long before they affect your job, you can spot looming signs of trouble in company financial statements. Learning a little of accounting will also help you understand news stories as we'll see. In this blog post, I'm going to talk about one of the easiest signs of trouble to spot, inventory problems. Because it's always fun, I'm going to include some cases of fraud. Bear in mind, innocent people lose their jobs because of inventory issues; I hope you won't be one of them. 


Inventory
(Is inventory good or bad? It depends. Image credit: Wikimedia Commons. License: public domain.)

Inventory concepts

Inventories are items held for sale or items that will be used to manufacture products. Good examples are the items retailers hold for sale (e.g. clothes, food, books) and the parts manufacturers hold (e.g. parts for use on a car assembly line). On a balance sheet, inventory is listed as a current asset, which means it's something that can be turned into cash 'quickly'. There are different types of accounting inventory, but I won't go into what they are.

Inventory changes can be benign but can be a sign of trouble. Let's imagine a bookseller whose inventory is increasing. Is this good or bad?

  • If the bookseller is expanding (more sales, more shops), then increasing inventory is a sign of success.
  • If the bookseller is not expanding, then increasing inventory is deeply concerning. The bookseller is buying books it can't sell.

There are two ways of valuing inventory, which opens the door to shenanigans. Let's imagine you're a coal-burning power station and you have a stockpile of coal. The price of coal fluctuates. Do you value your stockpile of coal at current market prices or the price that you paid for it? There are two ways of evaluating inventory: FIFO and LIFO.

  • FIFO is first-in, first-out - the first items purchased are the first items sold. Inventory is valued at current market prices.
  • LIFO is last-in, first-out - the last items purchased are the first items sold. Inventory is valued at historic market prices.

If prices are going up, then LIFO increases the cost of goods sold and reduces profitability, conversely, FIFO reduces the cost of goods sold and increases profitability. There are also tax implications for the different inventory evaluation methods. 

Obviously, things are more complex than I've described here but we have enough of the basic ideas, so let's get to the fraud stories.

Inventory shenanigans

OM Group produced specialty chemicals from raw materials, including cobalt. In the early 2000s, cobalt was mostly sourced as a by-product from mines in the Democratic Republic of the Congo, a very unstable part of the world. The price of cobalt was going down and OM Group saw a way of making that work to their advantage. Their first step was to use the LIFO method of valuing their cobalt inventory. The next step was to buy up cheap cobalt and keep buying as the price dropped. Here's what that meant; because they used LIFO, for accounting purposes, the cobalt they used in production was valued at the new (low) market price, so the cost of goods sold went down, so profitability went up! The older (and more expensive) cobalt was kept in inventory. To keep the business profits increasing, they needed the price of cobalt to go down and they needed to buy more of it, regardless of their manufacturing needs. The minute prices went up, or they started eating into inventory, or they stopped buying more cobalt, profitability would fall. To put it simply, the boost to profits was an accounting shell game.

OM Group logo at the time. Image credit: Wikimedia Commons. License: Public Domain.)

As you might expect, the music eventually stopped. The SEC charged some of the executives with fraud and reached a settlement with the company, and there was a class-action lawsuit from some company investors. Unsurprisingly, the company later changed its name when the dust settled. If you want to understand how you could spot something like this, there's a very readable description of the accounting fraud by Gary Mishuris, an analyst who spotted it.  

Manufacturing plants run best when producing at a constant rate, but market demands fluctuate. If demand reduces, then inventory will increase, something that will be obvious in a company's financial statements. How can a company disguise a drop in demand? One illegal way is through something called 'channel stuffing', which is forcing your distributors and resellers to take your unsold inventory so you can record it as sales. 

Semiconductor companies are manufacturers and typically have large distribution networks through resellers covering different geographies or markets. For example, a semiconductor company may sell directly to large electronics companies but may serve smaller electronics companies through resellers, who may, in turn, resell to other distributors and so on.

Between 1995 and 2006, Vitesse Semiconductor used channel stuffing extensively to manage its earnings. It had an arrangement with its distributors that they could unconditionally sell back any chips they had bought and not sold. Here's how channel stuffing worked; if Vitesse needed to increase profits in a quarter, they would require their distributors to buy Vitesse' inventory. This would show up as an increase in Vitesse's sales for that quarter. At some point in the future, the resellers could sell the chips back to Vitesse. The chips themselves might never leave the warehouse, but might have been 'owned' by several different companies. In other words, short-term increases in profitability were driven by an accounting scam.

(Vitesse Semiconductor chip. Image credit: Raimond Spekking via Wikimedia Commons. License: Creative Commons)

Of course, this is all illegal and the SEC took action against the company; the executives were indicted for fraud. Vitesse had to restate their earnings substantially downwards, which in turn triggered a class action lawsuit. This fraud has even made it into fraud textbooks.

I want to stop for a minute and ask you to think. These are entertaining stories, but what if you were an (innocent) employee of OM Group or Vitesse Semiconductor? When the SEC arrests the leadership, what are the implications for employees? When the accounts are restated and profitability takes a nose dive, what do you think the pay and job prospects are like for the rank-and-file workers?

Inventory and politics - Brexit

A while back, I was chatting to a Brexit supporter, when a news report came on the TV; UK economic output had increased, but the increase had gone into inventory, not sales. Manufacturers and others were assuming Brexit would disrupt their supply chain, so they'd increased output to give them a buffer. I was horrified, but the Brexit supporter thought this was great news. After chatting some more, I realized they had no understanding of how inventory worked. Let's talk through some scenarios to understand why the news was bad.

  • Scenario 1: no Brexit supply chain disruption.  UK firms have an excess of inventory. They can either keep the inventory indefinitely (and pay higher costs than their overseas competitors) or they can run down inventory which means fewer hours for their workers.
  • Scenario 2: Brexit supply chain disruption.  UK firms can't get parts, so they run down inventory until supply chain issues are fixed. Selling inventory means fewer hours worked by the workers.

In both scenarios, UK firms have incurred production costs earlier than their overseas competitors, which reduces their cash flow.


(Image credit: Wikimedia Commons - Tim Reckmann. License: Creative Commons.)

This is obviously highly simplified, but you get the point. None of these scenarios are good for firms or for workers.

If increasing inventory without sales is so good, why don't firms do it all the time? In fact, why bother with the pesky business of selling at all when you can just produce for inventory? The question seems silly, but answering it leads you to consider what an optimum level of inventory is. The logic pushes you towards just in time, which leads to an understanding of why supply chain interruptions are bad.

Closing thoughts

Your job security depends on the financial stability of your employer. If you work for a company that produces public accounts, you have an opportunity to make your own risk assessment. Inventory is one factor among many you should watch. Here are some things you should look out for:

  • Changes to inventory evaluation methods (LIFO, FIFO).
  • Increases in inventory not matched to growth.
  • Increasing sales to distributors not matched to underlying market demand (especially when the inventory never leaves the company).

Yes, some companies do produce fraudulent accounts, and yes, some do hide poor performance, but you can still take steps to protect your career based on the cold hard reality of financial statements, not on hype.

Saturday, May 23, 2020

Finding electoral fraud - the democracy data deficit

Why we need to investigate fraud

In July 2016, Fox News' Sean Hannity reported that Mitt Romney received no votes at all in 59 Philadelphia voting precincts in the 2012 Presidential Election. He claimed that this was evidence of vote-rigging - something that received a lot of commentary and on-air discussion at the time. On the face of it, this does sound like outright electoral fraud; in a fair election, how is it possible for a candidate to receive no votes at all? Since then, there have been other allegations of fraud and high-profile actual incidents of fraud. In this blog post, I’m going to talk about how a citizen-analyst might find electoral fraud. But I warn you, you might not like what I’m going to say.

Election organization - the smallest electoral units

In almost every country, the election process is organized in the same way; the electorate is split into geographical blocks small enough to be managed by a team on election day. The blocks might contain one or many polling stations and may have a few hundred to a few thousand voters. These blocks are called different things in different places, for example, districts, divisions, or precincts. Because precinct seems to be the most commonly used word, that's what I'm going to use here. The results from the precincts are aggregated to give results for the ward, county, city, state, or country. The precinct boundaries are set by different authorities in different places, but they're known. 

How to look for fraud

A good place to look for electoral shenanigans is at the precinct level, but what should we look for? There are several easy checks:

  • A large and unexplained increase or decrease in the number of voters compared to previous elections and compared to other nearby precincts. 
  • An unexpected change in voting behavior compared to previous elections/nearby precincts. For example, a precinct that ‘normally’ votes heavily for party Y suddenly voting for party X.
  • Changes in voting patterns for absentee voters e.g. significantly more or less absentee votes or absentee voter voting patterns that are very different from in-person votes.
  • Results that seem inconsistent with the party affiliation of registered voters in the precinct.
  • A result that seems unlikely given the demographics of the precinct.

Of course, none of these checks is a smoking gun, either individually or collectively, but they might point to divisions that should be investigated. Let’s start with the Philadelphia case and go from there.

Electoral fraud - imagined and real

It’s true that some divisions (precincts) in Philadelphia voted overwhelmingly for Obama in 2012. These divisions were small (averaging about 600 voters) and almost exclusively (95%+) African-American. Obama was hugely popular with the African-American community in Philadelphia, polling 93%+. The same divisions also have a history of voting overwhelmingly Democratic. Given these facts, it’s not at all surprising to see no or very few votes for Mitt Romney. Similar arguments hold for allegations of electoral fraud in Cleveland, Ohio in 2012

In fact, there were some unbalanced results the other way too; in some Utah precincts, Obama received no votes at all - again not surprising given the voter population and voter history. 

Although on the face of it these lopsided results seem to strongly indicate fraud, the allegations don't stand up to analytical scrutiny.

Let’s look at another alleged case of electoral fraud, this time in 2018 in North Carolina. The congressional election was fiercely contested and appeared to be narrowly decided in favor of Mark Harris. However, investigators found irregularities in absentee ballots, specifically, missing ballots from predominantly African-American areas. The allegations were serious enough that the election was held again, and criminal charges have been made against a political operative in Mark Harris’ campaign. The allegation is ‘ballot harvesting’, where operatives persuade voters who might vote for their opposition to voting via an absentee ballot and subsequently make these ballots disappear.

My sources of information here are newspaper reports and analysis, but what if I wanted to do my own detective work and find areas where the results looked odd? How might I get the data? This is where things get hard.

Democracy’s data - official sources

To get the demographics of a precinct, I can try going to the US Census Bureau. The Census Bureau defines small geographic areas, called tracts, that they can supply data on. Tract data include income levels, population, racial makeup, etc. Sometimes, these tracts line up with voting districts (the Census term for precincts), but sometimes they don’t. If tracts don’t line up with voting districts, then automated analysis becomes much harder. In my experience, it takes a great investment of time to get any useful data from the Census Bureau; the data’s there, it’s just really hard finding out how to get it. In practice then, it’s extremely difficult for a citizen-analyst to link census data to electoral data.

What about voting results? Surely it’s easy to get electoral result data? As it turns out, this is surprisingly hard too. You might think the Federal Election Commission (FEC) will have detailed data, but it doesn’t. The data available from the FEC for the 2016 Presidential Election is less detailed than the 2016 Presidential Election Wikipedia page. The reason is, Presidential Elections are run by the states, so there are 51 (including Washington DC) separate authorities maintaining electoral results, which means 51 different ways of getting data, 51 different places to get it, and 51 different levels of detail available. The FEC sources its data from the states, so it's not surprising its reports are summary reports.  

If we need more detailed data, we need to go to the states themselves. 

Let's take Massachusetts as an example, Presidential Election data is available for 2016, down to the ward level (as a CSV), but for Utah, data is only available at the county level (as an Excel file), which is the same as Pennsylvania, where the data is only available from a web page. To get detail below the county level may take freedom of information requests, if the information is available at all. 

In effect, this puts precinct-level nationwide voting analysis from official sources beyond almost all citizen-analysts.

Democracy’s data - unofficial sources

In practice, voting data is hard to come by from official sources, but it is available from unofficial sources who've put the work into getting the data from the states and make it available to everyone.

Dave Leip offers election data down to detailed levels; the 2016 results by country will cost you $92 and results by Congressional District will cost you $249, however, high-level results are on his website and available for free. He's even been kind enough to list his sources and URLs if you want to spend the time to duplicate his work. Leip’s data is used by the media in their analysis, and probably by political campaigns too. He’s put in a great deal of work to gather the data and he’s asking for a return on his effort, which is fair enough. 

The MIT Election Data and Science Lab (MEDSL) collects election data, including down to the precinct level and the data is available for the most recent Presidential Election (2016 at the time of writing). As usual with this kind of data, there are all kinds of notes to read before using the data. MIT has also been kind enough to make tools available to analyze the data and they also make available their website scrapping tools.

The MIT project isn't the only project providing data. Various other universities have collated electoral resources at various levels of detail:

Democracy’s data - electoral fraud

What about looking for cases of electoral fraud? There isn't a central repository of electoral fraud cases and there are multiple different court systems in the US (state and federal), each maintaining records in different ways. Fortunately, Google indexes a lot of cases, but often, court transcripts are only available for a fee, and of course, it's extremely time-consuming to trawl through cases.

The Heritage Foundation maintains a database of known electoral fraud cases. They don't claim their database is complete, but they have put a lot of effort into maintaining it and it's the most complete record I know of. 

In 2018, there were elections for the House of Representatives, the Senate, state elections, and of course county and city elections. Across the US, there must have been thousands of different elections in 2018. How many cases of electoral fraud do you think there were? What level of electoral fraud would undermine your faith in the system? In 2018, there were 65 cases. From the Heritage Foundation data, here’s a chart of fraud cases per year for the United States as a whole.

Electoral fraud cases by year
(Electoral fraud cases by year from the Heritage Foundation electoral fraud database)

It does look like there's been an increase in electoral fraud up to about 2010, but bear in mind the dataset cover the period of computerization and the rise of the internet. We might expect a rise in fraud cases because it's easier to find case records. 

Based on this data, there really doesn’t seem to be large-scale electoral fraud in the United States. In fact, in reading the cases on their website, most of them are small-scale frauds concerning local elections (e.g. mayoral elections) - in a lot of cases, the frauds are frankly pathetic. 

Realistic assessment of election data

Official data is either hard to come by or not available at the precinct level, which leaves us using unofficial data. Fortunately, unofficial data is high quality and from reputable sources. The problem is, data from unofficial sources aren't available immediately after an election; there may be a long delay between the election and the data. If one of the goals of electoral data analysis is finding fraud, then timely data availability is paramount.

Of course, this kind of analysis I'm talking about here won't find small-scale fraud, where a person votes more than once or impersonates someone. But small-scale fraud will only affect the outcome of the very tightest of races. Democracy is most threatened by fraud that might affect the results, which in most cases is larger-scale fraud like the North Carolina case. Statistical analysis might detect these kinds of fraud.

Sean Hannity's allegation of electoral fraud in Philadelphia didn't stand up to analysis, but it was worth investigating and is the kind of fraud we could detect using data - if only it were available in a timely way. 

How things could be - a manifesto

Imagine groups of researchers sitting by their computers on election night. As election results at the precinct level are posted online, they analyze the results for oddities. By the next morning, they may have spotted oddities in absentee ballots, or unexplained changes in voting behavior, or unexpected changes in voter turnout - any of which will feed into the news cycle. Greater visibility of anomalies will enable election officials to find and act on fraud more quickly.

To do this will require consistency of reporting at the state level and a commitment to post precinct results as soon as they're counted and accepted. This may sound unlikely, but there are federal standards the states must follow in many other areas, including deodorants, teddy bears, and apple grades, but also for highway construction, minimum drinking age, and the environment. Isn't the transparency of democracy at least as important as deodorants, teddy bears, and apples?

If you liked this post, you might like these ones

Wednesday, May 6, 2020

Florence Nightingale, data analyst

Introduction - why do I care about Florence Nightingale, data analyst?

I've used statistics and data visualizations for a long time now, and in the last few years, I've become increasingly interested in where the methods I use come from. Who were the founding figures of statistics and visualization? Why was their work important? How did their work influence the world? As I've looked back in time, I've found the data science creation stories more interesting than I thought. There were real people who struggled to achieve their goals and used early data science methods to do so. One of these pioneers was Florence Nightingale, more famous for founding modern nursing, but a key figure in analytics and data visualization. What she did and why she did it have clear lessons for analysts today.

(Simon Harriyott from Uckfield, England, CC BY 2.0, via Wikimedia Commons)

Early life

Florence was born on May 12th, 1820, near Florence in Italy. Her parents were wealthy and very well-connected, two factors that were to have a big impact on her later life. As the second daughter, she was expected to have the learning of a woman of her station and to marry well; her family, especially her mother, had a very definite expectation of the role she was to fulfill. Her upbringing was almost like a character from a Jane Austen novel, which was to cause Florence mental health problems.

Initially, the family lived in a fifteen-bedroom house in Derbyshire, but this was too small for them (!) and they wanted to be nearer to London, so they moved to Embley in the New Forest. They also had an apartment in London and spent a lot of time in the city. Given the family connections and their time spent in London, it’s not surprising that Florence met many influential men and women growing up, including future prime ministers and a young Queen Victoria. This was to be crucially important to her later.

Up until she was 12, Florence was educated by a governess, then her father took over her education. Unusually for the time, her father believed in equality of education for women and put considerable effort into educating his daughters [Bostridge]. Notably, she received no formal schooling and never took anything like university lectures or courses, however, she had a precocious intellect and had an appetite for statistics and data. When she was 17, the family took a six-month vacation to Italy, and along the way, Florence recorded their departure and arrival times, the distances they traveled, and kept notes on local conditions and laws [Bostridge, Huxley].

Throughout her life, she was deeply religious, and in her teenage years, she felt a call from God to do something useful, she wanted ‘some regular occupation, for something worth doing instead of frittering time away on useless trifles’ [Huxley]. On the 7th of February 1837, Florence recorded “...God spoke to me and called me to His service”, but what the form of that call was, Florence didn’t note [Bostridge]. This theme of a calling from God was to come up several times in her life.

Bear in mind, Florence’s life was a round of socializing to prepare her for an appropriate marriage, nothing more. For an intellectually gifted woman wanting to make a difference in the world, the tension between the life she wanted and the life she had was immense. It’s not a surprise to hear that she was often withdrawn and on the verge of a nervous breakdown; in modern times, she may well have been diagnosed with depression. By the age of 30, Florence wasn’t married, something that wasn’t respectable - however, she was to shock her family with a very disreputable request.

Introduction to nursing

Florence decided that nursing was her calling, unfortunately, her parents violently objected, and with good reason.

At the time, nursing was considered a disreputable profession. Hospitals were filthy and nurses were both ill-trained and poorly educated. In many cases, their role was little more than cleaning up the hospital messes, and in the worst cases, they were promiscuous with doctors and surgeons [Huxley]. It was also known that nurses were present at operations, which in the 1850s were bloody, gruesome affairs. Even Charles Dickens had a poor view of nurses. In Martin Chuzzlewit, published in 1843, Dickens created a character, Sarah Gamp, who was sloppy, a drunk, and a nurse. Dickens was playing to a well-known stereotype and adding to it.

Nursing as a profession was about as far away from a suitable occupation for Florence as you can imagine. Her family knew all about nursing’s reputation and vigorously objected to Florence having anything to do with it. Her mother in particular opposed Florence learning or practicing nursing for a very long time, going as far as actively blocking Florence’s training. However, Florence could read about nursing and health, which she did copiously.

There was one bright nursing light; the Institution of Deaconesses at Kaiserworth (Germany) was a quasi-religious institute that sought to improve nursing standards. Florence wanted to study there, but her parents stopped her. She managed to go for two weeks in 1850, but only with some shenanigans. Perhaps because of the deception, when she came back, she anonymously published a 32-page pamphlet on her experience which is her first known published work [Nightingale 1851]. After some blazing stand-up rows with her mother, she finally went for three months of training in 1853. Bear in mind, her family still controlled her life, even at this late age.

The discipline at Kaiserworth was harsh and the living conditions were spartan. Days consisted of prayer and patient support, in effect, it was living a religious life while learning nursing, fulfilling two of Florence’s needs. She learned the state of nursing as it stood at the time, even witnessing amputations and other operations, which would have horrified her parents had they known. However, Florence appreciated the limitations of the Kaiserworth system.

On her return to Britain, her appetite for nursing wasn’t diminished, in fact, she read widely about nursing, disease in general, and statistics - broadening her knowledge base. What was missing was an opportunity to practice what she’d learned, which finally arrived in April 1853. 

Through her extensive family connections, she was made superintendent of a new ‘Institution for the Care of Sick Gentlewomen’ based in Harley Street in London. This was a combination of hospital and recuperation unit for sick women, with the goal of providing a better standard of care than was currently offered. With Florence, the founders thought they were getting a hands-off lady of leisure, instead, they got a human dynamo who was waiting to put into practice years of learning and preparation. Not only did Florence do nursing, she also fought on committees to get the funding she needed, became a tough people manager, and put the institution’s finances in order. Under Florence’s guidance, the institution became groundbreaking in simple but effective ways; it treated its patients well, it was clean, and its nurses were professional.

Had she continued in Harley Street, she probably would have still been a founding figure of modern nursing, but events elsewhere were conspiring to thrust her into the limelight and make her a national hero.

The Crimean War

Britain has fought almost every country in Europe many times. Sometimes with the French and sometimes against the French. By the mid-1850s, Britain and France were becoming worried about the influence of Russia in the Middle East, which resulted in the Crimean War, where Britain and France fought Russia [Britannica]. This was a disastrous war for pretty much everyone.

Painting of the Siege of Sevastapol
(Siege of Sevastopol (1854–55), Franz Roubaud)

British troops were shipped to Turkey to fight the Russians. Unfortunately, cholera, diarrhea, and dysentery ripped through the men, resulting in large numbers of casualties before the war had even started; the men were too sick to fight. Of the 30,000 British troops dispatched to Turkey, 1,000 died of disease before a single shot was fired [Bostridge].

Hospitals were squalid and poorly equipped; the main British hospital at Scutari was a national shame; men were trying to recover from their injuries in filthy conditions with poor food and limited supplies. The situation was made worse by bureaucratic blundering and blind rule-following, there were instances of supplies left to rot because committees hadn’t approved their release. By contrast, the French were well-equipped and were running effective field hospitals.

In an early example of embedded journalism, William Howard Russell provided dispatches for The Times exposing the poor treatment of the troops, incompetent management, and even worse, the superiority of the French. His reports riled up the British population, who in turn pressurized politicians to do something; it became politically imperative to take action [Huxley].

Florence in Crimea

War and medicine were male preserves, but politicians needed votes, meaning change came quickly. Russell’s dispatches made it clear that troops were dying in hospital, not on the battlefield, so medical support was needed. This is where Florence’s family connections came in. Sidney Herbert, Secretary at War, wrote to Florence asking her to run nursing operations in the Crimea. The War Office needed to give Florence a title, so they called her ‘Superintendent of the Female Nursing Establishment of the English General Military Hospitals in Turkey’. Nothing like this had ever been done before - women had never been sent to support war - which would cause problems later.

Florence was asked to recruit 50 nurses, but there were no female nurses at all in the British Army, and nursing was in its infancy. She found 14 women with hospital experience and several nuns from various religious orders - 38 women in total. On October 21st, 1854, this rag-tag army set out from England to go to the war in the Crimea.

The conditions they found in the barrack hospital at Scutari were shocking. The place was filthy and vermin-infested, rats were running around in plain view, and even the kitchens weren’t clean. Bedding and clothing weren’t washed, which meant soldiers preferred to keep their existing filthy bedding and clothing rather than changing them for someone else's equally unclean items - better to have your own lice bite you than someone else’s.  Basics like furniture were in short supply, there weren’t even enough tables for operations. Soldiers were left untreated for long periods of time, and there were many cases when maggots weren’t cleaned out of wounds. Unsurprisingly, cholera and dysentery were rampant. The death rate was high. As a further twist, the military wasn’t even using the whole building, the cellars had refugees living in them, and there was a prostitution ring operating there [Huxley].


(The military hospital at Scutari. Image source: The Wellcome Collection. License: Creative Commons.)

Florence wanted to make a difference, but military rules and misogyny prevented her nurses from taking up their duties. Her title was, “Superintendent of the Female Nursing Establishment of the English General Hospitals in Turkey”, but military orders didn’t say what she was to do. This was enough of an excuse for the (male) doctors and surgeons to block her nurses. Despite being blocked, the nurses did what they could to improve things, by ensuring clean bedding and better quality food for example.

Things changed, but for the worst reason. The Battle of Balaclava brought a tidal wave of wounded into the hospital, too many for the existing system to cope with, so the military gave in and let the women in. Florence’s nurses finally got to nurse.

Given her opportunity, Florence moved quickly to establish hygiene, cleanliness, and good nutrition. The rats were dispatched, the tenants in the basement were removed, and food quality was improved. Very unusually for the time, Florence insisted on hand washing, which of itself reduced the death rate [Globalhandwashing]. Back in London, The Times had established a fund to care for wounded soldiers, so Florence had a pot of money to spend as she chose, free of military rules. She set up contracts with local suppliers to improve the food supply, she set up washrooms to clean bedding and clothes, and she provided soldiers with new, clean clothing.

Her nurses tended to the men during the daytime, treating their wounds and ensuring they were clean and cared for. Florence’s administrative work tied her up in the daytime, but she was able to walk the wards at night to check on the men. She nursed them too and stayed with them as they died. Over the winter of 1855/1856, it’s estimated she saw something like 2,000 men die.

To light her way on her nocturnal rounds, she used a Turkish lamp. This is where the legend of the ‘lady with the lamp’ came from. Under desperate conditions, men would see a beacon of hope in the darkness. This is such a strong legend in UK culture that even 170 years later, it still resonates.

Drawing of Florence doing her rounds
(Illustrated London News, 24 Feb 1855, Source: Wikimedia Commons)

The difference Florence’s nurses made was eagerly reported back to the British public who were desperate for a good news story. The story was perfect, a heroine making a difference under terrible conditions while being blocked by the intransigence of military bureaucracy, and the ‘lady with the lamp’ image sold well. The donations came rolling in.

A highly fanciful representation of Florence
(A fanciful depiction of Florence doing her rounds. Creative Commons license.)

In May 1855, Florence got closer to the Crimean War when she toured Balaclava in the Crimea itself. Unfortunately, on 13th May 1855, she collapsed through exhaustion and became gravely ill, suffering fevers and delirium. The word was, she was close to death. On hearing of her condition, it’s said the patients in the Scutari hospital turned towards the wall and wept. Florence recovered, but she continued to suffer debilitating illness for the rest of her long life.

The war finally ended on 30th March 1856, and Florence returned to England in July of the same year. She left an unknown but came back a celebrity.

Florence as a data analyst and statistician

The Crimean War was a disaster for the British military and the public was angry; the political fall-out continued after the war was over and the poor medical treatment the troops received was a hot topic. After some delay, a “Royal Commission on the Health of the Army” was formed to investigate the health of the British Army, and Florence was its powerhouse. Sadly, as a woman, she couldn't formally be appointed to the Commission, so her role was less formal. Despite the informality, she was determined to prove her points with data and to communicate clearly with the public.

In the 1850s, statistics was in its infancy, but there were some early pioneers, including Willam Farr at the General Registry Office who was an early epidemiologist and one of the founders of medical statistics. Of course, Florence was a friend of Farr’s. Farr had introduced the idea of comparing the mortality rates of different occupations, which Florence was to run with [Cohen]. He also had a dismal view of data visualization which Florence disagreed with.

Florence’s stand-out piece of work is her report “Mortality of the British Army: at home and abroad, and during the Russian war, as compared with the mortality of the civil population in England.” which was appended to the Commission's main report. She knew she needed to reach the general public who wouldn’t read a huge and dull tome, she had to make an impact quickly and clearly, and she did so through the use of tables and data visualization. Bear in mind, the use of charts was in its infancy.

Here's one of the tables from her report, it's startlingly modern in its presentation. The key column is the one on right, the excess of deaths in the army compared to the general population. The excess deaths weren't due to warfare.

Incredibly, the excess of deaths was due to disease as we can see in the table below. The death rate for the general population for 'chest and tubercular disease' was 4.5 per 1,000, but for the army, it was 10.1. Tubercular disease isn't a disease of war, it's a disease of poor living conditions and poor sanitation.

The report is full of these kinds of tables, presented in a clear and compelling way that helped tell the terrible story: the British Army was killing its own soldiers through neglect.

Of course, tables are dry; charts make a more immediate impression and Florence used bar charts to great effect. Here's a bar chart of death by age group for the British Army (red) and the general population (black). Bear in mind, the period leading up to the Crimean War was peaceful - there were no major engagements, so the excess deaths aren't battle casualties. In fact, as Florence showed in the tables and in the charts, these excess death were avoidable.

In private, Florence was more forceful about the effect of poor medical treatment on the strength of the army. Salisbury Plain was (and is), a big British Army practice area, and she said: "it is as criminal to have a mortality of 17, 19, and 20 per thousand in the Line, Artillery and Guards, when in civilian life it is on 11 per thousand as it would be to take 1,100 men every year out upon Salisbury Plain and shoot them" [Kopf].

The death toll is shocking in human terms, but it also has a profound impact in terms of the army's efficiency, fighting ability, and recruitment needs. Men dying early means a loss of experience and a continued high need for recruitment. Florence illustrated the impact of early deaths with a pair of charts I've shown below.

The chart on the left showed the effect of disease at home on the army. The chart on the right showed what would happen if death rates came down to those of the general population. If people didn't care about lives, they might care about the strength of the army and do something about medical care.

The Royal Commission wasn't the end of it. A little later, Florence produced yet another report, "Notes on matters affecting the health, efficiency, and hospital administration of the British Army: founded chiefly on the experience of the late war". This report is notable because it contains the famous coxcomb plot. If you read anything about Florence and visualization online, this is what you'll find. I'm going to take some time to explain it because it's so fundamental in the history of data visualization.

(I should note that Florence never called these plots coxcomb plots, the use of the term came far later and not from her. However, the internet calls these charts coxcomb plots and I'm going to follow the herd for now.)

The visualization takes its name from the comb on a rooster's head.

(Image credit: Lander. Source. License Creative Commons.)

There are two coxcomb plots in the report, appearing on the same pull-out page. To make it easier to understand them, I'm going to show you the two plots separately.

The plot is divided into twelve segments, one for each month from April 1854 to March 1855. The area of each segment represents the number of deaths. The red wedges are deaths from wounds, the blue (gray in the image) represents deaths from preventable diseases, and the black wedges are deaths from other causes. You can plainly see the battle deaths. But what's really shocking is the number of deaths from preventable diseases. Soldiers are dying in battle, but many more of them are dying from preventable diseases. In other words, the soldiers didn't have to die.

Here's the other part of the diagram, from April 1855 to March 1856 (the end of the war) - not to scale with the previous plot.

Interestingly, Florence preferred the coxcomb plots to bar charts because she felt they were more mathematically accurate.

Although William Farr was an advisor to Florence and involved in building the coxcomb plots, he wasn't a fan of data visualization. He advised her that 'statistics should be as dry as possible' [Bostridge]. But Florence's aim was influencing the public, not a stone-cold presentation of data. In the introduction, I said there were lessons that modern analysts could learn from Florence, and this is the key one: you have to communicate your results clearly to a general audience to influence opinion and effect change.

The lessons from Florence's analysis are very clear: the men in the British Army were dying through poor treatment. They were dying at home, and dying after battle. The disaster in the Crimea was avoidable.

The Commission had far-reaching effects, specifically, the radical restructuring of the British Army's healthcare system, including the construction of a new army hospital. Florence had firm views on hospital design, which the new hospital didn't meet. Unfortunately, by the time she was involved in the project, it was too late to change some of the design basics, but she did manage to make it less bad. Radical reform doesn't happen overnight, and that was the case here. 

Florence's friend, Lord Herbert carried out a series of reforms over many years. Unfortunately, he died 1861. Two years later, Florence published a monograph in his honor, "Army Sanitary Administration, and Its Reform under the Late Lord Herbert", which included more charts and data [McDonald]. As before, Florence's goal was communication, but this time communicating the impact her friend and collaborator had on saving lives.

Florence was famous by the 1860s, famous enough to have an early photograph taken.


Florence and nursing

Quite rightly, Florence is considered one of the founding figures of modern nursing. She wrote a short book (75 pages), called "Notes on nursing: what it is and what it is not", which was by far her most widely read publication and stayed in print for a long time. In 1860, St Thomas's hospital in London opened a nursing school with Florence as an advisor, this was the "Nightingale Training School for Nurses", which was to set the standard for nursing education.

Florence and public health

The illness she picked up in the Crimea prevented her from traveling but didn't prevent her from absorbing data and influencing public health. In 1859, she took part in a Royal Commission, the "Royal Commission on the Sanitary State of the Army in India", which aimed to do for the British Army in India what the previous Royal Commission did for the Army in Britain. Sadly, the story was the same as the Crimea, poor health leading to premature death. Once again, Florence illustrated her work with visualizations and statistics. 

This report is notable for another type of visualization: woodcut drawings. Royal Commission reports are known to be dull, worthy affairs, but Florence wanted her work to be read and she knew she had to reach a wider audience (the same lesson about communicating effectively to create change). Her relative, Hilary Bonham Carter, drew the woodcuts she included in her report. The Treasury balked at the printing costs and wanted the report without the woodcuts, but Florence knew that some people would only read the report for the woodcuts, so she insisted they be included. Her decision was the right one, by communicating clearly, she was more effective in winning reforms.

(Image source: Wikimedia Commons)

Sadly, as a woman, Florence couldn't formally be part of the Commission, despite her huge input.

To use statistics to understand what's going on requires agreement and consistency in data collection. If different authorities record illnesses differently, then there can be no comparison and no change. Florence realized the need for consistent definitions of disease and proposed a classification scheme that was endorsed by the International Statistical Congress, held in London in 1860 [Magnello]. Sadly, only a few hospitals adopted her scheme and an opportunity to improve healthcare through data was lost.

Hospital design 

In 1859, Florence's writings on hospital design were consolidated into a book 'Notes on Hospitals' which led her to become the leading authority on hospital design.  Many British cities asked her to consult on their proposed hospital-building programs, as did the Government of India, the Queen of Holland, and the King of Portugal.

Decline and death

She never enjoyed good health after the Crimea, and never again traveled far from home. In her later years, she spent her time at home with her cats, occasionally doling out nursing or public health advice. In her last few years, her mental acuity fell away, and she retreated from public life. She died in 1910, aged 90.

(Florence shortly before her death in 1910. Lizzie Caswall Smith. Source: Wikimedia Commons.)

Florence as a Victorian

Florence was very much a product of her time and her class, she wasn't a feminist icon and she wasn't an advocate for the working classes - in many ways, she was the reverse [Stanley]. I've read some quotes from her which are quite shocking to modern ears [Bostridge]. However, I'm with the historians here, we have to understand people in their context and not expect them to behave in modern ways or judge them against modern standards.

Florence’s legacy

During her life, she received numerous honors, and the honors continued after her death.

The Royal Statistical Society was founded in 1834 as the Statistical Society of London, and Florence became its first female member in 1858 and was elected a Fellow in 1859. The American Statistical Association gave her honorary membership in 1874.

The Queen’s head appears on all British banknotes, but on the other side, there’s usually someone of historical note. On the £10 note, from 1975-1992, it was Florence Nightingale, the first woman to be featured on a banknote [BoE].

(UK £10 note)

For a very long time, many British hospitals have had a Nightingale ward. Things went a step further in response to the coronavirus pandemic; the British Army turned large conference centers into emergency hospitals for the infected, for example, the ExCel Center in London was turned into a hospital in nine days. Other large conference venues in the UK were also converted. The name of these hospitals? Nightingale Hospitals.

Her legend and what it says about society

Florence Nightingale is a revered figure in nursing, and rightly so, but her fame in the UK extends beyond the medical world to the general population. She’s known as the founder of nursing, and the story of the “lady with the lamp” still resonates. But less well-known is her analysis work on soldiers’ deaths during the war, her work on hospital design, and her role in improving public health. She probably saved more lives with her work after Crimea than she did during the Crimean War. Outside of the data analytics world, her ground-breaking visualizations are largely unknown. In my view, there’s definitely gender stereotyping going on; it’s fine for a woman to be a caring nurse, but not fine for her to be a pioneering public health analyst. Who society chooses as its heroes is very telling, but what society chooses to celebrate about them is even more telling.

The takeaways for analysts

I've read a lot on Florence's coxcomb charts, but less on her use of tables, and even less on her use of woodcut illustrations. The discussions mostly miss the point; Florence used these devices as a way of communicating a clear message to a wide audience, her message was all about the need for change. The diagrams weren't the goal, they were a means to an end - she spent a lot of time thinking about how to present data meaningfully; a lesson modern analysts should take to heart.

References

[BofE] https://www.bankofengland.co.uk/museum/noteworthy-women/historical-women-on-banknotes
[Bostridge] Mark Bostridge, “Florence Nightingale The Making Of An Icon”, Farrar, Straus, and Giroux, New York, 2008
[Britannica] https://www.britannica.com/event/Crimean-War
[Cohen] I Bernard Cohen, "Florence Nightingale", Scientific American, 250(3):128-137, March 1984 
[Kopf] Edwin Kopf, "Florence Nightingale as Statistician", Publications of the American Statistical Association, Vol. 15, No. 116 (Dec., 1916), pp. 388-404
[Globalhandwashing] https://globalhandwashing.org/about-handwashing/history-of-handwashing/
[Huxley] Elspeth Huxley, “Florence Nightingale”, G.P. Putnam’s Sons, New York, 1975
[Magnello] https://plus.maths.org/content/florence-nightingale-compassionate-statistician 
[McDonald] https://rss.onlinelibrary.wiley.com/doi/10.1111/1740-9713.01374
[Nightingale 1851] Florence Nightingale, “The institution of Kaiserswerth on the Rhine, for the practical training of deaconesses”, 1851
[Stanley] David Stanley, Amanda Sherratt, "Lamp light on leadership: clinical leadership and Florence Nightingale", Journal of Nursing Management, 18, 115–121, 2010