Saturday, May 23, 2020

Finding electoral fraud - the democracy data deficit

Why we need to investigate fraud

In July 2016, Fox News' Sean Hannity reported that Mitt Romney received no votes at all in 59 Philadelphia voting precincts in the 2012 Presidential Election. He claimed that this was evidence of vote-rigging - something that received a lot of commentary and on-air discussion at the time. On the face of it, this does sound like outright electoral fraud; in a fair election, how is it possible for a candidate to receive no votes at all? Since then, there have been other allegations of fraud and high-profile actual incidents of fraud. In this blog post, I’m going to talk about how a citizen-analyst might find electoral fraud. But I warn you, you might not like what I’m going to say.

(National Museum of American History, Public domain, via Wikimedia Commons)

Election organization - the smallest electoral units

In almost every country, the election process is organized in the same way; the electorate is split into geographical blocks small enough to be managed by a team on election day. The blocks might contain one or many polling stations and may have a few hundred to a few thousand voters. These blocks are called different things in different places, for example, districts, divisions, or precincts. Because precinct seems to be the most commonly used word, that's what I'm going to use here. The results from the precincts are aggregated to give results for the ward, county, city, state, or country. The precinct boundaries are set by different authorities in different places, but they're known. 

How to look for fraud

A good place to look for electoral shenanigans is at the precinct level, but what should we look for? There are several easy checks:

  • A large and unexplained increase or decrease in the number of voters compared to previous elections and compared to other nearby precincts. 
  • An unexpected change in voting behavior compared to previous elections/nearby precincts. For example, a precinct that ‘normally’ votes heavily for party Y suddenly voting for party X.
  • Changes in voting patterns for absentee voters, e.g. significantly more or fewer absentee votes, or absentee voting patterns that are very different from in-person votes.
  • Results that seem inconsistent with the party affiliation of registered voters in the precinct.
  • A result that seems unlikely given the demographics of the precinct.

Of course, none of these checks is a smoking gun, either individually or collectively, but they might point to divisions that should be investigated. Let’s start with the Philadelphia case and go from there.
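Here's a minimal sketch of how the first two checks might be automated. Everything in it is an assumption for illustration: the precinct names and numbers are made up, the field names are hypothetical, and the 3.5 cutoff on a median-based score is a common rule of thumb for outliers, not an official test. It compares each precinct's change in turnout and change in party X's vote share against the other precincts in the same toy dataset.

```python
from statistics import median


def robust_z(values):
    """Score each value by its distance from the median, scaled by the
    median absolute deviation (MAD). 0.6745 is the usual scaling constant."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread at all, so nothing can be called an outlier.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_precincts(rows, threshold=3.5):
    """Return (precinct, reasons) for precincts whose turnout change or
    vote-share swing is far from what the other precincts did."""
    turnout_change = [(r["votes"] - r["prev_votes"]) / r["prev_votes"] for r in rows]
    swing = [r["share_x"] - r["prev_share_x"] for r in rows]
    flagged = []
    for r, zt, zs in zip(rows, robust_z(turnout_change), robust_z(swing)):
        reasons = []
        if abs(zt) > threshold:
            reasons.append("turnout")
        if abs(zs) > threshold:
            reasons.append("swing")
        if reasons:
            flagged.append((r["precinct"], reasons))
    return flagged


# Made-up data: total votes and party X's share (in percent) for two elections.
rows = [
    {"precinct": "P1", "prev_votes": 600, "votes": 610, "prev_share_x": 30.0, "share_x": 31.0},
    {"precinct": "P2", "prev_votes": 580, "votes": 575, "prev_share_x": 28.0, "share_x": 29.0},
    {"precinct": "P3", "prev_votes": 620, "votes": 640, "prev_share_x": 33.0, "share_x": 32.0},
    {"precinct": "P4", "prev_votes": 590, "votes": 600, "prev_share_x": 31.0, "share_x": 33.0},
    {"precinct": "P5", "prev_votes": 610, "votes": 930, "prev_share_x": 29.0, "share_x": 30.0},
    {"precinct": "P6", "prev_votes": 600, "votes": 605, "prev_share_x": 30.0, "share_x": 62.0},
]

for precinct, reasons in flag_precincts(rows):
    print(precinct, reasons)
```

With this toy data, P5 is flagged for its jump in turnout and P6 for its swing towards party X. A flag is only a pointer to where to look; a new housing development or a popular local candidate could explain either one.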

Electoral fraud - imagined and real

It’s true that some divisions (precincts) in Philadelphia voted overwhelmingly for Obama in 2012. These divisions were small (averaging about 600 voters) and almost exclusively (95%+) African-American. Obama was hugely popular with the African-American community in Philadelphia, polling 93%+. The same divisions also have a history of voting overwhelmingly Democratic. Given these facts, it’s not at all surprising to see no or very few votes for Mitt Romney. Similar arguments hold for allegations of electoral fraud in Cleveland, Ohio, in 2012.

In fact, there were some unbalanced results the other way too; in some Utah precincts, Obama received no votes at all - again not surprising given the voter population and voter history. 

Although on the face of it these lopsided results seem to strongly indicate fraud, the allegations don't stand up to analytical scrutiny.

Let’s look at another alleged case of electoral fraud, this time in 2018 in North Carolina. The congressional election was fiercely contested and appeared to be narrowly decided in favor of Mark Harris. However, investigators found irregularities in absentee ballots, specifically, missing ballots from predominantly African-American areas. The allegations were serious enough that the election was held again, and criminal charges have been brought against a political operative in Mark Harris’ campaign. The allegation is ‘ballot harvesting’, where operatives persuade voters who might vote for their opposition to vote via an absentee ballot and subsequently make these ballots disappear.

My sources of information here are newspaper reports and analysis, but what if I wanted to do my own detective work and find areas where the results looked odd? How might I get the data? This is where things get hard.

Democracy’s data - official sources

To get the demographics of a precinct, I can try going to the US Census Bureau. The Census Bureau defines small geographic areas, called tracts, that they can supply data on. Tract data include income levels, population, racial makeup, etc. Sometimes, these tracts line up with voting districts (the Census term for precincts), but sometimes they don’t. If tracts don’t line up with voting districts, then automated analysis becomes much harder. In my experience, it takes a great investment of time to get any useful data from the Census Bureau; the data’s there, it’s just really hard finding out how to get it. In practice then, it’s extremely difficult for a citizen-analyst to link census data to electoral data.

What about voting results? Surely it’s easy to get electoral result data? As it turns out, this is surprisingly hard too. You might think the Federal Election Commission (FEC) will have detailed data, but it doesn’t. The data available from the FEC for the 2016 Presidential Election is less detailed than the 2016 Presidential Election Wikipedia page. The reason is, Presidential Elections are run by the states, so there are 51 (including Washington DC) separate authorities maintaining electoral results, which means 51 different ways of getting data, 51 different places to get it, and 51 different levels of detail available. The FEC sources its data from the states, so it's not surprising its reports are summary reports.  

If we need more detailed data, we need to go to the states themselves. 

Let's take Massachusetts as an example. Presidential Election data is available for 2016 down to the ward level (as a CSV). For Utah, data is only available at the county level (as an Excel file). Pennsylvania also offers only county-level data, and only from a web page. Getting detail below the county level may take freedom of information requests, if the information is available at all. 

In effect, this puts precinct-level nationwide voting analysis from official sources beyond almost all citizen-analysts.

Democracy’s data - unofficial sources

In practice, voting data is hard to come by from official sources, but it is available from unofficial sources who've put the work into getting the data from the states and make it available to everyone.

Dave Leip offers election data down to detailed levels; the 2016 results by county will cost you $92 and results by Congressional District will cost you $249. However, high-level results are available for free on his website. He's even been kind enough to list his sources and URLs if you want to spend the time to duplicate his work. Leip’s data is used by the media in their analysis, and probably by political campaigns too. He’s put in a great deal of work to gather the data and he’s asking for a return on his effort, which is fair enough. 

The MIT Election Data and Science Lab (MEDSL) collects election data, including data down to the precinct level, and the data is available for the most recent Presidential Election (2016 at the time of writing). As usual with this kind of data, there are all kinds of notes to read before using the data. MIT has also been kind enough to make available tools to analyze the data, along with their website scraping tools.

The MIT project isn't the only project providing data. Various other universities have collated electoral resources at various levels of detail.

Democracy’s data - electoral fraud

What about looking for cases of electoral fraud? There isn't a central repository of electoral fraud cases and there are multiple different court systems in the US (state and federal), each maintaining records in different ways. Fortunately, Google indexes a lot of cases, but often, court transcripts are only available for a fee, and of course, it's extremely time-consuming to trawl through cases.

The Heritage Foundation maintains a database of known electoral fraud cases. They don't claim their database is complete, but they have put a lot of effort into maintaining it and it's the most complete record I know of. 

In 2018, there were elections for the House of Representatives, the Senate, state elections, and of course county and city elections. Across the US, there must have been thousands of different elections in 2018. How many cases of electoral fraud do you think there were? What level of electoral fraud would undermine your faith in the system? In 2018, there were 65 cases. From the Heritage Foundation data, here’s a chart of fraud cases per year for the United States as a whole.

(Electoral fraud cases by year from the Heritage Foundation electoral fraud database)

It does look like there's been an increase in electoral fraud up to about 2010, but bear in mind the dataset covers the period of computerization and the rise of the internet. We might expect a rise in fraud cases simply because it's become easier to find case records. 

Based on this data, there really doesn’t seem to be large-scale electoral fraud in the United States. In fact, in reading the cases on their website, most of them are small-scale frauds concerning local elections (e.g. mayoral elections) - in a lot of cases, the frauds are frankly pathetic. 

Realistic assessment of election data

Official data is either hard to come by or not available at the precinct level, which leaves us using unofficial data. Fortunately, unofficial data is high quality and from reputable sources. The problem is, data from unofficial sources isn't available immediately after an election; there may be a long delay between the election and the data. If one of the goals of electoral data analysis is finding fraud, then timely data availability is paramount.

Of course, the kind of analysis I'm talking about here won't find small-scale fraud, where a person votes more than once or impersonates someone. But small-scale fraud will only affect the outcome of the very tightest of races. Democracy is most threatened by fraud that might affect the results, which in most cases is larger-scale fraud like the North Carolina case. Statistical analysis might detect these kinds of fraud.
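As one illustration of the sort of statistical analysis I mean, here's a rough sketch of a standard two-proportion z-test. It asks whether a candidate's share of absentee votes in a precinct differs from their share of in-person votes by more than chance would explain. The vote counts are invented for the example, and the function name is my own. It is not the method used by investigators in North Carolina or anywhere else.

```python
from math import erfc, sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Compare two vote shares: x1 of n1 votes versus x2 of n2 votes.
    Returns the z statistic and a two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the normal distribution
    return z, p_value


# Invented numbers: candidate A gets 120 of 150 absentee votes
# but only 300 of 600 in-person votes in the same precinct.
z, p_value = two_proportion_z(120, 150, 300, 600)
print(f"z = {z:.2f}, p = {p_value:.2g}")
```

Absentee voters can legitimately differ from in-person voters (they may be older, for example), so a small p-value is a reason to look more closely, not proof of anything.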

Sean Hannity's allegation of electoral fraud in Philadelphia didn't stand up to analysis, but it was worth investigating and is the kind of fraud we could detect using data - if only it were available in a timely way. 

How things could be - a manifesto

Imagine groups of researchers sitting by their computers on election night. As election results at the precinct level are posted online, they analyze the results for oddities. By the next morning, they may have spotted oddities in absentee ballots, or unexplained changes in voting behavior, or unexpected changes in voter turnout - any of which will feed into the news cycle. Greater visibility of anomalies will enable election officials to find and act on fraud more quickly.

To do this will require consistency of reporting at the state level and a commitment to post precinct results as soon as they're counted and accepted. This may sound unlikely, but there are federal standards the states must follow in many other areas, including deodorants, teddy bears, and apple grades, but also for highway construction, minimum drinking age, and the environment. Isn't the transparency of democracy at least as important as deodorants, teddy bears, and apples?


Saturday, May 16, 2020

The Emperor's new objects: a two-year failed project

It’s a sad thing when people you admire turn out not to be what you thought they were and when you find out your heroes have feet of clay. I had an experience like that a few years ago and it taught me a lot.

(Image credit: Old Book Illustrations)

I worked for a company that decided to go for object-oriented technology in a big way. They hired a team of people to design and build components that the lesser people (e.g. me and my colleagues) would snap together to build systems more quickly. Let’s call this team Team X and the team leader Mr. Y. Team X was to be a team of superstars leading the company forward.

I admired Team X and Mr. Y greatly. They had all the skills I wanted and every time I walked past them, they were diligently designing systems; I was impressed by their energy. I used to chat to them once a week and they used to like it because I was so clearly in awe of what they were doing. Mr. Y was at great pains to show me their work, though it was plain I didn’t understand it.

Some time passed and I had the opportunity to learn object-oriented programming. I did a design course on UML, and I learned C++ and became familiar with all the key tools. I put a huge amount of effort in and went from knowing nothing to knowing a lot in a short space of time.

It had been a while since I dropped in on Team X and Mr. Y, so I said hello. As before, Mr. Y showed me their designs and work. This time, I knew about object-oriented design. Unfortunately, I immediately spotted several mistakes in their design and several places where their approach would have been very inefficient. I was foolish enough to point them out. At first, Mr. Y conceded I was right, but as I pointed out more problems, he got irritated. Eventually, I realized I wasn’t welcome anymore and left. I didn’t stop by and chat again.

After this, Team X locked down their designs and prevented anyone from viewing them; they even angled their screens away from people passing by so no one could see their work. They stopped communicating. In the meantime, after chatting with my boss, I realized Team X didn’t know what they were doing. My boss counseled me to keep quiet and say nothing, which I did after some moaning. He told me my chances of getting real change were zero and that I would just damage my reputation by speaking out.

About six months later, everyone on Team X was laid off. The company gossip was that they’d been employed for two years and produced absolutely nothing of value, not a single piece of executable code. The company got nothing for several people’s efforts over the entire time.

Ironically, the only group that actually produced an object-oriented system was the group I was in. We used C++ to create useful systems that actually did something, and we produced the systems in a few months with just a handful of people, none of whom claimed to be a superstar. We didn't do the huge analysis Team X did, we just built systems to meet our customers' needs.

What were my takeaways from all of this?

  • Results are what count.
  • If you have a team working for you using a technology, make sure you understand it well enough to measure their progress.
  • Deliverables are important and you should manage projects to have some results at checkpoints throughout the project. This is the soundest way of measuring progress. Never, ever leave long gaps between deliverables.
  • A team that hides its work is a major red flag.
  • Choose your battles and only choose ones you can win or that are important. Companies do silly things all the time, but eventually, they get corrected, whether you intervene or not.

Wednesday, May 6, 2020

Florence Nightingale, data analyst

Introduction - why do I care about Florence Nightingale, data analyst?

I've used statistics and data visualizations for a long time now, and in the last few years, I've become increasingly interested in where the methods I use come from. Who were the founding figures of statistics and visualization? Why was their work important? How did their work influence the world? As I've looked back in time, I've found the data science creation stories more interesting than I thought. There were real people who struggled to achieve their goals and used early data science methods to do so. One of these pioneers was Florence Nightingale, more famous for founding modern nursing, but a key figure in analytics and data visualization. What she did and why she did it have clear lessons for analysts today.

(Simon Harriyott from Uckfield, England, CC BY 2.0, via Wikimedia Commons)

Early life

Florence was born on May 12th, 1820, near Florence in Italy. Her parents were wealthy and very well-connected, two factors that were to have a big impact on her later life. As the second daughter, she was expected to have the learning of a woman of her station and to marry well; her family, especially her mother, had a very definite expectation of the role she was to fulfill. Her upbringing was almost like that of a character from a Jane Austen novel, which was to cause Florence mental health problems.

Initially, the family lived in a fifteen-bedroom house in Derbyshire, but this was too small for them (!) and they wanted to be nearer to London, so they moved to Embley in the New Forest. They also had an apartment in London and spent a lot of time in the city. Given the family connections and their time spent in London, it’s not surprising that Florence met many influential men and women growing up, including future prime ministers and a young Queen Victoria. This was to be crucially important to her later.

Up until she was 12, Florence was educated by a governess, then her father took over her education. Unusually for the time, her father believed in equality of education for women and put considerable effort into educating his daughters [Bostridge]. Notably, she received no formal schooling and never took anything like university lectures or courses. However, she had a precocious intellect and an appetite for statistics and data. When she was 17, the family took a six-month vacation to Italy, and along the way, Florence recorded their departure and arrival times and the distances they traveled, and kept notes on local conditions and laws [Bostridge, Huxley].

Throughout her life, she was deeply religious, and in her teenage years, she felt a call from God to do something useful; she wanted ‘some regular occupation, for something worth doing instead of frittering time away on useless trifles’ [Huxley]. On the 7th of February 1837, Florence recorded “...God spoke to me and called me to His service”, but what the form of that call was, Florence didn’t note [Bostridge]. This theme of a calling from God was to come up several times in her life.

Bear in mind, Florence’s life was a round of socializing to prepare her for an appropriate marriage, nothing more. For an intellectually gifted woman wanting to make a difference in the world, the tension between the life she wanted and the life she had was immense. It’s not a surprise to hear that she was often withdrawn and on the verge of a nervous breakdown; in modern times, she may well have been diagnosed with depression. By the age of 30, Florence wasn’t married, something that wasn’t respectable - however, she was to shock her family with a very disreputable request.

Introduction to nursing

Florence decided that nursing was her calling. Unfortunately, her parents violently objected, and with good reason.

At the time, nursing was considered a disreputable profession. Hospitals were filthy and nurses were both ill-trained and poorly educated. In many cases, their role was little more than cleaning up the hospital messes, and in the worst cases, they were promiscuous with doctors and surgeons [Huxley]. It was also known that nurses were present at operations, which in the 1850s were bloody, gruesome affairs. Even Charles Dickens had a poor view of nurses. In Martin Chuzzlewit, published in 1843, Dickens created a character, Sarah Gamp, who was sloppy, a drunk, and a nurse. Dickens was playing to a well-known stereotype and adding to it.

Nursing as a profession was about as far away from a suitable occupation for Florence as you can imagine. Her family knew all about nursing’s reputation and vigorously objected to Florence having anything to do with it. Her mother in particular opposed Florence learning or practicing nursing for a very long time, going as far as actively blocking Florence’s training. However, Florence could read about nursing and health, which she did copiously.

There was one bright nursing light; the Institution of Deaconesses at Kaiserswerth (Germany) was a quasi-religious institute that sought to improve nursing standards. Florence wanted to study there, but her parents stopped her. She managed to go for two weeks in 1850, but only with some shenanigans. Perhaps because of the deception, when she came back, she anonymously published a 32-page pamphlet on her experience which is her first known published work [Nightingale 1851]. After some blazing stand-up rows with her mother, she finally went for three months of training in 1851. Bear in mind, her family still controlled her life, even at this late age.

The discipline at Kaiserswerth was harsh and the living conditions were spartan. Days consisted of prayer and patient support; in effect, it was living a religious life while learning nursing, fulfilling two of Florence’s needs. She learned the state of nursing as it stood at the time, even witnessing amputations and other operations, which would have horrified her parents had they known. However, Florence appreciated the limitations of the Kaiserswerth system.

On her return to Britain, her appetite for nursing wasn’t diminished. In fact, she read widely about nursing, disease in general, and statistics - broadening her knowledge base. What was missing was an opportunity to practice what she’d learned, which finally arrived in April 1853. 

Through her extensive family connections, she was made superintendent of a new ‘Institution for the Care of Sick Gentlewomen’ based in Harley Street in London. This was a combination of hospital and recuperation unit for sick women, with the goal of providing a better standard of care than was currently offered. With Florence, the founders thought they were getting a hands-off lady of leisure; instead, they got a human dynamo who was waiting to put into practice years of learning and preparation. Not only did Florence do nursing, she also fought on committees to get the funding she needed, became a tough people manager, and put the institution’s finances in order. Under Florence’s guidance, the institution became groundbreaking in simple but effective ways; it treated its patients well, it was clean, and its nurses were professional.

Had she continued in Harley Street, she probably would have still been a founding figure of modern nursing, but events elsewhere were conspiring to thrust her into the limelight and make her a national hero.

The Crimean War

Britain has fought almost every country in Europe many times, sometimes with the French and sometimes against them. By the mid-1850s, Britain and France were becoming worried about the influence of Russia in the Middle East, which resulted in the Crimean War, where Britain and France fought Russia [Britannica]. This was a disastrous war for pretty much everyone.

(Siege of Sevastopol (1854–55), Franz Roubaud)

British troops were shipped to Turkey to fight the Russians. Unfortunately, cholera, diarrhea, and dysentery ripped through the men, resulting in large numbers of casualties before the war had even started; the men were too sick to fight. Of the 30,000 British troops dispatched to Turkey, 1,000 died of disease before a single shot was fired [Bostridge].

Hospitals were squalid and poorly equipped. The main British hospital at Scutari was a national shame; men were trying to recover from their injuries in filthy conditions with poor food and limited supplies. The situation was made worse by bureaucratic blundering and blind rule-following; there were instances of supplies left to rot because committees hadn’t approved their release. By contrast, the French were well-equipped and were running effective field hospitals.

In an early example of embedded journalism, William Howard Russell provided dispatches for The Times exposing the poor treatment of the troops, incompetent management, and even worse, the superiority of the French. His reports riled up the British people, who in turn pressured politicians to do something; it became politically imperative to take action [Huxley].

Florence in Crimea

War and medicine were male preserves, but politicians needed votes, meaning change came quickly. Russell’s dispatches made it clear that troops were dying in hospital, not on the battlefield, so medical support was needed. This is where Florence’s family connections came in. Sidney Herbert, Secretary at War, wrote to Florence asking her to run nursing operations in the Crimea. The War Office needed to give Florence a title, so they called her ‘Superintendent of the Female Nursing Establishment of the English General Military Hospitals in Turkey’. Nothing like this had ever been done before - women had never been sent to support war - which would cause problems later.

Florence was asked to recruit 50 nurses, but there were no female nurses at all in the British Army, and nursing was in its infancy. She found 14 women with hospital experience and several nuns from various religious orders - 38 women in total. On October 21st, 1854, this rag-tag army set out from England to go to the war in the Crimea.

The conditions they found in the barrack hospital at Scutari were shocking. The place was filthy and vermin-infested, rats were running around in plain view, and even the kitchens weren’t clean. Bedding and clothing weren’t washed, which meant soldiers preferred to keep their existing filthy bedding and clothing rather than changing them for someone else's equally unclean items - better to have your own lice bite you than someone else’s. Basics like furniture were in short supply; there weren’t even enough tables for operations. Soldiers were left untreated for long periods of time, and there were many cases when maggots weren’t cleaned out of wounds. Unsurprisingly, cholera and dysentery were rampant. The death rate was high. As a further twist, the military wasn’t even using the whole building; the cellars had refugees living in them, and there was a prostitution ring operating there [Huxley].


(The military hospital at Scutari. Image source: The Wellcome Collection. License: Creative Commons.)

Florence wanted to make a difference, but military rules and misogyny prevented her nurses from taking up their duties. Her title was ‘Superintendent of the Female Nursing Establishment of the English General Military Hospitals in Turkey’, but military orders didn’t say what she was to do. This was enough of an excuse for the (male) doctors and surgeons to block her nurses. Despite being blocked, the nurses did what they could to improve things, by ensuring clean bedding and better quality food for example.

Things changed, but for the worst reason. The Battle of Balaclava brought a tidal wave of wounded into the hospital, too many for the existing system to cope with, so the military gave in and let the women in. Florence’s nurses finally got to nurse.

Given her opportunity, Florence moved quickly to establish hygiene, cleanliness, and good nutrition. The rats were dispatched, the tenants in the basement were removed, and food quality was improved. Very unusually for the time, Florence insisted on hand washing, which of itself reduced the death rate [Globalhandwashing]. Back in London, The Times had established a fund to care for wounded soldiers, so Florence had a pot of money to spend as she chose, free of military rules. She set up contracts with local suppliers to improve the food supply, she set up washrooms to clean bedding and clothes, and she provided soldiers with new, clean clothing.

Her nurses tended to the men during the daytime, treating their wounds and ensuring they were clean and cared for. Florence’s administrative work tied her up in the daytime, but she was able to walk the wards at night to check on the men. She nursed them too and stayed with them as they died. Over the winter of 1854/1855, it’s estimated she saw something like 2,000 men die.

To light her way on her nocturnal rounds, she used a Turkish lamp. This is where the legend of the ‘lady with the lamp’ came from. Under desperate conditions, men would see a beacon of hope in the darkness. This is such a strong legend in UK culture that even 170 years later, it still resonates.

Drawing of Florence doing her rounds
(Illustrated London News, 24 Feb 1855, Source: Wikimedia Commons)

The difference Florence’s nurses made was eagerly reported back to the British public, who were desperate for a good news story. The story was perfect: a heroine making a difference under terrible conditions while being blocked by the intransigence of military bureaucracy. The ‘lady with the lamp’ image sold well, and the donations came rolling in.

(A fanciful depiction of Florence doing her rounds. Creative Commons license.)

In May 1855, Florence got closer to the Crimean War when she toured Balaclava in the Crimea itself. Unfortunately, on 13th May 1855, she collapsed through exhaustion and became gravely ill, suffering fevers and delirium. The word was, she was close to death. On hearing of her condition, it’s said the patients in the Scutari hospital turned towards the wall and wept. Florence recovered, but she continued to suffer debilitating illness for the rest of her long life.

The war finally ended on 30th March 1856, and Florence returned to England in July of the same year. She left an unknown but came back a celebrity.

Florence as a data analyst and statistician

The Crimean War was a disaster for the British military and the public was angry; the political fall-out continued after the war was over and the poor medical treatment the troops received was a hot topic. After some delay, a “Royal Commission on the Health of the Army” was formed to investigate the health of the British Army, and Florence was its powerhouse. Sadly, as a woman, she couldn't formally be appointed to the Commission, so her role was less formal. Despite the informality, she was determined to prove her points with data and to communicate clearly with the public.

In the 1850s, statistics was in its infancy, but there were some early pioneers, including William Farr at the General Register Office, who was an early epidemiologist and one of the founders of medical statistics. Of course, Florence was a friend of Farr’s. Farr had introduced the idea of comparing the mortality rates of different occupations, which Florence was to run with [Cohen]. He also had a dismal view of data visualization, which Florence disagreed with.

Florence’s stand-out piece of work is her report “Mortality of the British Army: at home and abroad, and during the Russian war, as compared with the mortality of the civil population in England.” which was appended to the Commission's main report. She knew she needed to reach the general public, who wouldn’t read a huge and dull tome. She had to make an impact quickly and clearly, and she did so through the use of tables and data visualization. Bear in mind, the use of charts was in its infancy.

Here's one of the tables from her report; it's startlingly modern in its presentation. The key column is the one on the right, the excess of deaths in the army compared to the general population. The excess deaths weren't due to warfare.

Incredibly, the excess of deaths was due to disease as we can see in the table below. The death rate for the general population for 'chest and tubercular disease' was 4.5 per 1,000, but for the army, it was 10.1. Tubercular disease isn't a disease of war, it's a disease of poor living conditions and poor sanitation.
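To make the comparison concrete, here's a small sketch of the arithmetic using the two rates quoted above. The function name is my own; the calculation is just a subtraction and a division, not a reconstruction of how Florence or Farr did their sums.

```python
def compare_rates(group_rate, baseline_rate):
    """Return the excess deaths per 1,000 and the ratio of the two rates."""
    excess = group_rate - baseline_rate
    ratio = group_rate / baseline_rate
    return excess, ratio


# Deaths per 1,000 from chest and tubercular disease, as quoted in the text.
excess, ratio = compare_rates(group_rate=10.1, baseline_rate=4.5)
print(f"Excess deaths per 1,000: {excess:.1f}")    # 5.6
print(f"Army rate / civilian rate: {ratio:.2f}")   # 2.24
```

In other words, soldiers were dying of these diseases at more than twice the civilian rate.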

The report is full of these kinds of tables, presented in a clear and compelling way that helped tell the terrible story: the British Army was killing its own soldiers through neglect.

Of course, tables are dry; charts make a more immediate impression and Florence used bar charts to great effect. Here's a bar chart of death by age group for the British Army (red) and the general population (black). Bear in mind, the period leading up to the Crimean War was peaceful - there were no major engagements, so the excess deaths aren't battle casualties. In fact, as Florence showed in the tables and in the charts, these excess deaths were avoidable.

In private, Florence was more forceful about the effect of poor medical treatment on the strength of the army. Salisbury Plain was (and is) a big British Army practice area, and she said: "it is as criminal to have a mortality of 17, 19, and 20 per thousand in the Line, Artillery and Guards, when in civilian life it is only 11 per thousand as it would be to take 1,100 men every year out upon Salisbury Plain and shoot them" [Kopf].

The death toll is shocking in human terms, but it also has a profound impact in terms of the army's efficiency, fighting ability, and recruitment needs. Men dying early means a loss of experience and a continued high need for recruitment. Florence illustrated the impact of early deaths with a pair of charts I've shown below.

The chart on the left showed the effect of disease at home on the army. The chart on the right showed what would happen if death rates came down to those of the general population. If people didn't care about lives, they might care about the strength of the army and do something about medical care.

The Royal Commission wasn't the end of it. A little later, Florence produced yet another report, "Notes on matters affecting the health, efficiency, and hospital administration of the British Army: founded chiefly on the experience of the late war". This report is notable because it contains the famous coxcomb plot. If you read anything about Florence and visualization online, this is what you'll find. I'm going to take some time to explain it because it's so fundamental in the history of data visualization.

(I should note that Florence never called these plots coxcomb plots, the use of the term came far later and not from her. However, the internet calls these charts coxcomb plots and I'm going to follow the herd for now.)

The visualization takes its name from the comb on a rooster's head.

(Image credit: Lander. Source. License Creative Commons.)

There are two coxcomb plots in the report, appearing on the same pull-out page. To make it easier to understand them, I'm going to show you the two plots separately.

The plot is divided into twelve segments, one for each month from April 1854 to March 1855. The area of each segment represents the number of deaths. The red wedges are deaths from wounds, the blue (gray in the image) represents deaths from preventable diseases, and the black wedges are deaths from other causes. You can plainly see the battle deaths. But what's really shocking is the number of deaths from preventable diseases. Soldiers are dying in battle, but many more of them are dying from preventable diseases. In other words, the soldiers didn't have to die.
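The "area represents the number of deaths" point is easy to get wrong if you try to draw one of these yourself, so here's a minimal Python sketch of the geometry. With twelve wedges of equal angle, a wedge's area grows with the square of its radius, so the radius has to scale with the square root of the count. The function names and the counts below are my own made-up illustrations, not Florence's data.

```python
import math

def wedge_radius(count, n_wedges=12):
    """Radius that gives a wedge an area equal to `count`,
    when the circle is split into n_wedges equal angles."""
    angle = 2 * math.pi / n_wedges
    # Area of a circular sector = 0.5 * r**2 * angle; solve for r
    return math.sqrt(2 * count / angle)

def wedge_area(radius, n_wedges=12):
    """Area of a wedge with the given radius (the inverse check)."""
    angle = 2 * math.pi / n_wedges
    return 0.5 * radius ** 2 * angle

# Hypothetical monthly counts, purely for illustration
illustrative_counts = [100, 400, 900]
radii = [wedge_radius(c) for c in illustrative_counts]

for count, radius in zip(illustrative_counts, radii):
    print(f"count={count:4d}  radius={radius:6.2f}  area={wedge_area(radius):7.1f}")
```

Notice that four times the deaths gives only twice the radius. If you scaled the radius directly with the count instead, the big months would look far bigger than they really were.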

Here's the other part of the diagram, from April 1855 to March 1856 (the end of the war) - not to scale with the previous plot.

Interestingly, Florence preferred the coxcomb plots to bar charts because she felt they were more mathematically accurate.

Although William Farr was an advisor to Florence and involved in building the coxcomb plots, he wasn't a fan of data visualization. He advised her that 'statistics should be as dry as possible' [Bostridge]. But Florence's aim was influencing the public, not a stone-cold presentation of data. In the introduction, I said there were lessons that modern analysts could learn from Florence, and this is the key one: you have to communicate your results clearly to a general audience to influence opinion and effect change.

The lessons from Florence's analysis are very clear: the men in the British Army were dying through poor treatment. They were dying at home, and dying after battle. The disaster in the Crimea was avoidable.

The Commission had far-reaching effects, specifically, the radical restructuring of the British Army's healthcare system, including the construction of a new army hospital. Florence had firm views on hospital design, which the new hospital didn't meet. Unfortunately, by the time she was involved in the project, it was too late to change some of the design basics, but she did manage to make it less bad. Radical reform doesn't happen overnight, and that was the case here. 

Florence's friend, Lord Herbert, carried out a series of reforms over many years. Unfortunately, he died in 1861. Two years later, Florence published a monograph in his honor, "Army Sanitary Administration, and Its Reform under the Late Lord Herbert", which included more charts and data [McDonald]. As before, Florence's goal was communication, but this time communicating the impact her friend and collaborator had on saving lives.

Florence was famous by the 1860s, famous enough to have an early photograph taken.


Florence and nursing

Quite rightly, Florence is considered one of the founding figures of modern nursing. She wrote a short book (75 pages), called "Notes on nursing: what it is and what it is not", which was by far her most widely read publication and stayed in print for a long time. In 1860, St Thomas's hospital in London opened a nursing school with Florence as an advisor; this was the "Nightingale Training School for Nurses", which was to set the standard for nursing education.

Florence and public health

The illness she picked up in the Crimea prevented her from traveling but didn't prevent her from absorbing data and influencing public health. In 1859, she took part in a Royal Commission, the "Royal Commission on the Sanitary State of the Army in India", which aimed to do for the British Army in India what the previous Royal Commission did for the Army in Britain. Sadly, the story was the same as the Crimea, poor health leading to premature death. Once again, Florence illustrated her work with visualizations and statistics. 

This report is notable for another type of visualization: woodcut drawings. Royal Commission reports are known to be dull, worthy affairs, but Florence wanted her work to be read and she knew she had to reach a wider audience (the same lesson about communicating effectively to create change). Her relative, Hilary Bonham Carter, drew the woodcuts she included in her report. The Treasury balked at the printing costs and wanted the report without the woodcuts, but Florence knew that some people would only read the report for the woodcuts, so she insisted they be included. Her decision was the right one; by communicating clearly, she was more effective in winning reforms.

(Image source: Wikimedia Commons)

Sadly, as a woman, Florence couldn't formally be part of the Commission, despite her huge input.

To use statistics to understand what's going on requires agreement and consistency in data collection. If different authorities record illnesses differently, then there can be no comparison and no change. Florence realized the need for consistent definitions of disease and proposed a classification scheme that was endorsed by the International Statistical Congress, held in London in 1860 [Magnello]. Sadly, only a few hospitals adopted her scheme and an opportunity to improve healthcare through data was lost.

Hospital design 

In 1859, Florence's writings on hospital design were consolidated into a book 'Notes on Hospitals' which led her to become the leading authority on hospital design.  Many British cities asked her to consult on their proposed hospital-building programs, as did the Government of India, the Queen of Holland, and the King of Portugal.

Decline and death

She never enjoyed good health after the Crimea, and never again traveled far from home. In her later years, she spent her time at home with her cats, occasionally doling out nursing or public health advice. In her last few years, her mental acuity fell away, and she retreated from public life. She died in 1910, aged 90.

(Florence shortly before her death in 1910. Lizzie Caswall Smith. Source: Wikimedia Commons.)

Florence as a Victorian

Florence was very much a product of her time and her class, she wasn't a feminist icon and she wasn't an advocate for the working classes - in many ways, she was the reverse [Stanley]. I've read some quotes from her which are quite shocking to modern ears [Bostridge]. However, I'm with the historians here, we have to understand people in their context and not expect them to behave in modern ways or judge them against modern standards.

Florence’s legacy

During her life, she received numerous honors, and the honors continued after her death.

The Royal Statistical Society was founded in 1834 as the Statistical Society of London, and Florence became its first female member in 1858 and was elected a Fellow in 1859. The American Statistical Association gave her honorary membership in 1874.

The Queen’s head appears on all British banknotes, but on the other side, there’s usually someone of historical note. On the £10 note, from 1975-1992, it was Florence Nightingale, the first woman to be featured on a banknote [BofE].

(UK £10 note)

For a very long time, many British hospitals have had a Nightingale ward. Things went a step further in response to the coronavirus pandemic; the British Army turned large conference centers into emergency hospitals for the infected, for example, the ExCel Center in London was turned into a hospital in nine days. Other large conference venues in the UK were also converted. The name of these hospitals? Nightingale Hospitals.

Her legend and what it says about society

Florence Nightingale is a revered figure in nursing, and rightly so, but her fame in the UK extends beyond the medical world to the general population. She’s known as the founder of nursing, and the story of the “lady with the lamp” still resonates. But less well-known is her analysis work on soldiers’ deaths during the war, her work on hospital design, and her role in improving public health. She probably saved more lives with her work after Crimea than she did during the Crimean War. Outside of the data analytics world, her ground-breaking visualizations are largely unknown. In my view, there’s definitely gender stereotyping going on; it’s fine for a woman to be a caring nurse, but not fine for her to be a pioneering public health analyst. Who society chooses as its heroes is very telling, but what society chooses to celebrate about them is even more telling.

The takeaways for analysts

I've read a lot on Florence's coxcomb charts, but less on her use of tables, and even less on her use of woodcut illustrations. The discussions mostly miss the point; Florence used these devices as a way of communicating a clear message to a wide audience, her message was all about the need for change. The diagrams weren't the goal, they were a means to an end - she spent a lot of time thinking about how to present data meaningfully; a lesson modern analysts should take to heart.

References

[BofE] https://www.bankofengland.co.uk/museum/noteworthy-women/historical-women-on-banknotes
[Bostridge] Mark Bostridge, “Florence Nightingale The Making Of An Icon”, Farrar, Straus, and Giroux, New York, 2008
[Britannica] https://www.britannica.com/event/Crimean-War
[Cohen] I Bernard Cohen, "Florence Nightingale", Scientific American, 250(3):128-137, March 1984 
[Kopf] Edwin Kopf, "Florence Nightingale as Statistician", Publications of the American Statistical Association, Vol. 15, No. 116 (Dec., 1916), pp. 388-404
[Globalhandwashing] https://globalhandwashing.org/about-handwashing/history-of-handwashing/
[Huxley] Elspeth Huxley, “Florence Nightingale”, G.P. Putnam’s Sons, New York, 1975
[Magnello] https://plus.maths.org/content/florence-nightingale-compassionate-statistician 
[McDonald] https://rss.onlinelibrary.wiley.com/doi/10.1111/1740-9713.01374
[Nightingale 1851] Florence Nightingale, “The institution of Kaiserswerth on the Rhine, for the practical training of deaconesses”, 1851
[Stanley] David Stanley, Amanda Sherratt, "Lamp light on leadership: clinical leadership and Florence Nightingale", Journal of Nursing Management, 18, 115–121, 2010

Saturday, May 2, 2020

It's a mugs' game: corporate failures, ceramics, and t-shirts

Mike's law of business

One day, I'm going to write some laws of business. One of them's going to be: "any corporate initiative where mugs or other swag is given to employees is doomed to failure." Let me tell you some sorry tales that led me to my law.

A failed campaign

I worked for a company where employee engagement was becoming an issue. At the time, the management fad was harnessing employee ideas to move companies forward. The company held an all-hands meeting where we were told about a new corporate initiative; we (the staff) were all going to have ideas about how the company could make more money. To show how serious the executive team was, they gave us all mugs. The mugs said "I'm Making A Difference" (MAD), had a pterodactyl on them, and the company name at the bottom; though what the MAD logo and the pterodactyl had to do with anything was never explained.  In the few sessions we ever had to discuss ideas, the staff focused on better management, which the executive team didn't like very much.  The mugs were the only part of the scheme that lasted.

(Enron mug available from Amazon. I never worked for Enron!)

Let's kill the competition!

At another company, there was severe competitive pressure, so the company created a development team whose mission was to create a competition killer. Close to the killer's release date, the people on the team were given mugs (which were actually really cool) and posters. Unfortunately, the killer failed miserably in the market. Much later, I found out that there were serious problems with the killer project prior to release, which led to poor team morale. The mugs and posters were a failed attempt to turn things around.

(Topologically speaking, mugs are the same as donuts. Image credit: Lucas Vieira License:Public Domain. Image source.)

Let's boost morale

As I've written elsewhere, I was at a company where a quality project imploded massively. When it became obvious the initiative was failing, the leadership floated ideas to revive the project, which, as you might have guessed, included giving staff mugs with the quality standard's name on them.

By this stage in my career, I was starting to see the link; only failed projects and initiatives hand out mugs.

(A mug from 3,700 years ago - maybe the ancient Greeks had failed projects too. Image Credit: British Museum. License: Creative Commons.)

Fleeces are the new mugs

But, as it turned out, mugs=failure wasn't entirely true. I've seen an exciting trend over the last few years. Instead of giving out mugs, companies have started to give out t-shirts or occasionally, fleeces or backpacks. In some cases, I've seen t-shirts with project logos, but more often than not, they're just corporate t-shirts. When it comes time for a quick morale boost, nowadays it's generic corporate swag all the way.  Maybe I should change my law to "giving out corporate swag might indicate impending failure".

(Not a corporate fleece. Photographer: Mike Nass License: Creative Commons Source: Wikimedia)

I'm guessing that when the COVID-19 pandemic abates, there will be a large increase in companies giving their staff mugs, or t-shirts, or fleeces, or backpacks. This is swag as heroin for morale.

The wrong cure

I know I'm being unfair to mugs and swag in general here, they're a symptom of diseases, but not the cause. The diseases are mismanagement, poor communication, and failed projects. Mugs and corporate swag are often an attempt at boosting morale when things are going too badly wrong to ignore. Unfortunately, swag isn't enough, and sometimes, it's the wrong thing to do - which is why I've come to associate mugs with failure.

Instead of marking a new beginning, mugs often mark the end; they become tombstones, not birthstones.

Saturday, April 25, 2020

The worst technical debt ever

Over the last few years, I've heard engineering teams rightly talk about technical debt and its consequences. Even non-technical executives are starting to understand its importance and the need to invest to avoid it. The other day as I was setting up a computer, I was reminded of the worst case I've ever seen of technical debt. I thought the story was worth telling here, but with a few details obscured to protect the guilty.

A few years ago, I visited one of company X's data centers. The data center was located in an older building in a slightly run-down part of town. The data center was a little hard to find because it wasn't marked in any way - there was nothing at all that made the building stand out. Outside the building, there was some trash on the sidewalk, including remnants of last night's take-outs that people had dropped on the street as they partied.

Once inside, things were different. Security at the entrance was shabby, but efficient and effective and we got through quickly. The interior was clean, but it was obvious the building hadn't been decorated in several years. Even the coffee machines had seen better days, but they worked.

We were given a detailed tour of the data center and built a good relationship with our guide. The data center had been one of the company's first and had been on the same site for several years. As you might expect, there were racks and racks of computers with technicians walking around fixing things and installing cables to connect new computers to the network. The air conditioning was loud and strong, which meant you had to be close to one another to talk - which also meant it was impossible to overhear conversations.

Late in the tour, I tripped on a loose floor tile that was a centimeter or two raised above the floor. Our guide apologized and told us we needed to be careful as we walked along. We asked why. This is where we discovered the technical debt.

Connecting computers in a data center means installing a physical cable from one computer (or router etc.) to another. You can either route the cable under the floor or on overhead trackways. Most data centers use some form of color-coded cables so you have some indication of what kind of data a cable's carrying (red cables mean one sort of data, blue another, yellow another, and so on). Some even go further and give unique labels or identifiers to cables, so you can identify a cable's pathway from end to end. Routing cables is something of an art form, and in fact, there's a sub-Reddit devoted to it: https://www.reddit.com/r/cableporn/ - from time to time I look at the pictures when I need an ordered view of the world. As you might expect, there's a sub-Reddit that focuses on the reverse: https://www.reddit.com/r/cablegore/.

Our guide told us that right from the start, the management at the data center wanted to save money and do things quickly. From time to time, routers and servers were moved or removed. Instead of removing the old cable, they just left it under the false floor and added the new cable on top of it. New cable was laid on top of old cable in any order or in any fashion, so long as the job was done cheaply and quickly, it was fine. Over time, the layers of cabling built up and up, like the strata in the rock you see at the Grand Canyon. You could even see when the company changed its cable supplier because the cable shade changed a little. Unfortunately, they always chose the same color cable (which happened to be the cheapest).

After a few years, management realized that leaving the old cable in place was a bad idea, so they instructed staff to try and remove the old cables. Unfortunately, there was so much cabling present, and it had been laid so haphazardly, it was physically impossible because the cables were so intertwined. In a few cases, they'd tried to pull up old cables by physical force, but this caused the insulation to be stripped off cables and connections failed. Obviously, leaving old cable connections just hanging around is a bad idea, so the management team told the technicians to cut off the ends of old cables as far along as they could. This meant that the old dead cable was left in place under the floor, but it all looked fine on the surface. Because the cabling ran under the floor, a superficial inspection would show that everything was working fine, especially because they'd cut the old cables as far back as they could.

Sweeping things under the rug went on for a while longer, but there was only so much false floor. By the time of my tour, there was no more space, in fact, the situation was so bad, the floor tiles wouldn't sit properly in their supports anymore. That's why we were tripping over tiles. When no one was looking, our tour guide removed one of the floor tiles to show us the cabling underneath. I was horrified by what I saw.

(Not the actual cables - but gives you a flavor of what I saw. Image source: https://commons.wikimedia.org/wiki/File:Pougny,_electric_cables_(4).jpg. License: Creative Commons. Photographer: Jean-Pierre)

Cables were packed together with no room at all between them. They had obviously been laid across each other with no organization. It was as if a demented person had been knitting with cables leaving no gaps. There was no give in the cables and it was plain it was more or less a solid mass down to the real floor.  By my estimate, the cabling went to a depth of 30cm or more. I could clearly see why it was impossible to pull out old cables: cables had no markings, so you couldn't tell them apart; they were so intertwined you couldn't unpick them, and there were so many cables, they were too heavy to lift. In fact, there was no room under the floor to do any kind of maintenance.

There were some gaps in the cables though. Our guide told us that the data center was starting to have a vermin problem. Of course, there was a ready supply of food outside, and rats and mice had found sufficiently large gaps in the cabling to set up home.

I asked what happened when they needed to connect up computers now there wasn't any room under the floor to lay anything. Our guide showed us some men working round the corner. They had stepladders and were installing overhead cable ducting. This time, the cables were properly color-coded and properly installed. It was a thing of beauty to see the ordered way they were working and how they'd laid out the cables. The cables were also individually labeled, making the removal of old cables much easier.

The next obvious question was, what about the old cable under the floor? The plan seemed to be to sweep everything under the rug. Create new overhead connections until all of the old connections were unnecessary and then leave the old cables and forget about it.

To his credit, our guide seemed ashamed of the whole thing. He seemed like a decent man who had been forced into doing bad things by poor management decisions. Notably, we never saw senior management on our tour.

A while later, I heard the data center was temporarily closed for improvements. These improvements went on for many months and I never heard exactly what they were. I suspect the executive team was embarrassed by the whole thing once they found out the extent of the problem and ordered a proper cleanup. At the time of my tour, I wondered about the fire risk, and obviously having a vermin problem is never a good thing for any business, so maybe something bad happened that made the problem impossible to ignore.

I heard a rumor sometime later that the data center had passed an external quality inspection and received some form of quality certification. I can see how this might have happened; their new processes actually seemed decent, and if they could make the floor tiles sit flat, they could hide the horror under the floor. Most quality inspections focus on paperwork trails and the inspectors I've met didn't seem like the kind of people who would want to get their hands dirty by lifting floor tiles.

So what did I learn from all of this?

  • Technical debt is real. You eventually have to pay for short-term time and money-saving decisions. There's never a good time to pay and the longer you leave it, the bigger and more expensive the fix becomes.
  • Just because something's been done a certain way for a long time, doesn't mean it's good. It might just mean the problems haven't surfaced yet.
  • If you're inspecting something, always get your hands dirty and always talk to the people doing the work. Things may look good on the outside, but might be rotten underneath. If we hadn't established a good rapport with our guide and I hadn't tripped on the floor tile, we would never have discovered the cable issue.
  • If something looks bad, look carefully for the cause. It would have been easy to blame the technicians for the cable nightmare, but it wasn't their fault. They were responding to the demands placed on them by their management. Ultimately, management is the cause of most failures.

Saturday, April 11, 2020

How to be more persuasive when you speak: using ‘catchphrases’

One of the most famous speeches in history used ‘catchphrases’ for incredibly powerful effect; you’ll know the speech by its catchphrase alone. I’ve seen modern American politicians use the same rhetorical technique to heighten energy and to unify and drive home their message. You can use it in your speeches too; I’m going to show you how and why.

Like many rhetorical techniques, this one relies on the considered use of repetition. Specifically, it’s the repetition of a phrase or sentence throughout a speech as a kind of catchphrase.

Let me give you an example. Let’s say you’re an engineering leader and you’re trying to convince your team to take data security seriously. Using this technique, your speech might look something like this (catchphrase in bold).

If we lapse in securing our data, our company can be fined large amounts of money, putting our livelihoods at risk. By being secure, we prevent this from happening.

Security is our security.

If we have a data breach, our reputation will be sullied and it’ll be harder for us to win new business, with all that entails.

Security is our security.

Companies have suffered data breaches of employee data too, putting social security numbers and other personal information out on the web for the highest bidder.

Security is our security.

Speakers use this approach to draw the audience’s attention to a key theme again and again and again, they use it to unify and focus a speech. It drives the point home in a forceful, but elegant way.

My real example is by an influential African-American Christian preacher. He repeats one of the most famous lines in rhetoric as a catchphrase again and again. You’ll know it as soon as you hear it – in fact, you already know the words. Here's the YouTube link to the appropriate section.


(Image credit: WikiMedia Commons, open-source)

Here’s part of his speech, the catchphrase is in bold.

I have a dream that my four little children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character.

I have a dream today.

I have a dream that one day, down in Alabama, with its vicious racists, with its governor having his lips dripping with the words of interposition and nullification; one day right there in Alabama, little black boys and black girls will be able to join hands with little white boys and white girls as sisters and brothers.

I have a dream today.

Martin Luther King repeats ‘I have a dream’ to bring the listener back to his point and to reinforce his message. ‘I have a dream’ – paragraph – ‘I have a dream’ – paragraph – ‘I have a dream’ – paragraph. He unifies his speech and drives home his point. (King’s speech is rhetorically interesting in other ways too; he uses a wide variety of techniques to make his points.)

I’ve done my homework on rhetoric and searched for this method in the books on techniques from antiquity. As far as I can tell, this technique is known as epimone. It's not one of the famous techniques and I think it's very underrated.

It seems to be used a lot in African-American Christian preaching and has spread to American politics from there. (As an aside, I've looked for resources on the analysis of rhetorical techniques used in African-American churches, but I've not been able to find any good ones. If anyone knows of some good analysis, please let me know.) I've heard a well-known American politician use it and I suspect we'll be hearing it more as we head into election season. Bear in mind that politicians use techniques like this deliberately because they know they work.

Here’s my recommendation for using this technique; if you’re trying to persuade or emotionally influence an audience, use it to hammer home your message and provide a simple unifying concept for people to take to heart.

Sunday, April 5, 2020

Sherlock Holmes, business books, Nazi fighters, and survivor bias

In the Sherlock Holmes story, Silver Blaze, Holmes solved the case by a deduction from something that didn’t happen. In the story, the dog didn’t bark, implying the dog knew the horse thief. This is a neat twist on something called survivor bias, the best example of which is Abraham Wald’s analysis of surviving bombers. I’ll talk about how survivor bias rears its ugly head and tell you about Wald and what he did.

Survivor bias occurs when we look at the survivors of some process and we try to deduce something about their commonality without considering external factors. An obvious example might be collating details on lottery winners’ lives in an attempt to figure out what factors might lead to winning the lottery. For example, I might study what day of the week and time of day winners bought their tickets, I might look at where winning tickets were bought, and I might look at the age and gender of winners. From this, I might conclude that to improve my chances of winning the lottery I need to be a 45-year-old man who buys tickets at 3:40pm on Wednesday afternoon at a gas station. But the lottery is a random process and all we've done is analyze who's playing, not the causes of winning. Put like this, it seems almost incredible that anyone could have problems with survivor bias, but survivor bias doesn’t always manifest itself in obvious ways.
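If you want to see the lottery trap in action, here's a minimal Python simulation. Everything in it is an assumption I've made up for illustration: the number of players, the win probability, and the idea that Wednesday is the most popular day to buy a ticket. Winning is completely independent of the purchase day.

```python
import random
from collections import Counter

def simulate_lottery(n_players=100_000, win_probability=0.01, seed=0):
    """Simulate a lottery where winning is independent of purchase day."""
    rng = random.Random(seed)
    days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
    # Hypothetical buying habits: Wednesday is four times as popular
    weights = [1, 1, 4, 1, 1, 1, 1]
    players = Counter()
    winners = Counter()
    for _ in range(n_players):
        day = rng.choices(days, weights=weights)[0]
        players[day] += 1
        # The win is a coin flip that ignores the day entirely
        if rng.random() < win_probability:
            winners[day] += 1
    return players, winners

players, winners = simulate_lottery()

# Looking only at winners: which day "produces" the most wins?
most_common_winner_day = winners.most_common(1)[0][0]

# Looking at everyone: what fraction of each day's players won?
win_rates = {day: winners[day] / players[day] for day in players}

print("Most winners bought on:", most_common_winner_day)
for day, rate in win_rates.items():
    print(f"{day}: win rate {rate:.4f}")
```

Most winners bought their tickets on Wednesday, but only because most players did. Once you divide by the number of players on each day, the win rates are all about the same. Studying only the winners tells you who plays, not what wins.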

Let’s imagine I want to write a bestselling business book unraveling the secrets of how to win at business. I could select businesses that have been successful over several years and look for factors they have in common. I might call my book “Building excellent businesses that last”. As you surely know, there have been several bestselling books based on this premise. Unfortunately, they age like milk; it turns out that most of the companies these books identify as winners subsequently performed poorly - which is a regression to the mean. The problem is, other factors may have contributed to these businesses' success, for example, the competitive environment, new product innovation, a favorable economy, and so on. Any factors I derived from commonalities between winning companies today are just like an analysis of the common factors of lottery winners. By focusing on (current) winners, the door is open to survivor bias [Shermer, Jones].

The most famous example of survivor bias is Wald and the bombers. It is a little cliched to tell the story, but it’s such a great story, I’m going to tell it again, but my way.

Abraham Wald (1902-1950) was a great mathematician who made contributions to many fields, including econometrics, statistics, and geometry. A Hungarian Jew, he suffered discrimination and persecution while looking for work in Vienna, Austria, in 1938, and so emigrated with his family to New York. During World War II, he worked in the Statistical Research Group at Columbia University. This is where he was given the task of improving bomber survivability: where should armor go to best protect bombers, given that armor is heavy and not everywhere can be protected [Wallis]?

Not all bombers came home after bombing runs over Nazi-occupied Europe. Nazi fighter planes attacked bombers on the way out and the way back, and of course, they shot down many planes. To help his analysis, Wald had data on where surviving planes were hit. The image below is a modern simulation of the kind of data he had to work with; to be clear, this is not the exact data Wald had, it’s simulated data. The visualization shows where the bullet holes were on returning planes. If you had this data, where would you put the extra armor to ensure more planes came home?


(Simulated data on bomber aircraft bullet holes. Source: Wikipedia - McGeddon, License: Creative Commons)

Would you say, put the extra armor where the most bullet holes are? Doesn’t that seem the most likely answer?

Wrong.

This is the most famous example of survivor bias - and it’s literally about survival. Wald made the reasonable assumption that bullets would hit the plane randomly; remember, this is 1940s technology, and aerial combat did not have millimeter precision. This means the distribution of bullet holes should be more or less even across the planes. The distribution he saw was not even - there were few bullet holes in the engine and cockpit - but he was only looking at surviving planes. His conclusion was that planes hit in key places did not survive. Look at the simulated visualization above - did you notice the absence of bullet holes in the engine areas? If you got hit in an engine, you didn’t come home. This is the equivalent of the dog that didn’t bark in the night. The conclusion was, of course, to armor the places where there were no bullet holes.
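The logic can be sketched as a small simulation. To be clear, this is not Wald's actual method or data; it's a toy model under assumptions I've invented: the plane is split into five sections that are equally likely to be hit, each plane takes four hits, and a hit to the engine or cockpit is far more likely to bring the plane down than a hit elsewhere. The code counts bullet holes on all planes and then only on the planes that made it home.

```python
import random
from collections import Counter

# Hypothetical chance that a single hit in a section brings the plane down.
LOSS_PROBABILITY = {
    "engine": 0.6,
    "cockpit": 0.5,
    "fuselage": 0.05,
    "wings": 0.05,
    "tail": 0.05,
}

def simulate_missions(n_planes=10_000, hits_per_plane=4, seed=7):
    """Count hits per section on all planes and on surviving planes."""
    rng = random.Random(seed)
    sections = list(LOSS_PROBABILITY)
    all_hits = Counter()
    surviving_hits = Counter()
    for _ in range(n_planes):
        # Hits land on sections uniformly at random (an assumption).
        hits = [rng.choice(sections) for _ in range(hits_per_plane)]
        all_hits.update(hits)
        # The plane returns only if every individual hit fails to down it.
        survived = all(rng.random() >= LOSS_PROBABILITY[s] for s in hits)
        if survived:
            surviving_hits.update(hits)
    return all_hits, surviving_hits

all_hits, surviving_hits = simulate_missions()

for section in LOSS_PROBABILITY:
    print(f"{section:9s} all planes: {all_hits[section]:6d}  "
          f"returning planes: {surviving_hits[section]:6d}")
```

Across all planes, the hits are spread roughly evenly. On the returning planes, engine and cockpit hits are scarce, precisely because those are the hits that stopped planes from returning. Anyone who only sees the second column could easily armor the wrong places.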

A full appreciation of survivor bias will mean you're more skeptical of many self-help books. A lot of them proceed on the same lines: let's take some selected group of people, for example, successful business people or sports people, and find common factors or habits. By implication, you too can become a winning athlete or business person or politician just by adopting these habits. But how many people had all these habits or traits and didn't succeed? All Presidents breathe, but if you breathe, does this mean you'll become President? Of course, this is ludicrous, but many self-help books are based on similar assumptions; it's just harder to spot.
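The question "how many people had the habit and didn't succeed?" is really asking for the probability of success given the habit, which Bayes' theorem gives us. The sketch below uses made-up numbers purely for illustration: a success rate of 1 in 1,000, a habit that every successful person has, and two different guesses for how common that habit is in the whole population.

```python
def p_success_given_habit(p_habit_given_success, p_success, p_habit):
    """Bayes' theorem: P(success | habit)."""
    return p_habit_given_success * p_success / p_habit

# Hypothetical numbers, for illustration only.
base_rate = 0.001  # 1 in 1,000 people "succeed"

# A habit that every successful person has and so does everyone else
# (the breathing example).
breathing = p_success_given_habit(1.0, base_rate, 1.0)

# A habit that every successful person has, shared by 80% of the population.
common_habit = p_success_given_habit(1.0, base_rate, 0.8)

print(f"P(success | breathing):    {breathing:.5f}")
print(f"P(success | common habit): {common_habit:.5f}")
```

With these numbers, a habit shared by every single successful person barely moves your chances if most unsuccessful people share it too. Looking only at the successful group can't tell you how common the habit is among everyone else.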

Survivor bias manifests itself on the web with e-commerce. Users visit websites and make purchases or not. We can view those who make purchases as survivors. One way of increasing the number of buyers (survivors) is to focus on their commonalities, but as we’ve seen, this can give us biased results, or even the wrong result. A better way forward may be to focus on the selection process (the web page) and understand how that’s filtering users; in other words, understanding why people didn't buy.

One of the things I like about statistics in business is that correctly applying what seem like esoteric ideas can lead to real monetary impact, and survivor bias is a great example.