Tuesday, October 6, 2020

Faster Python BI app development through code generation

Back to the future: design like it's 1999

Back in 1999, you could build Visual Basic apps by dragging and dropping visual components (widgets) onto a canvas. The Visual Basic IDE handled all the code generation, leaving you with the task of wiring up your new GUI to your business data. It wasn't just Visual Basic, though; you could do the same thing with Visual C++ and Microsoft's other language products. The generated code wasn't the prettiest, but it worked, and it meant you could get the job done quickly.

(Microsoft Visual Basic. Image credit: Microsoft.)

Roll forward twenty years. Python is now very popular and people are writing all kinds of software using it, including software that needs UIs. Of course, the UI front-end is now the browser, which is another change. Sadly, nothing like the UI-building capabilities of the Microsoft Visual Studio IDE exists for Python; you can't build Python applications by dragging and dropping widgets onto a canvas.

Obviously, BI tools like Tableau and Qlik fulfill some of the need to quickly build visualization tools; they've inherited the UI-building crown from Microsoft. Unfortunately, they run out of steam when the analysis is complex: they have limited statistical capabilities and they're not general-purpose programming environments.

If your apps are 'simple', obviously, Tableau or Qlik are the way to go. But what happens if your apps involve more complex analysis, or if you have data scientists who know Python but not Tableau?

What would it take to make a Visual Basic or Tableau-like app builder for Python? Could we build something like it?

Start with the end in mind

The end goal is to have a drag-and-drop interface that looks something like this.

(draw.io. Image credit: draw.io.)

On the left-hand side of the screenshot, there's a library of widgets the user can drag and drop onto a canvas. 

Ideally, we'd like to be able to design a multi-tabbed application and move widgets onto each tab from a library. We'd do all the visualization layout on the GUI editor and maybe set up some of the properties for the widgets from the UI too. For example, we might set up the table column names, or give a chart a title and axis titles. When we're done designing, we could press a button and generate outline code that would create an application with the (dummy) UI we want.

A step further would be to import existing Python code into the UI editor and move widgets from tab to tab, add new widgets, or delete unwanted widgets.

Conceptually, all the technology to do this exists right now, just not in one place. Unfortunately, it would take considerable effort to produce something like it. 

If we can't go all the way, can we at least go part of the way?

A journey of a thousand miles begins with a single step

A first step is code generation from a specification. The idea is simple: you define your UI in a specification file that software uses to generate code. 

For this first simple step (and the end goal), there are two things to bear in mind:

  • Almost all UI-based applications can be constructed using a Model-View-Controller architecture (pattern) or something that looks like it.
  • Python widgets are similar to one another and follow well-known rules. For example, the widgets in Bokeh share a consistent API: a button follows certain rules, a dropdown menu follows certain rules, and so on.

Given that there are big patterns (the application architecture) and small patterns (the individual widgets), we could use a specification file to generate code for almost all UI-based applications.

I've created software that does this, and I'm going to tell you about it.

JSON and the argonauts

Here's an overview of how my code generation software works.

  • The Model-View-Controller code exists as a series of templates, with key features added at code generation time.
  • The application is specified in a JSON file. The JSON file contains details of each tab in the application, along with details of the widgets on the tab. The JSON file must follow certain rules; for example, no duplicate names.
  • Most of the rules for code generation are in a JSON schema file that contains details for each Bokeh widget. For example, the JSON schema has rules for how to implement a button, including how to create a callback function for a button.

Here's how it works in practice.

  1. The user creates a specification file in JSON. The JSON file has details of:
    • The overall project (name, copyright, author etc.)
    • Overall data for each tab (e.g. name of each tab and a description of what it does).
    • For each tab, there's a specification for each widget, giving its name, its arguments, and a comment on what it does.
  2. The system checks the user's JSON specification file for consistency (well-formed JSON, no duplicate names, etc.).
  3. Using a JSON schema file that contains the rules for constructing Bokeh widgets, the system generates code for each Bokeh widget in the specification.
    • For each widget that could have a callback, the system generates the callback code.
    • For complex widgets like DataTable and FileInput, the system generates skeleton example code that shows how to implement the widget. In the DataTable case, it sets up a dummy data source and table columns.
  4. The system then adds the generated code to the Model-View-Controller templates and generates code for the entire project.
    • The generated code is PEP8 compliant by design.

The generated code is runnable, so you can test out how the UI looks.
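
To make this flow concrete, here's a minimal sketch of the validate-and-generate steps. It's illustrative only: the file names, the top-level "tabs" key, and the code templates are my assumptions, not the tool's actual internals; the only external dependency is the jsonschema package.

    import json
    from string import Template

    from jsonschema import validate  # pip install jsonschema

    # Load the user's UI specification and the widget generation rules.
    with open("app_spec.json") as f:
        spec = json.load(f)
    with open("widget_rules.schema.json") as f:
        schema = json.load(f)

    # Step 2: reject malformed or rule-breaking specifications early.
    validate(instance=spec, schema=schema)

    # Hypothetical per-widget code templates, keyed by widget type.
    WIDGET_TEMPLATES = {
        "Button": Template(
            "def ${var}_callback(event):\n"
            "    pass  # TODO: wire the View up to the Model\n\n"
            "${var} = Button(label='${label}')\n"
            "${var}.on_click(${var}_callback)\n"
        ),
        "TextInput": Template(
            "${var} = TextInput(title='${title}', value='${value}')\n"
        ),
    }

    def emit_widget(widget):
        """Render the code for one widget entry in the specification."""
        var = widget["name"].lower().replace(" ", "_")
        return WIDGET_TEMPLATES[widget["type"]].substitute(var=var, **widget["arguments"])

    # Steps 3 and 4 (simplified): the real tool splices this output into the
    # Model-View-Controller templates; here we just print it.
    for tab in spec["tabs"]:
        print(f"# --- Tab: {tab['name']} ---")
        for widget in tab["widgets"]:
            print(emit_widget(widget))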

Here's an excerpt from the JSON schema defining the rules for building widgets:

            "allOf":[

                    {

                      "$comment":"███ Button ███",

                      "if":{

                        "properties":{

                          "type":{

                            "const":"Button"

                          }

                        }

                      },

                      "then":{

                        "properties":{

                          "name":{

                            "$ref":"#/definitions/string_template_short"

                          },

                          "description":{

                            "$ref":"#/definitions/string_template_long"

                          },

                          "type":{

                            "$ref":"#/definitions/string_template_short"

                          },

                          "arguments":{

                            "type":"object",

                            "additionalProperties":false,

                            "required":[

                              "label"

                            ],

                            "properties":{

                              "label":{

                                "type":"string"

                              },

                              "sizing_mode":{

                                "type":"string",

                                "default":"stretch_width"

                              },

                              "button_type":{

                                "type":"string",

                                "default":"success"

                              }

                            }

                          },

Here's an excerpt from the JSON file defining an application's UI:

{
  "name":"Manage data",
  "description":"Panel to manage data sources.",
  "widgets":[
    {
      "name":"ECV year allocations",
      "description":"Displays the Electoral College Vote allocations by year.",
      "type":"TextInput",
      "disabled":true,
      "arguments":{
        "title":"Electoral College Vote allocations by year in system",
        "value":"No allocations in system"
      }
    },
    {
      "name":"Election results",
      "description":"Displays the election result years in the system.",
      "type":"TextInput",
      "disabled":true,
      "arguments":{
        "title":"Presidential Election results in system",
        "value":"No allocations in system"
      }
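
For illustration, here's the kind of Bokeh code the generator might emit for the first widget in that excerpt. The variable name is my guess; the real generated code may be laid out differently.

    from bokeh.models import TextInput

    # ECV year allocations: displays the Electoral College Vote allocations by year.
    ecv_year_allocations = TextInput(
        title="Electoral College Vote allocations by year in system",
        value="No allocations in system",
        disabled=True,
    )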

What this means in practice

Using this software, I can very rapidly prototype BI-like applications. The main task left is wiring up the widgets to the business data in the Model part of the Model-View-Controller architecture. This approach reduces the tedious part of UI development but doesn't entirely eliminate it. It also helps with widgets like DataTable that require a chunk of code to get them working - this software generates most of that code for you.
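
As an example of the DataTable case, the skeleton the generator produces looks something like the sketch below (the column names and dummy values here are invented for illustration):

    from bokeh.models import ColumnDataSource, DataTable, TableColumn

    # Dummy data source so the table renders before real data is wired in.
    dummy_source = ColumnDataSource(data={"year": [2016, 2020],
                                          "ecv": [538, 538]})

    columns = [
        TableColumn(field="year", title="Year"),
        TableColumn(field="ecv", title="Electoral College votes"),
    ]

    example_table = DataTable(source=dummy_source, columns=columns, width=400)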

How things could be better

The software works, but not as well as it could:

  • It doesn't do layout. Laying out Bokeh widgets is a major nuisance and a time suck. 
  • The stubs for Bokeh DataTable are too short - ideally, the generated code should contain more detail which would help reduce the need to write code.
  • The Model-View-Controller architecture needs some cleanup.

The roadmap

I have a long shopping list of improvements:

  • Better Model-View-Controller
  • Robust exception handling in the generated code
  • Better stubs for Bokeh widgets like DataTable
  • Automatic Sphinx documentation
  • Layout automation

Is it worth it?

Yes and no.

For straightforward apps, it will still be several times faster to build them in Tableau or Qlik. But if the app requires more statistical firepower, or complex analysis, or linkage to other systems, then Python wins and this approach is worth taking. If you have access to Python developers, but not Tableau developers, then once again, this approach wins.

Over the longer term, regardless of my efforts, I can clearly see Python tools evolving to the state where they can compete with Qlik and Tableau for speed of application development.

Maybe in five years' time, we'll have all of the functionality we had 25 years ago. What's old is new again.

Monday, September 28, 2020

Blame free learning

What is blame-free learning?

I’m going to suggest something that’s going to sound innocuous but is a radical and a major departure from the way many organizations are run. I’m suggesting a blame-free learning approach to deal with business failures - I’m advocating it at the corporate level and the personal level.

Here’s what I mean by blame-free learning:

  • After some bad event, the team of people involved meets to hold a post-mortem review.
  • The team reviews what happened dispassionately, without pointing the finger at anyone. The team produces a shared narrative of what happened that’s factual and not value-based.
  • The team comes up with changes to existing processes to prevent a re-occurrence. No blame is apportioned.

Blame-free learning means approaching a failure with a spirit of solving the problem and preventing its re-occurrence, not finding blame. It's about learning from what's happened and not repeating it, all without finger-pointing. Blame pits people against one another, shuts down thinking, and creates a conservative and closed mindset. Blame-free learning means people can work together to resolve problems; it also means no one gets off the hook.

Projects gone wild

Let me give you an example of a project gone wrong and two different responses.

A company is bidding for a large contract. To win the contract, the engineering team needs to make some changes to the product, the QA team has to test the product, and the data team has to supply the test data. Unfortunately, the engineering team deprioritized the work, leaving it to the last minute. The QA team manager got sick and didn’t delegate the task. The data team had scheduled a data migration just at the time when they needed to pull the data for the test. Finally, the salesperson had to attend a family wedding so couldn’t submit the RFP on time. The company loses the contract.

In a case like this, you might expect a post-mortem review to examine what happened (many companies don’t even do that). The post-mortem review could easily degenerate into finger-pointing with the most politically powerful group pinning the blame on the least powerful. It’s the search for a scapegoat. Everyone walks away aggrieved, knowing that this will probably come up in performance reviews later. Human relationships are damaged.


(Most project post-mortems are fights for blame. Image credit: Old Book Illustrations - Image in public domain)

A blame-free learning approach refuses to apportion blame, but rather focuses on changes to the process and preventing the problem from recurring. The first step is to develop a shared narrative of what happened. This should be factual and neutrally written. In the example I gave you above, it might say “The QA Team Manager allocated the work to himself, but before he could execute it, he became sick and couldn’t work on the project. Because he was sick, he couldn’t delegate the work.” Participants in the post-mortem are forbidden from blaming anyone but instead are encouraged to think about how the problem can be avoided in the future - by changing processes for example. The result of the post-mortem might be instructions to managers to improve communications, proper setting of priorities, some overall project management, etc.

Heathrow Terminal 5

A version of this approach was used for the construction of Heathrow's Terminal 5 building [OECD]. Typical large construction projects can become quite adversarial, with the contractors pitted against project management to apportion blame and extract extra money. Heathrow airport's operator, BAA, had conducted research indicating that no UK construction project of this size had been completed on time, that the safety record of these types of projects was poor, and that quality standards were often not met. To get the project done on time and on budget, BAA did things very differently. The contract had BAA assuming the risk, with contractors incentivized to solve problems rather than engage in finger-pointing. Importantly, profits were negotiated and baked into the contract, which meant that if a problem occurred, the contractors were incentivized to fix it quickly and maintain their profit margin rather than argue. An issue with the air-traffic control tower construction illustrates the point. The tower's steelwork was out by 9mm, but instead of blaming each other and litigating, the contractors were incentivized to work together to find a solution, which they did [Gil]. The construction work was completed on time and on budget.

Bureaucracy and fear

I worked in a company with a very formal review system. Someone in a different group applied a process very rigidly and blocked my group from moving forward on a project. They wouldn't budge and wouldn't explain why. I had to escalate my issue to move my project forward, which we did. What happened next was enlightening. The person who wouldn't budge came to see me and explained what was going on. They told me they had previously been marked down in the performance review system for being too flexible, so they were scared of allowing any deviation from the process, and that's why they had stopped my project. They came to see me because they were terrified that they would be blamed this time for not being flexible - they begged me not to take the issue further. I told them I wasn't out for revenge and I would drop the issue - the person looked immensely relieved. Blame was a huge part of the performance review system; it was effective at establishing control but hopeless at fostering innovation and flexibility. My colleague learned one thing from the entire process: fear. They did change their process after this, but only just enough to let my project through; they stayed rigid, just in another way.

With blame-free learning, this would have played out differently. The person would have still blocked my project and I would still have escalated to get things moving, but my colleague wouldn’t be afraid of the review system and we could have had a constructive post-mortem review of how the process could be changed; maybe with a more flexible process that might have retained the original process goals but let other projects through as well as mine. We would all have walked out of this with good working relationships intact.

But what if I can't get everyone's buy-in?

You can do blame-free learning on your own. In fact, this might be the most productive way to do it. Imagine you have some kind of bruising experience at work: a project was late or delayed, or some other bad thing happened. Also imagine there were several other parties involved, with a lot of blame to go around. You're hurting and you want to lash out, so you blame others - which may well be correct, but it's unhelpful. By blaming others, even if you're correct, you lose the opportunity to learn. Here's what I suggest in this case:

  • Wait until your anger/frustration subsides. You have to be calm for this to work.
  • Think through the situation as if you were an outsider and construct a neutrally worded description. No loaded phrases (‘he should have...’). If you can’t do it with neutrality, you may need to let more time pass to calm down and gain perspective.
  • With all the knowledge you have now, think about what you could have done differently. It’s only about you, not anyone else.

Yes, this is hindsight, but it’s not about apportioning blame to you, it’s about learning what you could do better in the future. You have the knowledge now that you didn’t have before, so you should be able to do things better. This is all about figuring out what you can do to change you.

Get started

Implementing something like blame-free learning can be very hard across multiple groups, but it's easier to start within a single group. Even easier to start with yourself.

References

[Gil] https://personalpages.manchester.ac.uk/staff/nuno.gil/Teaching%20case%20studies/BAA%20T5%20Agreement%20.pdf

[OECD] https://www.oecd.org/governance/procurement/toolbox/search/allocation-risks-during-construction-heathrow-airport-terminal-5.pdf

Monday, September 21, 2020

The gamblers' fallacy

An offer you can't refuse?

Imagine you're in a casino playing craps, a game where you bet on the outcome of two dice thrown at the same time. The probability of a double six coming up is 1/36, but no one has thrown a double six for over 110 throws. The table is starting to get crowded and noisy with people betting on a double six. It's due to come up, and it must come up soon. 

(Still no double six. Source: Wikimedia Commons. License: Creative Commons. Author: Gaz.)

A new player rolls the dice; snake-eyes (double ones) - still no double six.

You feel a tap on your elbow. A lady in a cocktail dress whispers to you that she'll give you odds of 20 to 1 for a double six.

Another player rolls the dice; easy-four (one and three) - the expectation for a double six mounts.

Your new friend whispers that she'll reduce the odds soon; she asks if you want to take the bet.

It's now 130 throws since a double six has occurred and it should have occurred 3 or 4 times by now.

Do you take the bet?

The gambler's fallacy 

The gambler's fallacy is the belief that the outcome of a random event is somehow influenced by previous random events. In our craps case, some examples might be:

  • double six hasn't come up in 130 throws, so it's much more likely to come up now (the probability is higher than 1/36)
  • double one has just come up, therefore it's not likely to come up again soon (the probability is less than 1/36).

It's a fallacy because each roll of the dice is completely independent; it doesn't matter what the previous throws were. There could have been 1,000 throws without a double six, but the probability of a double six will always be 1/36. The same logic applies to the snake-eyes example: if snake-eyes has just been thrown, the probability of throwing another snake-eyes immediately after is still 1/36.

Let me lay this out even more starkly, in craps:

  • At the very first roll of the dice, the probability of a double six is 1/36.
  • After ten rolls of the dice, the probability of the next roll being a double six is 1/36.
  • After 100 rolls without a double six, the probability of the next roll being a double six is 1/36.
  • After 200 rolls without a double six, the probability of the next roll being a double six is 1/36.
  • After 1,000 rolls without a double six, the probability of the next roll being a double six is 1/36.
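
If you don't believe it, a few lines of Python make the point (this is my own simulation, not from any reference): even conditioning on a 100-roll drought, the next-roll probability stays at 1/36.

    import random

    N_ROLLS = 2_000_000   # total simulated rolls
    DROUGHT = 100         # "overdue" threshold: rolls since the last double six

    hits, opportunities, since_last = 0, 0, 0

    for _ in range(N_ROLLS):
        d1, d2 = random.randint(1, 6), random.randint(1, 6)
        double_six = (d1 == 6 and d2 == 6)
        if since_last >= DROUGHT:      # the table thinks a double six is "due"
            opportunities += 1
            if double_six:
                hits += 1
        since_last = 0 if double_six else since_last + 1

    print(f"P(double six | {DROUGHT}-roll drought) ~= {hits / opportunities:.4f}")
    print(f"Unconditional probability 1/36       = {1 / 36:.4f}")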

Otherwise rational people are fooled by the gambler's fallacy all the time. As the money increases and the emotion heightens, the gambler's fallacy becomes easier and easier to fall for, as we'll see.

The Italian lottery

The story starts in Venice, Italy in May 2003. The Venice lottery was a game where 6 numbered balls (plus a bonus ball) were selected from a set of 90 numbered balls. The lottery was run twice a week. Each number should come out on average once every 7-8 weeks. As with all government-sponsored lotteries, the results were well-publicized.

In May 2003, the number 53 came up. Then it didn't come up again.

By October, people realized the number 53 was overdue. They started to gamble on 53 occurring - it was overdue, so it must come up. But 53 just didn't come up.

News of the 53 drought started to spread, and more and more Italians started to bet that 53 would occur, but it didn't. It didn't come up in November or December either. 

In January of 2004, a woman from Carrara committed suicide because she'd spent her family's life savings gambling that 53 would come up. It didn't.

Still, 53 didn't come up.

People went crazy betting money that 53 would come up; they became known as '53 addicts'. They were sure it must come up. Sadly, it didn't. A man from Signa shot his wife, his son, and himself after losing money gambling on 53.

Still, 53 didn't come up.

Italians gambled and lost a huge amount of money on 53, an estimated 4 billion Euros. They had fallen for the gambler's fallacy and believed that 53 must come up soon.

Eventually, 53 did come up - in February 2005, after 182 draws (remember, each draw was seven balls).

The Venice lottery made a lot of money, but the Italian gamblers did not.

How the cocktail dress lady (and casinos) makes money

To understand if the cocktail dress lady was offering a good deal, we need to relate probability to odds.

The probability of a double six is 1/36.

The odds are the ratio of the probability the event will occur divided by the probability the event will not occur:

\[odds = \frac{P}{1-P}\]

The odds of a double six are:

\[ odds_{66} = \frac{\frac{1}{36}}{\frac{35}{36}} = \frac {1}{35}\]

which a bookie might quote as 35 to 1.

Generally speaking, casinos and bookies make money in one of two ways:

  • The probabilities don't add up to 1.
  • They rely on the gamblers' fallacy and offer worse odds than a fair analysis would suggest.

Let's imagine there are ten horses in a race. Each horse has a 10% chance of winning, which are odds of 9 to 1. If you win, you get your stake money back, so a winning bet of $1 gives you $10. If ten punters bet $1 on each horse, the bookie takes $10, but one of the horses must win, so the bookie pays out $10. 

(Bookmakers make money. You don't. Image source: Wikimedia Commons. License: Creative Commons. Author: Grand Island Tourism )

To make money, the bookie reduces the odds. Instead of offering 9 to 1 on each of the horses, the bookie offers 8 to 1. The bookie still takes in $10, but this time only pays out $9. In the real world, it's more complicated, but you get the idea.
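
To make the arithmetic explicit for the ten-horse example:

\[ \text{implied probability at 8 to 1} = \frac{1}{8+1} = \frac{1}{9}, \qquad \text{total} = 10 \times \frac{1}{9} \approx 1.11 \]

The implied probabilities sum to more than 1; that excess is the bookmaker's margin - in cash terms, $10 taken in and $9 paid out, whichever horse wins.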

The other way to make money is to underprice probabilities. A double six should be offered at 35 to 1, but you could offer it at 20 to 1. This is a horrible deal, but if gamblers have a bad case of the gambler's fallacy, they may be convinced the probability is much higher than 1/36 and they may even view a horrible deal as the deal of a lifetime. The casino, or the lady in the cocktail dress, makes money by knowing the odds and knowing when to offer a deal that seems attractive, but isn't.
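
You can see just how bad the 20-to-1 offer is by working out the expected profit on a $1 bet:

\[ E[\text{profit}] = \frac{1}{36} \times 20 - \frac{35}{36} \times 1 = \frac{20 - 35}{36} \approx -0.42 \]

At the fair odds of 35 to 1 the expectation is zero; at 20 to 1 you expect to lose about 42 cents of every dollar you stake - which is exactly why the lady in the cocktail dress is happy to offer it.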

Not only should you not accept the 20-to-1 offer, but you should also offer it to other players.

Gambler's fallacy in Reno, Nevada and Monte Carlo

Obviously, there are naive gamblers in Las Vegas, but do people really fall for the gamblers' fallacy at the roulette table? After all, you have to have some level of sophistication to understand and play the game, so surely gamblers are savvy and know how to price bets appropriately? It seems that they don't always.

Using videotape data supplied by a casino in Reno, Nevada, two researchers tracked the pattern of gambling on roulette. If gamblers have fallen for the gambler's fallacy, you might expect to see certain patterns of betting; for example, if red hasn't come up as often as expected, they might bet more on red. The researchers found small but significant examples of the gambler's fallacy. The reality, then, is that there are people who fall for the fallacy, even those playing a sophisticated game like roulette.

(Image source: Wikimedia Commons. License: Creative Commons. Author: Ken Lund.)

Another object lesson in the gambler's fallacy occurred at a roulette table, this time in a casino in Monte Carlo. In 1943, the ball landed on red 32 times in a row. The people who thought black must come up were cleaned out.

The gamblers' fallacy elsewhere

The gambler's fallacy has been an active area of research for some time, and variations of it have been found in a number of settings. Here's one example.

Let's imagine you're an asylum judge. You're aware of the average 'success' rate for applicants and you don't want to be too far from the average. Let's assume that cases are randomly assigned (deserving and undeserving). By random chance, you might get a long string of deserving or undeserving cases, maybe as many as twenty in a row. The gambler's fallacy may kick in after a series of similar cases, for example, the first ten cases were deserving, so the eleventh 'must' be undeserving, as a result, you judge more harshly based on expectation.

The gambler's fallacy in business

If you listen closely enough, you'll hear business people commit the gambler's fallacy all the time. How often have you heard these kinds of phrases:

  • We've won the last 8 contracts, so we must win the next one.
  • We just failed to land the last 6 deals, so the odds of us landing the next deal are high.

Despite what people say, business can be strongly driven by belief and not rationality. If everyone needs a deal to be landed, then the collective view might become that a deal will be landed, regardless of what a realistic measure of the probabilities is.

How to guard against the gamblers' fallacy

There's something about humanity and our (mis)understanding of statistics that makes us vulnerable to the gambler's fallacy. The best teacher might be experience. How many Italians who bet on 53 would do so again? There's some evidence that the gambler's fallacy is particularly strong when the data evolves over time, which ties in with the Italian lottery and casino examples. Perhaps the best defense is to take a step back and view the data as a whole, then make a decision away from the influence of others.

The existence of opulent casinos should be a lesson that those who understand probability can make money from those who do not.

Monday, September 14, 2020

The datasaurus: always visualize your data

The summary is not the whole picture

If you just use summary statistics to describe your data, you can miss the bigger picture, sometimes literally so. In this blog post, I'm going to show you how relying on summaries alone can lead you catastrophically astray and I'm going to tell you how you can avoid making career-damaging mistakes.

(The datasaurus is why you need to visualize your data. Source: Alberto Cairo. Open source.)

What are summary statistics?

Summary statistics are parameters like the mean, standard deviation, and correlation coefficient; they summarize the properties of the data and the relationship between variables. For example, if the correlation coefficient, r, is about 0.8 for two data sets x and y, we might think there's a relationship between them, but if it's about 0, we might think there isn't.

The use of summary statistics is widely taught, every textbook emphasizes them, and almost everyone uses them. But if you use summary statistics in isolation from other methods you might miss important relationships - you should always visualize your data as we'll see.

Anscombe's Quartet

Take a look at the four plots below. They're obviously quite different, but they all have the same summary statistics!

Here are the summary statistics data:

Property                                  Value
Mean of x                                 9
Sample variance of x                      11
Mean of y                                 7.50
Sample variance of y                      4.125
Correlation between x and y               0.816
Linear regression line                    y = 3.00 + 0.500x
Coefficient of determination (R²)         0.67

These plots were developed in 1973 by the statistician Francis Anscombe to make exactly this point: you can't rely on summary statistics, you need to visualize your data. The graphical relationship between the x and y variables is different in each case and implies different things. By plotting the data out, we can see what the relationships are, but summary statistics hide what's going on.
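
If you want to see this for yourself, the quartet ships with seaborn, so a few lines of Python (a sketch assuming seaborn and matplotlib are installed) reproduce both the near-identical summaries and the four very different pictures:

    import matplotlib.pyplot as plt
    import seaborn as sns

    df = sns.load_dataset("anscombe")  # columns: dataset, x, y

    # Near-identical summary statistics for each of the four data sets...
    print(df.groupby("dataset")[["x", "y"]].agg(["mean", "var"]))
    print(df.groupby("dataset").apply(lambda g: g["x"].corr(g["y"])))

    # ...but four very different pictures.
    sns.lmplot(data=df, x="x", y="y", col="dataset", col_wrap=2, ci=None, height=3)
    plt.show()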

The datasaurus

Let's zoom forward to 2016. The justly famous Alberto Cairo tweeted about Anscombe's quartet and illustrated the point with this cool set of summary statistics. He later expanded on his tweet in a short blog post.

Property                Value
n                       142
x mean                  54.2633
x standard deviation    16.7651
y mean                  47.8323
y standard deviation    26.9353
Pearson correlation     -0.0645

What might you conclude from these summary statistics? The correlation coefficient is close to zero, so I might say there's not much of a relationship between the x and y variables and conclude there's nothing interesting in the data - but I would be wrong.

The summary might not mean anything to you, but the visualization surely will. This is the datasaurus data set: the x and y variables draw out a dinosaur.

The datasaurus dozen

Two researchers at Autodesk Research took things a stage further. They started with Alberto Cairo's datasaurus and created a dozen other charts with exactly the same summary statistics as the datasaurus. Here they all are.

The summary statistics look like noise, but the charts reveal the underlying relationships between the x and y variables. Some of these are obviously just fun, like the star, but others imply more meaningful relationships.

If all this sounds a bit abstract, let's think about how this might manifest itself in business. Let's imagine you're an analyst working for a large company. You have data on sales by store size for Europe and you've been asked to analyze the data to gain insights. You're under time pressure, so you fire up a Python notebook and get some quick summary statistics. You get summary statistics that look like the ones I showed you above. So you conclude there's nothing interesting in the data, but you might be very wrong.

You should plot the data out and look at the chart. You might see something that looks like the slanting charts above, maybe something like this:



The individual diagonal lines might correspond to different European countries (different regulations, different planning rules, different competition, etc.). There could be a very significant relationship that you would have missed by relying on summary data.

(The Autodesk Research team have posted their work as a paper you can read here.)

Lessons learned

The lessons you should take away from all this are simple:

  • summary statistics hide a lot
  • there are many relationships between variables that will give summary statistics that look like noise
  • always visualize your data!

Tuesday, September 8, 2020

Can you believe the polls?

Opinion polls have known sin

Polling companies have run into trouble over the years in ways that render some poll results doubtful at best. Here are just a few of the problems:

  • Fraud allegations.
  • Leading questions
  • Choosing not to publish results/picking methodologies so that polls agree.

Running reliable polls is hard work that takes a lot of expertise and commitment. Sadly, companies sometimes get it wrong for several reasons:

  • Ineptitude.
  • Lack of money. 
  • Telling people what they want to hear. 
  • Fakery.

In this blog post, I'm going to look at some high-profile cases of dodgy polling and I'm going to draw some lessons from what happened.

(Are some polls real or fake? Image source: Wikimedia Commons. Image credit: Basile Morin. License: Creative Commons.)

Allegations of fraud part 1 - Research 2000

Backstory

Research 2000 started operating around 1999 and gained some solid early clients. In 2008, The Daily Kos contracted with Research 2000 for polling during the upcoming US elections. In early 2010, Nate Silver at FiveThirtyEight rated Research 2000 as an F and stopped using their polls. As a direct result, The Daily Kos terminated their contract and later took legal action to reclaim fees, alleging fraud.

Nate Silver's and others' analysis

After the 2010 Senate elections, Nate Silver analyzed polling results for 'house effects' and found a bias towards the Democratic party for Research 2000. These kinds of biases appear all the time and vary from election to election. The Research 2000 bias was large (at 4.4%), but not crazy; the Rasmussen Republican bias was larger for example. Nonetheless, for many reasons, he graded Research 2000 an F and stopped using their polling data.

In June of 2010, The Daily Kos publicly dismissed Research 2000 as their pollster based on Nate Silver's ranking and more detailed discussions with him. Three weeks later, The Daily Kos sued Research 2000 for fraud. After the legal action was public, Nate Silver blogged some more details of his misgivings about Research 2000's results, which led to a cease and desist letter from Research 2000's lawyers. Subsequent to the cease-and-desist letter, Silver published yet more details of his misgivings. To summarize his results, he was seeing data inconsistent with real polling - the distribution of the numbers was wrong. As it turned out, Research 2000 was having financial trouble around the time of the polling allegations and was negotiating low-cost or free polling with The Daily Kos in exchange for accelerated payments. 

Others were onto Research 2000 too. Three statisticians analyzed some of the polling data and found patterns inconsistent with real polling - again, real polls tend to have results distributed in certain ways and some of the Research 2000 polls did not.

The result

The lawsuit progressed with strong evidence in favor of The Daily Kos. Perhaps unsurprisingly, the parties settled, with Research 2000 agreeing to pay The Daily Kos a fee. Research 2000 effectively shut down after the agreement.

Allegations of fraud part 2 - Strategic Vision, LLC

Backstory

This story requires some care in the telling. At the time of the story, there were two companies called Strategic Vision, one company is well-respected and wholly innocent, the other not so much. The innocent and well-respected company is Strategic Vision based in San Diego. They have nothing to do with this story. The other company is Strategic Vision, LLC based in Atlanta. When I talk about Strategic Vision, LLC from now on it will be solely about the Atlanta company.

To maintain trust in the polling industry, the American Association for Public Opinion Research (AAPOR) has guidelines and asks polling companies to disclose some details of their polling methodologies. They rarely censure companies, and their censures don't have the force of law, but public shaming is effective as we'll see. 

What happened

In 2008, the AAPOR asked 21 polling organizations for details of their 2008 pre-election polling, including polling for the New Hampshire Democratic primary. Their goal was to quality-check the state of polling in the industry.

One polling company didn't respond for a year, despite repeated requests to do so. As a result, in September 2009, the AAPOR published a public censure of Strategic Vision, LLC, which you can read here.

It's very unusual for the AAPOR to issue a censure, so the story was widely reported at the time, for example in the New York Times, The Hill, and The Wall Street Journal. Strategic Vision LLC's public response to the press coverage was that they were complying but didn't have time to submit their data. They denied any wrongdoing.

Subsequent to the censure, Nate Silver looked more closely at Strategic Vision LLC's results. Initially, he asked some very pointed and blunt questions. In a subsequent post, Nate Silver used Benford's Law to investigate Strategic Vision LLC's data, and based on his analysis he stated there was a suggestion of fraud - more specifically, that the data had been made up. In a post the following day, Nate Silver offered some more analysis and a great example of using Benford's Law in practice. Again, Strategic Vision LLC vigorously denied any wrongdoing.
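
For flavor, here's a toy version of the kind of Benford's-law check involved (my own illustration, not Silver's actual analysis): compare the observed leading-digit frequencies of a set of reported numbers against the Benford expectation. In practice you'd also want a formal test and some care over whether the data should follow Benford's law at all.

    import math
    from collections import Counter

    def benford_expected(d):
        """Benford's law: expected frequency of leading digit d (1-9)."""
        return math.log10(1 + 1 / d)

    def leading_digit_freq(values):
        digits = [int(str(v).lstrip("0.")[0]) for v in values if v > 0]
        counts = Counter(digits)
        return {d: counts[d] / len(digits) for d in range(1, 10)}

    # Hypothetical numbers to test (e.g. reported sub-sample counts).
    numbers = [112, 87, 134, 96, 45, 23, 178, 61, 19, 102, 133, 88]
    observed = leading_digit_freq(numbers)
    for d in range(1, 10):
        print(f"{d}: observed {observed[d]:.2f}  Benford {benford_expected(d):.2f}")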

One of the most entertaining parts of this story is a citizenship poll conducted by Strategic Vision, LLC among high school students in Oklahoma. The poll was commissioned by the Oklahoma Council on Public Affairs, a think tank. The poll asked eight straightforward questions, for example:

  • who was the first US president? 
  • what are the two main political parties in the US?  

and so on. The results were dismal: only 23% of students answered George Washington and only 43% of students knew Democratic and Republican. Not one student in 1,000 got all questions correct - which is extraordinary. These types of polls are beloved of the press; there are easy headlines to be squeezed from students doing poorly, especially on issues around citizenship. Unfortunately, the poll results looked odd at best. Nate Silver analyzed the distribution of the results and concluded that something didn't seem right - the data was not distributed as you might expect. To their great credit, when the Oklahoma Council on Public Affairs became aware of problems with the poll, they removed it from their website and put up a page explaining what happened. They subsequently terminated their relationship with Strategic Vision, LLC.

In 2010, a University of Cincinnati professor awarded Strategic Vision LLC the "Phantom of the Soap Opera" award on the Media Ethics site. This site has a little more back story on the odd matter of Strategic Vision LLC's offices, or lack of them.

The results

Strategic Vision, LLC continued to deny any wrongdoing. They never supplied their data to the AAPOR and they stopped publishing polls in late 2009. They've disappeared from the polling scene.

Other polling companies

Nate Silver rated other pollsters an F and stopped using them. Not all of the tales are as lurid as the ones I've described here, but there are accusations of fraud and fakery in some cases, and in other cases, there are methodology disputes and no suggestion of impropriety. Here's a list of pollsters Nate Silver rates an F.

Anarchy in the UK

It's time to cross the Atlantic and look at polling shenanigans in the UK. The UK hasn't seen the rise and fall of dodgy polling companies, but it has seen dodgy polling methodologies.

Herding

Let's imagine you commission a poll on who will win the UK general election. You get a result different from the other polls. Do you publish your result? Now imagine you're a polling analyst, you have a choice of methodologies for analyzing your results, do you do what everyone else does and get similar results, or do you do your own thing and maybe get different results from everyone else?

Sadly, there are many cases when contrarian polls weren't published and there is evidence that polling companies made very similar analysis choices to deliberately give similar results. This leads to the phenomenon called herding where published poll results tend to herd together. Sometimes, this is OK, but sometimes it can lead to multiple companies calling an election wrongly.

In 2015, the UK polls predicted a hung parliament, but the result was a working majority for the Conservative party. The subsequent industry poll analysis identified herding as one of the causes of the polling miss. 

This isn't the first time herding has been an issue with UK polling and it's occasionally happened in the US too.

Leading questions

The old British TV show 'Yes, Prime Minister' has a great piece of dialog neatly showing how leading questions work in surveys. 'Yes, Prime Minister' is a comedy, but UK polls have suffered from leading questions for a while.

The oldest example I've come across dates from the 1970s and the original European Economic Community membership referendum. Apparently, one poll asked the following questions to two different groups:

  • France, Germany, Italy, Holland, Belgium and Luxembourg approved their membership of the EEC by a vote of their national parliaments. Do you think Britain should do the same?
  • Ireland, Denmark and Norway are voting in a referendum to decide whether to join the EEC. Do you think Britain should do the same?

These questions are highly leading and unsurprisingly elicited the expected positive result in both (contradictory) cases.

Moving forward in time to 2012, leading questions, or artful question wording, came up again. The background is press regulation. After a series of scandals in which the press behaved shockingly badly, the UK government considered press regulation to curb abuses. Various parties were for or against various aspects of press regulation, and they commissioned polls to support their viewpoints.

The polling company YouGov published a poll, paid for by The Media Standards Trust, that showed 79% of people thought there should be an independent government-sanctioned regulator to investigate complaints against the press. Sounds comprehensive and definitive. 

But there was another poll at about the same time, this time paid for by The Sun newspaper, which found that only 24% of the British public wanted a government regulator for the press - the polling company here was also YouGov!

The difference between the 79% and 24% came through careful question wording - a nuance that was lost in the subsequent press reporting of the results. You can listen to the story on the BBC's More Or Less program that gives the wording of the question used.

What does all this mean?

The quality of the polling company is everything

The established, reputable companies got that way through high-quality, reliable work over a period of years. They will make mistakes from time to time, but they learn from them. When you're considering whether or not to believe a poll, you should ask who conducted the poll and consider the reputation of the company behind it.

With some exceptions, the press is unreliable

None of the cases of polling impropriety were caught by the press. In fact, the press has a perverse incentive to promote the wild and outlandish, which favors results from dodgy pollsters. Be aware that a newspaper that paid for a poll is not going to criticize its own paid-for product, especially when it's getting headlines out of it.

Most press coverage of polls focuses on discussing what the poll results mean, not how accurate they are and sources of bias. If these things are discussed, they're discussed in a partisan manner (disagreeing with a poll because the writer holds a different political view). I've never seen the kind of analysis Nate Silver does elsewhere - and this is to the great detriment of the press and their credibility.

Vested interests

A great way to get press coverage is by commissioning polls and publishing the results; especially if you can ask leading questions. Sometimes, the press gets very lazy and doesn't even report who commissioned a poll, even when there's plainly a vested interest.

Anytime you read a survey, ask who paid for it and what the exact questions were.

Outliers are outliers, not trends

Outlier poll results get more play than results in line with other pollsters. As I write this in early September 2020, Biden is about 7% ahead in the polls. Let's imagine two survey results coming in early September:

  • Biden ahead by 8%.
  • Trump ahead by 3%

Which do you think would get more space in the media? Probably the shocking result, even though the dull result may be more likely. Trump-supporting journalists might start writing articles on a campaign resurgence while Biden-supporting journalists might talk about his lead slipping and losing momentum. In reality, the 3% poll might be an anomaly and probably doesn't justify consideration until it's backed by other polls. 

Bottom line: outlier polls are probably outliers and you shouldn't set too much store by them.

There's only one Nate Silver

Nate Silver seems like a one-man army, rooting out false polling and pollsters. He's stood up to various legal threats over the years. It's a good thing that he exists, but it's a bad thing that there's only one of him. It would be great if the press could take inspiration from him and take a more nuanced, skeptical, and statistical view of polls.

Can you believe the polls?

Let me close by answering my own question: yes you can believe the polls, but within limits and depending on who the pollster is.

Reading more

This blog post is one of a series of blog posts about opinion polls.