Sunday, February 16, 2020

Coin tossing: more interesting than you thought

Are the books right about coin tossing?

Almost every probability book and course starts with simple coin-tossing examples, but how do we know that the books are right? Has anyone tossed coins several thousand times to see what happens? Does coin-tossing actually have any relevance to business? (Spoiler alert: yes it does.) Coin tossing is boring, time-consuming, and badly paid, so there are two groups of people ideally suited to do it: prisoners and students.

(A Janus coin. Image credit: Wikimedia Commons. Public domain.)

Prisoner of war

John Kerrich was an English/South African mathematician who went to visit in-laws in Copenhagen, Denmark. Unfortunately, he was there in April 1940 when the Nazis invaded. He was promptly rounded up as an enemy national and spent the next five years in an internment camp in Jutland. Being a mathematician, he used the time well and conducted a series of probability experiments that he published after the War [Kerrich]. One of these experiments was tossing a coin 10,000 times. The results of the first 2,000 coin tosses are easily available on Stack Overflow and elsewhere, but I've not been able to find all 10,000, except in outline form.

We’re going to look at the cumulative mean of Kerrich’s data. To get this, we’ll score a head as 1 and a tail as 0. The cumulative mean is the cumulative mean of all scores we’ve seen so far; if after 100 tosses there are 55 heads then it’s 0.55 and so on. Of course, we expect to go to 0.5 ‘in the long run’, but how long is the long run? Here’s a plot of Kerrich’s data for the first 2,000 tosses

(Kerrich's coin-flipping data up to 2,000 tosses.)

I don’t have all of Kerrich’s tossing data for individual tosses, but I do have his cumulative mean results at different numbers of tosses, which I’ve reproduced below.

Number of tosses	Mean	Confidence interval (±)
10	0.4	0.303
20	0.5	0.219
30	0.566	0.177
40	0.525	0.155
50	0.5	0.139
60	0.483	0.126
70	0.457	0.117
80	0.437	0.109
90	0.444	0.103
100	0.44	0.097
200	0.49	0.069
300	0.486	0.057
400	0.498	0.049
500	0.51	0.044
600	0.52	0.040
700	0.526	0.037
800	0.516	0.035
900	0.509	0.033
1090	0.461	0.030
2000	0.507	0.022
3000	0.503	0.018
4000	0.507	0.015
5000	0.507	0.014
6000	0.502	0.013
7000	0.502	0.012
8000	0.504	0.011
9000	0.504	0.010
10000	0.5067	0.009

Do you find something surprising in these results? There are at least two things I constantly need to remind myself when I’m analyzing A/B test results and simple coin-tossing serves as a good wake-up call.

The first piece is how many tosses you need to do to get reliable results. I won’t go into probability theory too much here, but suffice to say, we usually quote a range, called the confidence interval, to describe our level of certainty in a result. So a statistician won’t say 0.5, they’d say 0.5 +/- 0.04. You can unpack this to mean “I don’t know the number exactly, but I’m 95% sure it lies in the range 0.46 to 0.54”. It’s quite easy to calculate a confidence interval for an unbiased coin for different numbers of tosses. I've put the confidence interval in the table above.

The second piece is the structure of the results. Naively, you might have thought the cumulative mean would smoothly approach 0.5, but it doesn’t. The chart above shows a ‘blip’ around 100 where the results seem to change, and this kind of ‘blip’ happens very often in simulation results.

There’s a huge implication for both of these pieces. A/B tests are similar in some ways to coin tosses. The ‘blip’ reminds us we could call a result too soon and the number of tosses needed reminds us that we need to carefully calculate the expected duration of a test. In other words, we need to know what we're doing and we need to interpret results correctly.

Students

In 2009, two Berkeley undergraduates, Priscilla Ku and Janet Larwood, tossed a coin 20,000 times each and recorded the results. It took them about one hour a day for a semester. You can read about their experiment here. I've plotted their results on the chart below.

The results show a similar pattern to Kerrich’s. There’s a ‘blip’ in Priscilla's results, but the cumulative mean does tend to 0.5 in the ‘long run’ for both Janet and Priscilla.

These two are the most quoted coin-tossing results you see on the internet, but in textbooks, Kerrich’s story is told more because it’s so colorful. However, others have spent serious time tossing coins and recording the results; they’re less famous because they only quoted the final number and didn’t give the entire dataset. In 1900, Karl Pearson reported the results of tossing a coin 24,000 times (12,012 heads), which followed on from the results of Count Buffon who tossed a coin 4,040 times (2,048 heads).

Derren Brown

I can’t leave the subject of coin tossing without mentioning Derren Brown, the English mentalist. Have a look at this YouTube video where he flips an unbiased coin heads ten times in a row. It’s all one take and there’s no trickery. Have a think about how he might have done it.

Got your ideas? Here’s how he did it; the old-fashioned way. He recorded himself flipping coins until he got ten heads in a row. It took hours.

But what if?

So far, all the experimental results match theory exactly and I expect they always will. I had a flight of fancy one day that there’s something new waiting for us out past 100,000 or 1,000,000 tosses - perhaps theory breaks down as we toss more and more. To find out if there is something there, all I need is a coin and some students or prisoners.

More technical details

I’ve put some coin tossing resources on my Github page under the coin-tossing section.

Kerrich is the Kerrich data set out to 2,000 tosses in detail and out to 10,000 tosses in summary. The Python code kerrich.py displays the data in a friendly form.
Berkeley is the Berkeley dataset. The Python code berkeley.py reads in the data and displays it in a friendly form. The file 40000tosses.xlsx is the Excel file containing the Berkeley data.
coin-simulator is some Python code that shows multiple coin-tossing simulations. It's built as a Bokeh app, so you'll need to install the Bokeh module to use it.

References

[Kerrich] “An Experimental Introduction to the Theory of Probability". - Kerrich’s 1946 monograph on his wartime exploration of probability in practice.

Wednesday, February 12, 2020

Presentation advice from a professional comedian

In 2019, I heard a piece of presentation advice I'd never heard before. It shook me out of my complacency and made me rethink some key and overlooked aspects of giving a talk. Here's the advice:

"You should have established your persona before you get to the microphone."

There's a lot in this one line. I'm going to explore what it means and its consequences for anyone giving a talk.

(Me, not a professional comedian, speaking at an event in London. Image credit: staff photographer at Drapers.)

Your stage persona is who you are to the audience. People have expectations for how a CEO will behave on stage, how a comedian will behave, and how a developer will behave, etc. For example, let's say you're giving a talk on a serious business issue (e.g. bankruptcy and fraud) to a serious group (lawyers and CEOs), then you should also be serious. You can undermine your message and credibility by straying too far from what people expect.

One of the big mistakes I've seen people make is forgetting the audience doesn't know them. You might be the funniest person in your company, or considered as the most insightful, or the best analyst, but your stage audience doesn't know that; all they know about you is what you tell them. You have to communicate who you are to your audience quickly and consistently; you have to bend to their expectations. This is a lesson performers have learned very well, and comedians are particularly skilled at it.

If they think about their stage persona at all, most speakers think about what they say and how they say it. For example, someone giving a serious talk might dial back the jokes, a CEO rallying the troops might use rhetorical techniques to trigger applause, and a sales manager giving end-of-year awards might use jokes and funny anecdotes. But there are other aspects to your stage persona.

Let's go back to the advice: "you should have established your persona before you get to the microphone". In the short walk to the microphone, how might you establish who you are to the audience? As I see it, there are three areas: how you look, how you move, and how you interact with the audience.

How you look includes your shoes and clothing and your general appearance (including your haircut). Audiences make an immediate judgment of you based on what you're wearing. If I'm speaking to a business audience, I'll wear a suit. If I'm talking to developers, I'll wear chinos and a shirt (never a t-shirt). Shoes are often overlooked, but they're important; you can undermine an otherwise good choice of outfit by a poor choice of shoes (usually, wearing cheap shoes). Unless you're going for comic effect, your clothes should fit you well. Whatever your outfit and appearance, you need to be comfortable with it. I've seen men in suits give talks where they're clearly uncomfortable with a suit and tie. Being obviously uncomfortable undermines the point of dressing up - to be blunt, you look like a boy in a man's clothes.

How you move is a more advanced topic. You can stride confidently to the microphone, or walk normally, or walk timidly with your head down. If you're stressed and tensed, an audience will see it in how you move. By contrast, more fluid movements indicate that you're relaxed and in control. What you do with your hands also conveys a message. Audiences can see if you have a death grip on your notes or if you're using your hands to acknowledge applause. Think about the stage persona you want to create and think about how that person moves to the microphone. For example, if you're a newly appointed CEO, you might want to establish comfortable authority, which you might try and do by the way you walk, what you do with your hands and how you face the audience.

The moment the audience first sees you is where communication between you and the audience begins. How and when you look at an audience sends a message to them. For example, your gaze acknowledges you've seen your audience and that you're with them. Your facial expression tells the audience how you're feeling inside - a comfortable smile communicates one message, a blank expression communicates something else, and a scowl warns the audience the talk might be uncomfortable. You can also use your body to acknowledge applause, telling them that you hear them, which is an obvious piece of non-verbal communication between you and your audience.

In short: think about the seconds before you begin talking. How might you communicate who and what you are without saying a single word?

Where did I hear this advice? I've been listening to a great podcast, 'The Comedian's Comedian' by Stuart Goldsmith, a UK comic. Stuart interviews comics about the art and business of comedy. In one podcast, a comedian told the story of advice he'd received early in his career. The comedian was saying that after five years of performing, he'd managed to establish his stage persona within a minute or so of speaking. The advice he got was: "You should have established your persona before you get to the microphone."

A good presentation is a subtle dialog between the presenter and the audience. The speaker does or says something and the audience responds (or doesn't). Comedians are the purest example of this, they respond in the moment to the audience and they live or die by the interaction. If communicating your message to your audience is important to you, you have to interact with them - including the moments before you begin speaking. Ultimately, all presentations are performance art.

Saturday, February 8, 2020

The Anna Karenina bias

Russian novels and business decisions

What has the opening sentence of a 19th-century Russian novel got to do with quantitative business decisions in the 21st century? Read on and I'll tell you what the link is and why you should be aware of it when you're interpreting business data.

Anna Karenina

The novel is Leo Tolstoy's 'Anna Karenina' and the opening line is: "All happy families are alike; each unhappy family is unhappy in its own way". Here's my take on what this means. For a family to be happy, many conditions have to be met, which means that happy families are all very similar. Many things can lead to unhappiness, either on their own or in combination, which means there's more diversity in unhappy families. So how does this apply to business?

(Leo Tolstoy's family. Do you think they were happy? Image source: Wikimedia Commons. License: Public Domain)

Survivor bias

The Anna Karenina bias is a form of survivor bias, which is, in turn, a form of selection bias. Survivor bias is the bias introduced by concentrating on the survivors of some selection process and ignoring those that did not. The famous story of Wald and the bombers is, in my view, the best example of survivor bias. If Wald had focused on the surviving bombers, he would have recommended putting armor in the wrong place.

When we look at the survivors of some selection process, they will necessarily be more alike than non-survivors because of the selection process (unhappy families vs. happy families). Let me give you an example, buying groceries on the web. Imagine a group of people surfing a grocery store. Some won't buy (unhappy families), but some will (happy families). To buy, you have to find an item you want to buy, you have to have the money, you have to want to buy now, and so on. This selection process will give a group of people who are very similar in a number of dimensions - they will exhibit less variability than the non-purchasers.

Some factors will be important to a purchaser's decision and other factors might not be. In the purchaser group, we might expect to see more variation in factors that aren't important to the buying decision and less variation in factors that are. To quote Shugan [Shugan]:

"Moreover, variables exhibiting the highest levels of variance in survivors might be unimportant for survival because all observed levels of those variables have resulted in survival. One implication is a possible inverse correlation between the importance of a variable for survival and the variable’s observed variability"

In the opinion poll world, the Anna Karenina bias rears its ugly head too. Pollsters often use robocalls to try and reach voters. To successfully record an opinion, the call has to go through, it has to be answered, and the person has to respond to the survey questions. This is a selection process. Opinion pollsters try and correct for biases, but sometimes they miss them. If the people who respond to polls exhibit less variability than the general population on some key factor (e.g. education), then the poll may be biased.

In my experience, most forms of B2C data analysis can be viewed as a selection process, and the desired outcomes of most analysis is figuring out the factors that lead to survival (in other words, what made people buy). The Anna Karenina bias warns us that some of the observed factors might be unimportant for survival and gives us a way of trying to understand which factors are relevant.

Leo Tolstoy in 1897. (Image credit: Wikipedia. Public domain image.)

The takeaways

If you're analyzing business data, here's what to be aware of:

Don't just focus on the survivors, you need to look at the non-survivors too.
Survivors will all tend to look the same - there will be less variability among survivors than among non-survivors.
Survivors may look the same on many factors, only some of which may be relevant.
The factors that vary the most among survivors might be the least important.

References

[Shugan] "The Anna Karenina Bias: Which Variables to Observe?", Marketing Science, Vol. 26, No. 2, March–April 2007, pp. 145–148

Wednesday, February 5, 2020

Reverse engineering a sensor

A few years ago, I was working at an excellent company (BitSight). The only problem was, I felt the office was either too hot or too cold. I bought a temperature/humidity sensor and reverse engineered the control system to build my own UI to control the device. Here's the story of how I did it. If you want to learn how to reverse engineer a sensor interface, this story might be useful to you.

(Image credit: Uni-Trend)

I bought a UT330B temperature and humidity sensor from AliExpress for $35. This is a battery-powered temperature and humidity data logger that can record 60,000 measurements of temperature and humidity. The device has a USB port to download the recordings to a computer and it came with control software to read the device data via USB. The price was right, the accuracy was good, but there was a problem. The control software for the device worked on Windows only. My work and home computers were all Macs. So what could I do?

My great colleague, Philip Gladstone, suggested the way forward. He said, why not reverse engineer the commands and data going back and forth over the USB port so you can write your own control software? So this is what we did. (Philip got the project started, did most of the USB monitoring, and helped with deciphering commands.)

The first step was monitoring the USB connection. We got hold of a Windows machine and ran some USB monitoring software on it. This software reported the bytes sent back and forth to the UT330B. The next step was to load up and run the UT330B control software. Here's the big step; we used the control software to get the device to do things and monitored the USB traffic. From that traffic, we were able to reverse engineer the commands and responses.

The first thing we found was that every command to the device had a particular form:
[two byte header] [two byte command] [two byte modbus CRC]
e.g
0xab 0xcd 0x03 0x18 0xb1 0x05
(this is the command to delete data from the device)
we were able to infer how the command bytes related to actual commands (e.g. 0x03 0x18 means delete data). In some cases, we found pressing one button on the control software resulted in several commands to the device.

The next step was figuring out the CRC. Given the data we had and our knowledge that the CRC was two bytes, I set about looking for CRC schemes that would give us the observed data. I found the UT330B used a modbus CRC check.

Now I could replicate every command to the UT330B.

Figuring out the device data format came next. The device data was what the UT330B sent back in response to commands. Some of this was straightforward, e.g. dates and times, but some of it was more complex. To get a full range of device temperatures, I put the UT330B in a freezer (to get below 0c) and held it over a toaster to get over 100c. Humidity data was a little harder, I managed to get very high humidities by holding the device in the output of a kettle but I had to wait for very dry days to get low humidities. It turns out, data was coded using two's complement (this caused me some problems with negative temperatures until I realized what was going on).

So now I knew every command and how to interpret the results.

Writing some Python code was the next order of business. To talk to the UT330B via the USB port, we needed to use the PySerial library. The UT330B used a Silicon Labs chip, so we also needed to download and install the correct software. With some trial and error, I was able to write a Python class that sent commands to the UT330B and read the response. Now I was in complete control of my device and I didn't need the Windows command software.

A Python class is fine, but the original command software had a nice UI where you could plot charts of temperature and humidity. I wanted something like that in Python, so I used the Bokeh library to build an app. Because I wanted a full-blown app that was written properly, I put some time into figuring out the software architecture. I decided on a variant of the model-view-controller architecture and built my app with a multi-tab display, widgets, and of course, a figure.

Here are some screenshots of what I ended up with.

The complete project is on Github here.

So what did I learn from all of this?

It's possible to reverse engineer (some) USB-controlled devices using the appropriate software, but it takes some effort and you have to have enough experience to do it.
You can control hardware using Python.
Bokeh is a great library for writing GUIs that are focused on charts.
If I were ever to release a USB hardware device, I would publish the interface specification (e.g. 0x03 0x18 means delete data) and provide example (Python or C) code to control the device. I wouldn't limit my sales to Windows users only.

Saturday, February 1, 2020

Company culture and the hall of mirrors

Company culture: real and distorted

When I was a child, my parents took me to a funfair and we walked through the hall of mirrors. The mirrors distorted reality, making us look taller or smaller, fatter or thinner than we really were. I've read a number of popular business books that extoll the virtues of company culture, but I couldn't help feeling they were a hall of mirrors; distorting what's really there to achieve higher book sales.

(Reading about company culture in popular business books is like entering the house of mirrors. Image credit: Wikimedia Commons, License: Creative Commons, Author: SJu)

I was fortunate enough to spend time researching the academic literature on company culture to understand what's really there, and the results surprised me. I took a course at the Harvard Extension School on Organizational Behavior which was taught by the excellent Carmine Gibaldi. Part of this course was a research project and I chose to look at the relationship between company culture and financial performance.

What books claim vs. evidence

Given the long list of popular business books making very definitive claims about company culture, you might expect the research literature to show some clear results. The reality is, the literature doesn't. In fact, when I investigated the support for claims made in books, I couldn't find much in the way of corroborating evidence. Going back to the popular books, in many cases I found no references, in other cases I found evidence based on the author's work alone, and in others, the evidence was just unverified anecdotes. What I did find in the research literature was a more complex story than I expected.

Company culture and financial performance

Despite what the popular business books said, it wasn't true that a strong company culture of itself led to financial success, in fact, many companies with strong cultures went bankrupt. It wasn't true that there were a magic set of values that led to company success either; different successful companies had different values and different cultures. It wasn't true that company values are immutable, even a large company like IBM could and did change its values to survive.

What I found is that there are mechanisms by which a strong and unified company culture can lead to better performance; coordination and control, goal alignment, and enhanced employee effort. For these reasons, under stable market conditions, a strong culture does appear to offer competitive advantages. However, under changing market conditions, a strong culture may be a disadvantage because strong cultures don't have the variability necessary to adapt to change. There also seemed to be a link between culture and industry, with some cultures better suited to some industries.

I read about DEC, Arthur Anderson, and Enron. All companies with strong cultures and all companies that had been held up at the time as examples of the benefits of a strong culture. DEC's culture had been studied, extensively written about, and emulated by others. Enron had several glowing case studies written about them by adoring business schools. In all these cases, company culture may well have been a factor in their demise. Would you take DEC or Enron as your role model now? Are the companies you're viewing as your role models now going to be around in five or ten years?

Where's the beef?

Am I falling into the trap of not providing you with references for my assertions? No, because I'm going to go one stage further. I'm going to give you a link to what I wrote that includes all my references and all my arguments in more detail. Here's the report I wrote which lays out my findings - you can judge for yourself.

Popular books are mostly wrong

During my research, I read some detailed critiques of popular business books. I must admit, I was shocked at the extent to which the Emperor really has no clothes and the extent of the post hoc fallacy and survivor bias in the popular books. Many of the great companies touted in these books have fallen from grace, disproving many of the assertions the books made. If you make statements that company culture leads to long-lasting success, and your champions fail while maintaining their values, it kind of looks like your arguments aren't great. Given the lack of research support for many business books' claims, this should come as no surprise.

What to do?

Here's my advice: any popular business book that purports to reveal a new underlying truth about business success is probably wrong. Before you start believing the claims, or extolling the book's virtues, or implementing policy changes based on what you've read, check that the book stands up to scrutiny and ideally, that there's independent corroboration.

Tuesday, January 28, 2020

Future directions for Python visualization software

The Python charting ecosystem is highly fragmented and still lags behind R, it also lacks some of the features of paid-for BI tools like Tableau or Qlik. However, things are slowly changing and the situation may be much better in a few years' time.

(Image credit: Alvesgaspar - Wikimedia Commons - license)

Theoretically, the ‘grammar of graphics’ approach has been a substantial influence on visualization software. The concept was introduced in 1999 by Leland Wilkinson in a landmark book and gained widespread attention through Hadley Wickham’s development of ggplot2 The core idea is that a visualization can be represented as different layers within a framework, with rules governing the relationship between layers.

Bokeh was influenced by the 'grammar of graphics' concept as were other Python charting libraries. The Vega project seeks to take the idea of the grammar of graphics further and creates a grammar to specify visualizations independent of the visualization backend module. Building on Vega, the Altair project is a visualization library that offers a different approach from Bokeh to build charts. It’s clear that the grammar of graphics approach has become central to Python charting software.

If the legion of charting libraries is a negative, the fact that they are (mostly) built on the same ideas offers some hope for the future. There’s a movement to convergence by providing an abstraction layer above the individual libraries like Bokeh or Matplotlib. In the Python world, there’s precedence for this; the database API provides an abstraction layer above the various Python database libraries. Currently, the Panel project and HoloViews are offering abstraction layers for visualization, though there are discussions of a more unified approach.

My take is, the Python world is suffering from having a confusing array of charting library choices which splits the available open-source development efforts across too many projects, and of course, it confuses users. The effort to provide higher-level abstractions is a good idea and will probably result in fewer underlying charting libraries, however, stable and reliable abstraction libraries are probably a few years off. If you have to produce results today, you’re left with choosing a library now.

The big gap between Python and BI tools like Tableau and Qlik is the ease of deployment and speed of development. BI tools reduce the skill level to build apps, deploy them to servers, and manage tasks like access control. Projects like Holoviews may evolve to make chart building easier, but there are still no good, easy, and automated deployment solutions. However, some of the component parts for easier deployment exist, for example, Docker, and it’s not hard to imagine the open-source community moving its attention to deployment and management once the various widget and charting issues of visualization have been solved.

Will the Python ecosystem evolve to be as good as R’s and be good enough to take on BI tools? Probably, but not for a few years. In my view, this evolution will happen slowly and in public (e.g. talks at PyCon, SciPy etc.). The good news for developers is, there will be plenty of time to adapt to these changes.

Saturday, January 25, 2020

Clayton Christensen, RIP

I was saddened today to read of the death of Clayton Christensen. For me, he was one of the great business thinkers of the last 100 years.

(Image credit: Remy Steinegger - Wikimedia Commons - Creative Commons License)

I remember finding his wonderful book, "The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail", on the shelf at City University in London and devouring it on the way home on the tube. I only recommend a handful of business books and Christensen's book is number one on my list.
Christensen was fascinated by technology disruption, especially how industry leaders are caught out by change and why they don't respond in time. The two big examples in his book are steam shovels and disk drives.

The hydraulic shovel was a cheap and cheerful innovation that didn't seem to threaten the incumbent steam shovel makers. If anything, it was good for their profitability because hydraulic technology addressed the low end of the market where margins were poor, leaving the high-end more profitable market niches to steam shovels. Unfortunately for the steam shovel makers, the hydraulic shovel makers kept on innovating and pushed their technology into more and more niches. In the end, the steam shovel makers were relegated to just a few small niches and it was too late to respond.

The disk drive industry is Christensen's most powerful example. There have been successive waves of innovation in this industry as drives increased in capacity and reduced in size. The leaders in one wave were mostly not the leaders in subsequent waves. Christensen's insight was, the same factors were at work in the disk drive industry as in the steam shovel industry. Incumbents wanted to maximize profitability and saw new technologies coming in at the bottom end of the market where margins were low. Innovations were dismissed as being low-end technologies not competing with their more profitable business. They didn't respond in time as the disruptive technologies increased their capabilities and started to compete with them.

Based on these examples, Christensen teases out why incumbents didn't respond in time and what companies can do to not be caught out by these kinds of disruptive innovations.

Company culture is obviously a factor here, and Christensen poked into this more with a very accessible 2000 paper where he and Overdorf discussed the factors that can lead to a company culture that ignores disruptive innovation.

I never met Christensen, but I heard a lot of good things about him as a person. My condolences to his family.

Further reading

"The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail" - Clayton Christensen

"Meeting the Challenge of Disruptive Change", Clayton M. Christensen and Michael Overdorf, Harvard Business Review, https://hbr.org/2000/03/meeting-the-challenge-of-disruptive-change