Monday, January 11, 2021

How to grow a market segment

Growing a new business

I'm going to tell you how I grew a market segment from almost nothing to multi-millions. It's kind of an instruction manual if you're trying to grow a new segment within a larger business and I hope you find something useful in it. I'm going to be deliberately vague about the segment and the company and I've obscured some of the details.

(Every new business segment starts from small seeds. Image source: Wikimedia Commons, License: Creative Commons, Author: Laitche)

Some background

A few years ago, I was working for a large company that made products that could be used in many different industries. Part of my role was to find new business segments to sell into, but I had no budget to research or grow segments. On the upside, I had access to a large team of very good salespeople and sales engineers.

First, catch your hare

The first job was to find the market segment to sell into. I was friendly with one of the company's very experienced sales engineers. He knew what I was trying to do and suggested a market he was very familiar with. We'd had this conversation before and I was skeptical. This time, I decided to take a closer look.

I didn't fully understand the market segment, but my friend was correct. The company's products had sold into this segment. They'd sold because my friend had customized the products for that market and developed a sales pitch that worked. He'd largely been ignored and was the only person who sold into the market. Bottom line: there was maybe something there.

How does this market work?

To sell into a segment, you need to:

understand it
know what your value proposition really is
know where you sit in the value chain.

I needed a crash course in understanding the segment and I needed prospective customers to tell me their pain points.

Fortunately, there was a major trade show/conference coming up and I went to it. I got the speaker list and identified over ten people who I thought could help. By guessing emails, I reached out to them and asked for an informational interview at the conference. Of course, not everyone responded or talked to me, but I got enough feedback to understand how my company's products could fit and I understood the pain points.

Market sizing

This is the piece that gets all the attention, but it shouldn't. Lots of people are spilling lots of ink talking about market sizing and offering paid-for sizing products. I found no one who could give me a good market size for my new segment; they could offer me details on some aspects of the market, but not the ones useful to me. In the end, I found the best market sizing data came from free resources on the web. I coupled this data with my own analysis, viewing the segment in different ways and calculating different sizing estimates. I got slightly different estimates of market size, but the difference was immaterial: the segment was big enough to be profitable.

Making marketing content

I wanted the sales team to sell, but they were skeptical. Their belief was, the experienced sales engineer could sell the story to the segment, but no one else could. I needed to change this perception. I also needed to gather customer proof that the product worked in the segment.

One of the salespeople told me of a well-known company that had bought the product for use in our new market segment. There were some things that weren't ideal about their use of the product, but it was a start.

Fortunately, about this time I had a new SLR camera and was experimenting with photography. I was also doing a part-time management degree and had chosen a business writing option. Putting both together, I flew out to the customer and interviewed them for a case study, taking photos to illustrate the piece. Normally, this would be done by a writer, but because this was so new, I wanted to take control. I wrote up the case study and my company published it. I had my first piece of marketing content for my segment.

Not long after, I found another segment customer who was using the product. We asked about a case study, but they gave a firm no. We then asked if we could do a ghost-written article that would appear in their name and that they would have editorial control over ('nothing published without your consent'). They said yes, and we found a trade magazine that would publish the story. Once again, I flew out and interviewed the customer. I wrote up the ghost-written article, but I did it carefully and subtly; my company's product wasn't the biggest feature of the piece (it was a quarter or less) and other companies' products were mentioned in the article too. The goal was to establish credibility, and the piece succeeded brilliantly. The customer was hugely pleased with the article, to the extent that the person I interviewed took credit for writing it. I submitted the piece for part of my writing class and got an A grade.

I did this a couple of times and ended up with several pieces of content. Crucially, they included usable customer quotes (not by accident).

(Like new market segments, saplings need attention and nutrients to grow. Image source: Wikimedia Commons, License: Creative Commons, Author: RobbieRoss123)

Selling

By this stage, I had marketing content to help sell the product and I had a known segment user base. The next step was to convince the sales team to sell and for sales engineers to help sell it.

Salespeople have quotas to fulfill so they have to be extremely careful about how they fill their time. This means they can be very suspicious about new market segments; they don't know if it'll be worth investing their time.

I found a sales rep who was willing to try selling into the market. It helped that he was also very friendly with the sales engineer. We did some pitches together using the new marketing content and the sales rep worked closely with the sales engineer. To cut a long story short, the sales rep brought in a $300,000 order. This got people's attention. Other, smaller orders came in too.

Sales engineering management decided to invest in the segment and started to train some sales engineers in how to sell to it.

Salespeople started to get interested in selling into the market. I created some sales presentations for them, and of course, they had the case studies to use.

Pseudo-freemium

The engineering team had other priorities and was unable to customize the product for the segment, but I needed some good demos. Fortunately, my sales engineer came to the rescue again. He had developed a number of demos that worked very well. Another sales engineer had developed some simpler models too. Collectively, we had enough to do something, but the packaging was bad.

Because I have an engineering background, I was able to create a form of product customization that combined the existing demos. In effect, it was a shadow product. We put the product online on the company's website, for free, in exchange for registration. In other words, we had a lead generation tool based on a free product.

Now we had demos and a website, we ran a series of webinars to drive traffic to the shadow product website. The leads went through the standard process and were handed to sales. Bear in mind, by this time, we had a sales deck, demos, and the sales team and sales engineers could sell into the market.

(Eventually, your segment may grow into something big. Image source: Wikimedia Commons, License: Public Domain, Author: AlabamaGuy2007)

Big boys can be bad boys

I did learn a negative lesson in this process. There were a couple of large and prestigious companies in this space. While we were selling into the smaller companies, I faced no political interference, but that changed as soon as we had a big fish on the hook.

I visited a group in a very large and well-known company to talk to them about the market segment. Before visiting the group, I was warned they were doing weird things and had a reputation for giving people the run-around. But what they wanted to do was cool. I came back to the office with a positive message about the big company.

As soon as people found out who the large company was, they wanted to be involved. I went to a meeting where 15 people sat around discussing the sales strategy. Soon, I was cut out of the discussion as more and more strategy meetings were held. The meetings were divorced from reality because no one in them had spoken to the account. However, the meetings were high-profile.

Sadly, the warnings turned out to be right. The group really was doing weird things, and soon they moved on and forgot they'd ever spoken to us. The strategy meetings died off after a month or two as it dawned on the attendees that the opportunity wasn't going anywhere.

After that, I was skeptical of large players. I purposefully downplayed large accounts and kept things technical.

Becoming an expert

I had a very limited background in the segment, but I found I had developed some useful knowledge through this market-building process. I ended up speaking at a segment conference and running an IEEE tutorial. It was bizarre speaking on industry panels next to people who had spent their entire careers in the segment.

Where did this end up?

The market segment went from less than about $100,000 a year to several million dollars a year. Sales reps went from ignoring it to actively selling into the market, and we went from one sales engineer focused on it to several. We started with zero marketing pieces, and by the end, we had about 15 pieces of focused marketing content, including webinars, articles, and case studies.

Checklist

Here's my checklist for growing a new segment:

Be humble: listen to others and learn from them.
Share credit: make sure the people who work with you get credit.
Be there for others: this isn't a solo endeavor; you have to support your colleagues.

Find out if anyone in your organization has experience in the segment. Learn from them.
Talk to and learn from industry experts. Never sell to them at this point.
Create marketing content:

Create case studies.
Create ghost-written articles.
Create great content that adds value.
Have a webpage to capture leads.
Run webinars.

Sell internally.

Understand the dynamics of the sales and sales engineering team.
Hold their hand until they get the first sales, and even beyond that.
Make sure they know you'll stand by them.

Avoid politics.

Watch out for high-profile accounts: they can mislead and distract, and they invite internal politics.

Could I do this again?

I'm going to be honest with you. I got lucky. I benefited from a one-off combination of circumstances that let me succeed:

I stumbled on the segment. If it hadn't been for the sales engineer, I would never have looked at it.
Benign neglect. Except towards the end, I didn't suffer company politics or people stopping me.
Pre-existing content. The sales engineer had developed much of the content I needed.
Skills. I had the photography and writing skills I needed; I also had the technical skills to take the sales engineer's work further.

I owe a lot to that sales engineer, as does the company I worked for. Without him, this wouldn't have happened.

Could I do this again? Maybe. I tried again in a different company in different circumstances but had more limited success. Company politics really held things back.

Would I try and do it again? Yes, but. If you want to know what the 'but' is, you'll have to talk to me.

Monday, January 4, 2021

COVID and soccer home team advantage - winning less often

Home advantage

Is it easier for a sports team to win at home? The evidence from sports as diverse as soccer [Pollard], American football [Vergina], rugby [Thomas], and ice hockey [Leard] strongly suggests there is a home advantage and it might be quite large. But what causes it? Is it the crowd cheering the home team, or closeness to home, or playing on familiar turf? One of the weirder side-effects of COVID is the insight it's providing into the origins of home advantage, as we'll see.

(Premier League teams playing in happier times. Image source: Wikimedia Commons, License: Creative Commons, Author: Brian Minkoff)

The EPL - lots of data makes analysis easier

The English Premier League is the world's wealthiest sports league [Robinson]. There's worldwide interest in the league and there has been for a long time, so there's a lot of data available, which makes it ideal for investigating home advantage. One of the nice features of the league is that each team plays every other team twice, once at home and once away.

Expectation and metric

If there were no home team advantage, we would expect the number of home wins and away wins to be roughly equal for the whole league in a season. To investigate home advantage, the metric I'll use is:
\[home \ win \ proportion = \frac{number\ of\ home\ wins}{total\ number\ of\ wins}\]
If there were no home team advantage, we would expect this number to be close to 0.5.
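As a rough illustration of the metric, here's a minimal sketch that computes the home win proportion for one season, together with a 95% confidence interval. The win counts are made up, the function name is hypothetical, and the interval uses the normal approximation for a proportion, which is an assumption on my part and may not be how any particular chart of this metric was built.

```python
import math


def home_win_proportion(home_wins, away_wins, z=1.96):
    """Return (proportion, lower, upper) for the home share of decided matches.

    Draws are excluded because the metric only counts wins.
    The interval is a normal-approximation 95% interval (z = 1.96).
    """
    total_wins = home_wins + away_wins
    if total_wins == 0:
        raise ValueError("need at least one decided match")
    proportion = home_wins / total_wins
    half_width = z * math.sqrt(proportion * (1 - proportion) / total_wins)
    return proportion, max(0.0, proportion - half_width), min(1.0, proportion + half_width)


# Made-up season: 380 matches, of which 95 were draws.
p, low, high = home_win_proportion(home_wins=171, away_wins=114)
print(f"home win proportion: {p:.3f} (95% CI {low:.3f} to {high:.3f})")
```

If the whole interval sits above 0.5, that's the statistical sense in which home wins outnumber away wins.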

EPL home team advantage

Let's look at the mean home-win proportion per season for the EPL. In the chart, the error bars are the 95% confidence interval.

For most seasons, the home win proportion is about 0.6 and it's significantly above 0.5 (in the statistical sense). In other words, there's a strong home-field advantage in the EPL.

But look at the point on the right. What's going on in 2020-2021?

COVID and home wins

Like everything else in the world, the EPL has been affected by COVID. Teams are playing behind closed doors for the 2020-2021 season. There are no fans singing and chanting in the terraces, there are no fans 'oohing' over near misses, and there are no fans cheering goals. Teams are still playing matches home and away but in empty and silent stadiums.

So how has this affected home team advantage?

Take a look at the chart above. The 2020-2021 season is the season on the right. Obviously, we're still partway through the season, which is why the error bars are so big, but look at the mean value. If there were no home team advantage, we would expect a mean of 0.5. For 2020-2021, the mean is currently 0.491.

Let me put this simply. When there are fans in the stadiums, there's a home team advantage. When there are no fans in the stadiums, the home team advantage disappears.

COVID and goals

What about goals? It's possible that a team that might have lost is so encouraged by their fans that they reach a draw instead. Do teams playing at home score more goals?

I worked out the mean goal difference between the home team and the away team and I've plotted it for every season from 2000-2001 onwards.

If there were no home team advantage, you would expect the goal difference to be 0. But it isn't. It mostly hovers around 0.35. Except of course for 2020-2021. For 2020-2021, the goal difference is about zero. The home-field advantage has gone.
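For anyone who wants to repeat the goal difference calculation on their own data, here's a minimal sketch. The scores are invented for illustration and the function name is hypothetical; I've also added a standard error so you can judge how far the mean is from zero.

```python
import math
import statistics


def mean_goal_difference(results):
    """Return (mean, standard error) of home goals minus away goals.

    `results` is a list of (home_goals, away_goals) tuples, one per match.
    """
    differences = [home - away for home, away in results]
    mean = statistics.mean(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
    return mean, standard_error


# Invented scores, not real match data.
results = [(2, 1), (0, 0), (1, 3), (3, 0), (1, 1), (2, 2), (0, 1), (4, 1)]
mean_diff, se_diff = mean_goal_difference(results)
print(f"mean goal difference: {mean_diff:.2f} (standard error {se_diff:.2f})")
```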

What this means

Despite the roll-out of the vaccine, it's almost certain the rest of the 2020-2021 season will be played behind closed doors (assuming the season isn't abandoned). My results are for a partial season, but it's a good bet the final results will be similar. If this is the case, then it will be very strong evidence that fans cheering their team really do make a difference.

If you want your team to win, you need to go to their games and cheer them on.

References

[Leard] Leard B, Doyle JM. The Effect of Home Advantage, Momentum, and Fighting on Winning in the National Hockey League. Journal of Sports Economics. 2011;12(5):538-560.

[Pollard] Richard Pollard and Gregory Pollard, Home advantage in soccer: a review of its existence and causes, International Journal of Soccer and Science Journal Vol. 3 No 1 2005, pp28-44

[Robinson] Joshua Robinson, Jonathan Clegg, The Club: How the English Premier League Became the Wildest, Richest, Most Disruptive Force in Sports, Mariner Books, 2019

[Thomas] Thomas S, Reeves C, Bell A. Home Advantage in the Six Nations Rugby Union Tournament. Perceptual and Motor Skills. 2008;106(1):113-116

[Vergina] Roger C.Vergina, John J.Sosika, No place like home: an examination of the home field advantage in gambling strategies in NFL football, Journal of Economics and Business Volume 51, Issue 1, January–February 1999, Pages 21-31

Monday, December 28, 2020

I won an award! How to lose by winning

Company work anniversary awards

Sometimes, companies try and do a good thing but go about it so poorly, they end up doing something bad.

A few years ago, I worked for a large company. I got to a work anniversary which triggered an award: a plastic slab I was supposed to display on my desk. How it was delivered was eye-opening.

(Winning a trophy like this would be meaningful. Image source: Wikimedia Commons. License: Public Domain.)

I was working at a different office from my manager, so the award was sent directly to me, including the written instructions to my manager on how to give me the award.

How to do it wrong

The award was a tombstone-shaped piece of transparent plastic with some vaguely encouraging words embossed on it. Other than the company logo, there was no customization of any kind (not even the employee's name); it was completely generic. The instructions gave a formal pattern for how the plastic was to be awarded. They went something like this:

Allocate about 20 minutes for the award ceremony.
Gather the employee's colleagues together.
Thank the employee by name for their service to the company. Mention any noticeable successes. Be warm and encouraging. Use their name. Look them in the eye.
Hand over the award, being sure to note that it's a recognition of their service. Use their name.
State that you're looking forward to working with them in the future.
Start a round of applause.

I told my manager that this had happened and we both laughed. I told him I was going to have an award ceremony for myself and hand myself the award using the instructions in the box. He chuckled and told me to go for it. In other words, the whole thing meant nothing to either of us.

Obviously, the company's intention was to thank employees for not leaving. They'd thought it through well enough to have a trophy that would be displayed on desks and that wouldn't cost very much. Of course, the goal of the ceremony was to celebrate the individual and make them feel special.

Unfortunately, the trophy wasn't meaningful to anyone - it didn't even look good. The instructions left a bad taste in my mouth. My guess is, the leadership was trying to reach managers who wouldn't normally celebrate individuals' contributions. By mandating the form of the ceremony, they were trying to introduce consistency and enforce meaning, but by describing the ceremony in detail, they undermined managers - this was a form of micro-managing and hinted at bigger issues with managers' people skills.

How to do it right

By contrast, I worked for another large organization that made a very big deal of work anniversaries. People who reached a significant anniversary were called into a big meeting and personally thanked by the CEO. There were meaningful gifts for reaching multiples of 5 years. Looking back on that experience, I believe the company and the CEO were sincere - they put a lot of effort into thanking and recognizing people. The fact that the recognition was led by the CEO made a huge difference.

Don't fake it

Employee recognition is a fraught topic and work anniversaries can be tricky. Do you celebrate or not and why? If you do celebrate, then it needs to be meaningful and focused on the person; you can't fake or mandate sincerity. If you're going to do it, do it well.

Monday, December 21, 2020

The $10 screwdriver: a cautionary management tale

Managers gone mild

I've told this story to friends several times. It's a simple story, but the lessons are complex and it touches on many different areas. See what you think.

I was a software developer for a large organization working on network-related software. For various reasons I won't go into, we had to frequently change network cards in our test computers and re-install drivers. My boss's boss put a rule in place that we had to use IT Support to change cards and re-install drivers - we weren't to change the cards ourselves. No other team had a similar rule and there had been no incidents or injuries. Despite being asked many times, he wouldn't explain why he put the rule in place.

At first, IT Support was OK with it. But as time wore on, we wanted to change cards twice a day or more. IT Support had a lot of demands on their time and got irritated with the constant requests. They wanted to know why we couldn't do it ourselves. One of the IT guys burned us a CD with the drivers on it and told us to get our own screwdrivers and change the cards ourselves. They started to de-prioritize our help requests because, quite rightly, they had other things to do and we could swap the cards ourselves. It got to the stage where we had to wait over two hours for someone to come, unscrew two screws, swap the card, and screw the two screws back in.

We were very sympathetic to IT Support, but the situation was becoming intolerable. My software development team complained to our management about the whole thing. My boss's boss still wouldn't budge and insisted we call IT Support to change cards; he wouldn't explain why and he wouldn't escalate the de-prioritization of tickets.

Excalibur the screwdriver

I got so fed up with the whole thing, I went out one lunchtime and bought a £7 ($10) screwdriver. It was a very nice screwdriver: it had multiple interchangeable heads, a ratchet action, and it was red. I gave it to the team. We used the screwdriver and stopped calling IT Support - much to their relief.

(This isn't the actual screwdriver I bought, but it looks a lot like it. Image source: Wikimedia Commons, Author: Klara Krieg, License: Creative Commons.)

The consequences

I then made a big mistake. I put in an expense claim for the screwdriver.

It went to my boss, who didn't have the authority to sign it off. It then went to his boss, who wasn't sure if he could sign it off. It then went to his boss, who did have the authority but wanted to know more. He called a meeting (my boss, my boss's boss, my boss's boss's boss) to discuss my expense claim. I heard they talked about whether it was necessary or not and whether I had bought a screwdriver that was too expensive when a cheaper one would have done. They decided to allow my expense claim this one time.

I was called into a meeting with my boss's boss's boss and told not to put in a claim like that again. I was called into a meeting with my boss's boss, who told me not to put in an expense claim like that again, that I should have used IT Support every single time, and that if I were to do it again, I should buy a cheaper screwdriver. I was then called into a meeting with my boss, who told me it was all ridiculous but next time I should just eat the cost. Despite my asking, no one ever explained why there had been a 'rule'. Once the screwdriver existed, we were expected to use it and not call IT Support.

Of course, the team all knew what was going on and there was incredulity about the company's behavior. The team lost a lot of respect for our leadership. The screwdriver was considered a holy relic to be treasured and kept safe.

What happened next

Subsequent to these events, I left and got another job. In my new job, I ended up buying thousands of pounds worth of equipment with no one blinking an eye (my new boss told me not to bother him with pre-approval for anything under £1,000).

All the other technical people in my old group left not long after me.

A competitor had been making headway in the market while I was there and really started to break through by the time I left. To respond to the competitive challenge, new leadership came in to make the company more dynamic and they replaced my entire management chain.

What I learned

Here's what I learned from all this. I should have eaten the cost of the screwdriver and avoided a conflict with my management chain; at the same time, I should have been looking for another job. The issue was a mismatch of goals: I wanted to build good things quickly but my management team didn't want to rock the boat. Ultimately, you can't bridge a gap this big. Buying the screwdriver was a subversion of the system and not a good thing to do unless there was a payoff, which there wasn't.

I promised myself I would never behave like the management I experienced, and I never have. With my teams now, I'm careful to explain the why behind rules; it feels more respectful and brings people on side more. I listen to people and I've reversed course if they can make a good case. I've told people to be wise about expenses, to minimize what they spend, but when something needs to be bought, they need to buy it.

What do you think?

If you liked this post, you might also like these ones...

I won an award! How to lose by winning - how a company tried to be authentic but failed, and why another company succeeded.
Serial killer! How to lose business by the wrong serial numbers - how a company lost business through a poor choice of serial numbers, and how businesses put themselves at a negotiating disadvantage by their invoices
The worst technical debt ever - truly, the worst example I've ever seen of short-sighted management decisions leading to long-term problems
Sad! How not to create a career structure: good intentions gone bad - how a company tried to create a consistent and clear career structure for software engineers but ended up making things worse
Drunk and disorderly: not funny at corporate events - examples of terrible drunken behavior I've seen in the professional world
The Emperor's new objects: a two-year failed project - how a company invested in a new technology area, mismanaged it, and got nothing
It's a mugs' game: corporate failures, ceramics, and t-shirts - why giving away corporate swag might be a sign of failure

Sunday, December 13, 2020

What's a probability distribution?

Why should you care about probability distributions?

Using the wrong probability distribution can be extremely expensive for businesses:

for businesses using machinery (factories, vehicles, aircraft, etc.), it can lead to parts being changed too frequently or too infrequently
for businesses relying on returning customers, it can lead to substantial under- or over-estimates of revenue and/or targeting the wrong customers with promotions
for businesses forecasting future sales by territory and/or product, it can lead to poor territory allocation or poor product resource allocation.

Given that it's so important, what is a probability distribution, and what are some examples?

What's a probability distribution?

At its simplest, a probability distribution tells you how likely an outcome is given some input. For example, how is sales probability distributed by price, or how likely is a component to fail in the next month?

If something is certain to occur, the probability is 1; if it's certain not to occur, the probability is zero. Let's imagine a component lasts a maximum of 6 months before failure. Our probability distribution might show the probability of failure on days 1 to 180. The failure probabilities for all days must sum to 1.

In the real world, data is noisy and we don't expect real data to exactly follow theoretical distributions, but given enough data, the match should be close enough for us to reason about what's going on.
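To make the component example concrete, here's a minimal sketch of a discrete probability distribution over days 1 to 180. The shape of the distribution is entirely made up (I've assumed failure risk grows in proportion to the part's age); the point is only that the probabilities are normalized to sum to 1 and that you can add them up to answer questions like 'how likely is a failure in the first 30 days?'.

```python
import math

DAYS = range(1, 181)  # days 1 to 180

# Hypothetical assumption: failure risk grows in proportion to the part's age.
weights = [day for day in DAYS]
total_weight = sum(weights)

# Normalize the weights so the probabilities sum to 1.
failure_prob = {day: weight / total_weight for day, weight in zip(DAYS, weights)}

total_probability = sum(failure_prob.values())
p_fail_first_30_days = sum(failure_prob[day] for day in range(1, 31))

print(f"total probability: {total_probability:.6f}")
print(f"P(failure in first 30 days): {p_fail_first_30_days:.4f}")
```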

Uniform distribution - gambling and manufacturing

If the probability is the same for all input values, the distribution is uniform.

Let's imagine we're manufacturing candy, and we want to have equal numbers of red, blue, green, black, and white sweets in a packet. In theory, here's what we should observe.

But in reality, there's random noise so we might see something like this below. We can quantify the difference between the expected distribution and the actual distribution, which tells us something about the variability in the manufacturing process.

The uniform distribution also occurs in gambling, for example, lotteries or dice games.
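One common way to quantify the gap between expected and observed counts in the candy example is a chi-squared statistic. The sketch below is an illustration under my own assumptions: the packet contents are simulated, the function name is hypothetical, and the chi-squared statistic is my choice of measure rather than anything prescribed above.

```python
import random
from collections import Counter

COLORS = ["red", "blue", "green", "black", "white"]


def chi_squared_vs_uniform(counts):
    """Chi-squared statistic comparing observed color counts with a uniform split."""
    total = sum(counts.get(color, 0) for color in COLORS)
    expected = total / len(COLORS)
    return sum((counts.get(color, 0) - expected) ** 2 / expected for color in COLORS)


# Simulate a batch of 500 sweets where every color is equally likely.
rng = random.Random(42)
batch = Counter(rng.choice(COLORS) for _ in range(500))
noisy_stat = chi_squared_vs_uniform(batch)

# A batch where the process is clearly favoring red.
skewed = {"red": 140, "blue": 90, "green": 90, "black": 90, "white": 90}
skewed_stat = chi_squared_vs_uniform(skewed)

print(f"simulated uniform batch: {noisy_stat:.2f}")
print(f"skewed batch: {skewed_stat:.2f}")
```

With five colors there are four degrees of freedom, and the usual 5% critical value is about 9.49, so a statistic well above that suggests more than random noise.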

Reading more

Uniform distribution description by NIST

Binomial distribution - pass/fail and conversion

Each customer who comes into a store or who visits a website will either buy or not buy, which we can turn into a conversion rate. We can model these kinds of pass/fail processes using the binomial distribution. Here's the probability distribution.

The binomial distribution shows us the probability of measuring different results given an underlying 'truth'. Let's imagine the 'true' conversion rate was 0.04. Because of sampling error, we might not measure 0.04; instead, we might measure 0.045 or 0.055, depending on how many samples we take. It's important to understand what this means:

There is uncertainty in our measurement.
The smaller the sample, the bigger the uncertainty.

Although many technical people understand this, most non-technical people do not, which can lead to tension.
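Here's a minimal sketch of how the measurement uncertainty shrinks as the sample grows, using the 0.04 conversion rate from the example. The sample sizes and function names are my own choices for illustration.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k conversions from n visitors at true rate p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def standard_error(p, n):
    """Standard error of a conversion rate measured from n visitors."""
    return math.sqrt(p * (1 - p) / n)


TRUE_RATE = 0.04

for n in (100, 1_000, 10_000):
    se = standard_error(TRUE_RATE, n)
    print(f"n={n:>6}: measured rate is typically {TRUE_RATE} +/- {se:.4f}")

# Probability of measuring exactly 4 conversions (a rate of 0.04) from 100 visitors.
p_exactly_4 = binomial_pmf(4, 100, TRUE_RATE)
print(f"P(exactly 4 of 100 convert) = {p_exactly_4:.3f}")
```

Even when the true rate is 0.04, a sample of 100 visitors gives exactly 4 conversions only a minority of the time, which is why small-sample conversion rates need to be read with care.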

Reading more

Yale stats

Poisson distribution - waiting in line

Imagine you're a bank serving customers with ATMs at a location. ATMs are expensive, but you don't want to keep people waiting in long lines to do their transactions; it's bad for business. So how do you balance the cost of an ATM against its use? By modeling how many people are using the ATM over a time period.

It turns out, the number of people who visit an ATM over a time period can be modeled using the Poisson distribution, which I've shown below. This gives us a way of assessing how much variation there might be in usage and therefore how many machines we might want to install.

The Poisson distribution is often used to model counting processes. It's very attractive because it has an unusual feature: the standard deviation for the distribution is $\sqrt{\gamma}$ where $\gamma$ is the mean. Unfortunately, this property makes it a little too attractive; it's sometimes used when it shouldn't be.
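The sketch below checks the square-root property numerically and shows how the distribution might feed into a capacity question. The mean of 12 visits per time period and the threshold of 20 are made-up numbers, not figures from a real ATM.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events when the average count is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)


MEAN_VISITS = 12  # hypothetical average number of ATM visits per time period
MAX_K = 60        # the probability of more than 60 visits is negligible here

pmf = [poisson_pmf(k, MEAN_VISITS) for k in range(MAX_K + 1)]

# Recover the mean and standard deviation directly from the probabilities.
mean_estimate = sum(k * p for k, p in enumerate(pmf))
variance = sum((k - mean_estimate) ** 2 * p for k, p in enumerate(pmf))
std_estimate = math.sqrt(variance)

# How often would demand exceed 20 visits in a period?
p_more_than_20 = 1 - sum(pmf[:21])

print(f"mean {mean_estimate:.3f}, standard deviation {std_estimate:.3f}")
print(f"square root of the mean: {math.sqrt(MEAN_VISITS):.3f}")
print(f"P(more than 20 visits) = {p_more_than_20:.4f}")
```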

Reading more

The Poisson Distribution and Poisson Process Explained

Exponential distribution

How long does a car battery last? How long do phone calls last? When will the next earthquake occur? These durations typically follow the exponential distribution (which is strongly related to the Poisson distribution). I've shown this distribution below.
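The link between the two distributions can be seen in a small simulation: if the gaps between events are exponentially distributed, the number of events per time interval follows a Poisson distribution, so its mean and variance should both be close to the event rate. The rate of 4 events per interval and the function name are assumptions for illustration.

```python
import random
import statistics


def counts_per_interval(rate, n_intervals, rng):
    """Simulate exponential gaps between events and count events per unit interval."""
    counts = [0] * n_intervals
    t = rng.expovariate(rate)
    while t < n_intervals:
        counts[int(t)] += 1
        t += rng.expovariate(rate)
    return counts


RATE = 4             # hypothetical average number of events per interval
N_INTERVALS = 10_000

rng = random.Random(7)
counts = counts_per_interval(RATE, N_INTERVALS, rng)

mean_count = statistics.mean(counts)
variance_count = statistics.pvariance(counts)

# For a Poisson process, both numbers should be close to RATE.
print(f"mean events per interval: {mean_count:.3f}")
print(f"variance of events per interval: {variance_count:.3f}")
```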

Reading more

The exponential distribution

Power law distribution - finding fraud

How are incomes distributed in a population? How might you find fraud in the pattern of digits in expenses? It turns out that the first digits of invoice amounts follow a predictable, heavily skewed pattern (often called Benford's law) that's closely related to power-law distributions. The chart below shows a generic power-law distribution - for fraud detection, it's 'flipped'.
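The first-digit pattern used in fraud detection is usually known as Benford's law, which says the probability of a leading digit d is log10(1 + 1/d). Assuming that's the pattern meant here, this minimal sketch compares the expected shares with the observed first digits of some invoice amounts. The amounts are invented and the function names are hypothetical; a real check would need far more data and a proper statistical test.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(amount):
    """Return the first non-zero digit of a number."""
    for ch in str(abs(amount)):
        if ch in "123456789":
            return int(ch)
    raise ValueError("amount has no non-zero digit")


def first_digit_shares(amounts):
    """Observed share of each leading digit 1-9 in a list of amounts."""
    counts = Counter(first_digit(amount) for amount in amounts)
    return {d: counts.get(d, 0) / len(amounts) for d in range(1, 10)}


# Invented invoice amounts, for illustration only.
invoices = [1250.00, 118.40, 932.10, 27.99, 1499.00, 310.75, 18.20, 2040.00, 960.00, 175.50]

expected = benford_expected()
observed = first_digit_shares(invoices)

for digit in range(1, 10):
    print(f"{digit}: expected {expected[digit]:.3f}, observed {observed[digit]:.3f}")
```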

Reading more

Power law distribution

Normal distribution - almost everywhere, but not quite

What's the probability distribution for male soldiers' chest measurements? How are the results of A/B tests distributed? What about the distribution of measurement errors? All these, and many, many more, follow the normal distribution, which is also called the Gaussian distribution or the bell curve. If you only learn one distribution, this is the one to learn.

The properties of this distribution are extremely well-known, and every student of statistics and probability theory will know them. It's ubiquitous because of something called the Central Limit Theorem, which, simplifying a great deal, says that the sum of a large number of independent samples from almost any distribution is approximately normally distributed.
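You can see the Central Limit Theorem at work with a short simulation. This sketch adds up 30 draws from a flat (uniform) distribution, repeats that many times, and checks that the sums behave like a normal distribution: roughly 68% of them should fall within one standard deviation of the mean. The number of draws and the number of repeats are arbitrary choices.

```python
import random
import statistics

N_TERMS = 30     # how many uniform draws go into each sum
N_SUMS = 20_000  # how many sums to simulate

rng = random.Random(0)
sums = [sum(rng.random() for _ in range(N_TERMS)) for _ in range(N_SUMS)]

mean = statistics.mean(sums)
std = statistics.pstdev(sums)

# For a normal distribution, about 68.3% of values lie within one standard deviation.
within_one_std = sum(abs(s - mean) <= std for s in sums) / N_SUMS

print(f"mean of sums: {mean:.2f} (theory: {N_TERMS / 2:.2f})")
print(f"standard deviation: {std:.2f} (theory: {(N_TERMS / 12) ** 0.5:.2f})")
print(f"share within one standard deviation: {within_one_std:.3f}")
```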

Because it's everywhere, for some people, it's the only distribution they know. As the old saying goes, if you only have a hammer, every problem is a nail. This distribution can be over-used, with bad consequences.

Here's the distribution. It ought to look familiar.

Reading more

The normal distribution

Lognormal distribution

How long do visitors spend on web pages? What about the distribution of internet traffic? Or the distribution of city sizes? These all follow a log-normal distribution that looks like the example below. The lognormal distribution is quite common in business.

Note the 'fat tail' or 'long tail' on the right-hand side. Many businesses have been caught out because they assumed sales or market risk followed a normal distribution when in fact they followed a lognormal distribution.

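There's a multiplicative variation of the Central Limit Theorem that yields log-normal distributions instead of normal distributions: the product of many independent positive random quantities tends towards a log-normal distribution, in the same way that a sum tends towards a normal distribution.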

Reading more

Limpert, Eckhard, Werner A. Stahel, and Markus Abbt. "Log-normal distributions across the sciences: keys and clues" BioScience 51.5 (2001): 341-352.

Other distributions

There are lots and lots of different distributions. I saw a list of 90 the other day. Almost all of them are esoteric and apply in a very limited set of cases. You don't have to know all of them but you should be aware that choosing the right distribution is important to make the correct estimates. The distributions I've listed in this blog post are probably the most important, and you should know them and their properties.

As you asked nicely, here is a list of some distributions.

Alpha Distribution

Anglit Distribution

Arcsine Distribution

Beta Distribution

Beta Prime Distribution

Bradford Distribution

Burr Distribution

Burr12 Distribution

Cauchy Distribution

Chi Distribution

Chi-squared Distribution

Cosine Distribution

Double Gamma Distribution

Double Weibull Distribution

Erlang Distribution

Exponential Distribution

Exponentiated Weibull Distribution

Exponential Power Distribution

Fatigue Life (Birnbaum-Saunders) Distribution

Fisk (Log Logistic) Distribution

Folded Cauchy Distribution

Folded Normal Distribution

Fratio (or F) Distribution

Gamma Distribution

Generalized Logistic Distribution

Generalized Pareto Distribution

Generalized Exponential Distribution

Generalized Extreme Value Distribution

Generalized Gamma Distribution

Generalized Half-Logistic Distribution

Generalized Inverse Gaussian Distribution

Generalized Normal Distribution

Gibrat Distribution

Gompertz (Truncated Gumbel) Distribution

Gumbel (LogWeibull, Fisher-Tippetts, Type I Extreme Value) Distribution

Gumbel Left-skewed (for minimum order statistic) Distribution

HalfCauchy Distribution

HalfNormal Distribution

Half-Logistic Distribution

Hyperbolic Secant Distribution

Gauss Hypergeometric Distribution

Inverted Gamma Distribution

Inverse Normal (Inverse Gaussian) Distribution

Inverted Weibull Distribution

Johnson SB Distribution

Johnson SU Distribution

KSone Distribution

KStwo Distribution

KStwobign Distribution

Laplace (Double Exponential, Bilateral Exponential) Distribution

Left-skewed Lévy Distribution

Lévy Distribution

Logistic (Sech-squared) Distribution

Log Double Exponential (Log-Laplace) Distribution

Log Gamma Distribution

Log Normal (Cobb-Douglas) Distribution

Log-Uniform Distribution

Maxwell Distribution

Mielke’s Beta-Kappa Distribution

Nakagami Distribution

Noncentral chi-squared Distribution

Noncentral F Distribution

Noncentral t Distribution

Normal Distribution

Normal Inverse Gaussian Distribution

Pareto Distribution

Pareto Second Kind (Lomax) Distribution

Power Log Normal Distribution

Power Normal Distribution

Power-function Distribution

R-distribution Distribution

Rayleigh Distribution

Rice Distribution

Reciprocal Inverse Gaussian Distribution

Semicircular Distribution

Student t Distribution

Trapezoidal Distribution

Triangular Distribution

Truncated Exponential Distribution

Truncated Normal Distribution

Tukey-Lambda Distribution

Uniform Distribution

Von Mises Distribution

Wald Distribution

Weibull Maximum Extreme Value Distribution

Weibull Minimum Extreme Value Distribution

Wrapped Cauchy Distribution

Continuous or discrete - shaken or stirred?

Some quantities are discrete and some are continuous. A discrete quantity is something like a sales territory (e.g. Germany, Ireland, Spain) or customer count (you can't have 0.5 of a customer). A continuous quantity can take any value, for example, speed can be 45.2 kph, 120.01 kph, and so on. Some distributions apply to both continuous and discrete, and some apply only to continuous or discrete. To muddy the waters, sometimes continuous distributions are used to approximately model discrete quantities.

Business examples

Vehicles

Imagine you're running a delivery vehicle fleet. You need to keep your vehicles on the road, but you need to keep an eye on maintenance costs. You decide to use math to guide your decisions, so you work out the average lifetime for different components. You have two components A and B with the same lifetimes in miles. If either component fails, you have to tow the vehicle, which is very expensive.

Component A. Lifetime is 150,000 miles.
Component B. Lifetime is 150,000 miles.

A vehicle comes in for maintenance with 149,000 miles on the odometer. Should you replace components A and B?

As you might expect, there's a gotcha. Without knowing the probability distribution for failures, we can't make these decisions. For example, a windshield might have a constant failure rate, with the probability of failure for miles 0-100 the same as the probability of failure for miles 100,000-100,100. A clutch may have a failure rate that increases with mileage, the probability of failure at miles 100,000-100,100 being much higher than the probability of failure at miles 0-100. Because we know what a clutch and a windshield are, we might decide to replace the clutch and leave the windshield. But what if A and B were a serpentine belt and a heat shield?

The only way to make rational decisions is to understand what distribution the probability of failure follows, which may well be very different for different components (e.g. car seats vs. tires).
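Here's a sketch of how the choice of distribution changes the maintenance decision. I'm assuming, purely for illustration, that one component has a constant failure rate (an exponential lifetime) and the other wears out (a Weibull lifetime with shape 4); both are scaled so the average lifetime is 150,000 miles. The question is: given the part has survived so far, what's the chance it fails in the next 1,000 miles?

```python
import math

MEAN_LIFE = 150_000  # average lifetime in miles for both components

# Constant failure rate: exponential lifetime.
def exp_survival(miles):
    return math.exp(-miles / MEAN_LIFE)

# Wear-out failure: Weibull lifetime. Shape 4 is an assumed value; the scale
# is chosen so the mean lifetime is still MEAN_LIFE.
WEIBULL_SHAPE = 4.0
WEIBULL_SCALE = MEAN_LIFE / math.gamma(1 + 1 / WEIBULL_SHAPE)

def weibull_survival(miles):
    return math.exp(-((miles / WEIBULL_SCALE) ** WEIBULL_SHAPE))

def prob_fail_in_next(survival, miles_now, window=1_000):
    """P(fails within `window` miles | survived to miles_now)."""
    return 1 - survival(miles_now + window) / survival(miles_now)

exp_new = prob_fail_in_next(exp_survival, 0)
exp_old = prob_fail_in_next(exp_survival, 149_000)
weibull_new = prob_fail_in_next(weibull_survival, 0)
weibull_old = prob_fail_in_next(weibull_survival, 149_000)

print(f"constant rate: new={exp_new:.5f} at 149k={exp_old:.5f}")
print(f"wear-out:      new={weibull_new:.2e} at 149k={weibull_old:.5f}")
```

Under these assumptions, the constant-rate part is no more likely to fail at 149,000 miles than when it was new, while the wear-out part is far more likely to fail, even though both have the same average lifetime.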

Marketing

A new analyst is studying the market for luxury goods in Germany. They have partial data for the fraction of the population that have a certain income. Using what they have, they assume their data is normally distributed and they make a forecast for the fraction of the population that will have an income high enough to afford luxury items. Do you think their forecast will be too low, just right, or too high?

Incomes are usually log-normally distributed, so the analyst, in this case, has chosen the wrong distribution. Because the lognormal has a very long right tail, the analyst's estimate is likely to be an underestimate and may be substantially out. A competitor might not make the same mistake.
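To see how big the error can be, here's a sketch with made-up numbers (a median income of 40,000, a log-scale spread of 0.8, and a luxury threshold of 200,000; none of these are real German data). I compare the true log-normal tail with the tail of a normal distribution that has the same mean and standard deviation.

```python
import math
from statistics import NormalDist

# Assumed log-normal income distribution (illustrative values only).
MU = math.log(40_000)  # log of the median income
SIGMA = 0.8            # spread on the log scale
THRESHOLD = 200_000    # income needed to afford luxury items

# Mean and standard deviation of that log-normal distribution.
true_mean = math.exp(MU + SIGMA**2 / 2)
true_sd = true_mean * math.sqrt(math.exp(SIGMA**2) - 1)

# Fraction above the threshold under the correct (log-normal) model.
lognormal_tail = 1 - NormalDist().cdf((math.log(THRESHOLD) - MU) / SIGMA)

# Fraction above the threshold if we wrongly assume a normal distribution
# with the same mean and standard deviation.
normal_tail = 1 - NormalDist(true_mean, true_sd).cdf(THRESHOLD)

print(f"log-normal: {lognormal_tail:.4f}  normal: {normal_tail:.4f}")
```

With these assumed numbers, the normal model puts well under 1% of the population above the threshold, while the log-normal model puts roughly 2% there. The gap depends on how far into the tail the threshold sits.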

Takeaways

I've interviewed people who claim data science on their resumes, but only know the normal distribution. If you assume your data is normal, when in reality it's log-normal or Poisson, things are going to go badly wrong for you. Any analyst in business needs to be very comfortable with different distributions and needs to know which may be applicable and when.

Monday, December 7, 2020

How to (maybe) win at dice

Does God play dice with the universe?

Imagine I gave you an ordinary die, not special in any way, and I asked you to throw the die and record your results (how many 1s, how many 2s, etc.). What would you expect the results to be? Do you think you could win by choosing some numbers rather than others? Are you sure?

(Image source: Wikimedia Commons. Author: Diacritica. License: Creative Commons.)

What you might expect

Let's say you threw the die 12,000 times. You might expect a probability distribution something like this. This is a uniform distribution where all results are equally likely.

You know you'll never get an absolutely perfect distribution, so in reality, your results might look something like this for 12,000 throws.

The deviations from the expected values are random noise that we can quantify. Further, we know that by adding more dice throws, random noise gets less and less and we approach the ideal uniform distribution more closely.

I've simulated dice throws in the plots below. The top chart is 12,000 throws and the bottom chart is 120,000 throws. The blue bars represent the actual results, the black circle represents the expected value, and the black line is the 95% confidence interval. Note how the results for 120,000 throws are closer to the ideal than the results from 12,000 throws.
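If you want to try this yourself, here's a minimal simulation sketch. It isn't the code I used for the charts; the seed and function names are arbitrary. It throws a fair die 12,000 and 120,000 times and reports the largest relative deviation from the expected count.

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so the sketch is repeatable

def throw_dice(n_throws):
    """Count how often each face comes up in n_throws of a fair die."""
    return Counter(random.randint(1, 6) for _ in range(n_throws))

def max_relative_deviation(counts, n_throws):
    """Largest |observed - expected| / expected across the six faces."""
    expected = n_throws / 6
    return max(abs(counts[face] - expected) / expected for face in range(1, 7))

counts_small = throw_dice(12_000)
counts_large = throw_dice(120_000)

for label, counts, n in (("12,000", counts_small, 12_000),
                         ("120,000", counts_large, 120_000)):
    print(label, dict(sorted(counts.items())),
          f"max deviation {max_relative_deviation(counts, n):.2%}")
```

On most runs, the relative deviation for the larger experiment is noticeably smaller, because the noise grows like the square root of the number of throws while the expected count grows linearly.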

What happened in reality - not what you expect

My results are simulations, but what happens when you throw dice thousands of times in the real world?

There's a short history of probability theorists and statisticians throwing dice and recording the results.

Weldon threw 12 dice 26,306 times by hand and sent the results to his friend Francis Galton.
Iversen ran an experiment where 219 dice were rolled 20,000 times.

Weldon's data set is widely used to illustrate statistical concepts, especially after Pearson used it to explain his $\chi^2$ technique in 1900.

Despite the excitement you see at the craps tables in Las Vegas, throwing dice thousands of times is dull and is, therefore, an ideal job for a computer. In 2009, Zachariah Labby created apparatus for throwing dice and recording the scores using a camera and image processing. You can read more about his apparatus and experimental setup here. He 'threw' 12 dice 26,306 times and his machine recorded the results.

In the chart below, the blue bars are his results, the black circle is the expected result, and the black line is the 95% confidence interval. I've taken the results from all 12 dice, so my throw count is $12 \times 26,306$.

This doesn't look like a uniform distribution. To state the obvious, 1 and 6 occurred more frequently than theory would suggest - the deviation from the uniform distribution is statistically significant. The dice he used were not special dice; they were off-the-shelf, standard, supposedly unbiased dice. What's going on?

Unbiased dice are biased

Take a very close look at a normal die, the type pictured at the start of this post which is the kind of die you buy in shops.

By convention, opposite faces on dice sum to 7, so 1 is opposite 6, 3 is opposite 4, and so on. Now look very closely again at the picture at the start of the post. Look at the dots on the faces of the die. Notice how they're indented. Each hole is the same size, but obviously, the number of holes on each face is different. Let's think of this in terms of weight. Imagine we could weigh each face of the die. Let's pair up the faces, so each face is paired with the face opposite it. Now let's weigh the faces and compare them.

The greatest imbalance in weights is the 1-6 combination. This imbalance is what's causing the bias.

Obviously, the bias is small, but if you roll the die enough times, even a small bias becomes obvious.

Vegas here I come - or not...

So we know for dice bought in shops that 1 and 6 are ever so slightly more likely to occur than theory suggests. Now you know this, why aren't you booking your flight to Las Vegas? You could spend a week at the craps tables and make a little money.

Not so fast.

Let's look at the dice they use in Vegas.

(Image source: Wikimedia Commons. Author: Alper Atmaca License: Creative Commons.)

Notice that the dots are not indented. They're filled with colored material that's the same density as the rest of the dice. In other words, there's no imbalance, Vegas dice will give a uniform distribution, and 1 and 6 will occur as often as 2, 3, 4, or 5. You're going to have to keep punching the clock.

Some theory

Things are going to get mathematical from here on in. There won't be any new stories about dice or Vegas.

How did I get the expected count and error bars for each dice score? Let's say I threw the dice $x$ times, it seems obvious we would get an expected count of $\frac{x}{6}$ for each score, but why? What about the standard error?

Let's re-think the dice as a Bernoulli trial. Let's choose a score, say 1. If we throw the dice and it shows a 1, we consider that a success. If it shows anything else, we consider it a failure. Because we have a Bernoulli trial, we can use the binomial distribution to model the results.

Using Wikipedia's notation:

$n$ is the number of throws
$p$ is the probability of getting a 1, which is $\frac{1}{6}$
$q = 1- p$ is the probability of getting 2-6, which is $\frac{5}{6}$

So, again using Wikipedia's handy summary, for $n$ throws:

The mean is $np = 12 \times 26,306 \times \frac{1}{6} = 52,612$
The standard deviation is $\sqrt{npq} = \sqrt{12 \times 26,306 \times \frac{1}{6} \times \frac{5}{6}} = 209.388$
The 95% confidence interval is $52,202$ to $53,022$ (the mean plus or minus 1.96 standard deviations).
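The arithmetic above is easy to check in code. This is a small sketch using the binomial formulas; the function name is my own.

```python
import math

def expected_count_and_interval(n_throws, p=1 / 6, z=1.96):
    """Binomial mean, standard deviation, and z-based interval for one score."""
    mean = n_throws * p
    sd = math.sqrt(n_throws * p * (1 - p))
    return mean, sd, mean - z * sd, mean + z * sd

# Labby's experiment: 12 dice thrown 26,306 times.
mean, sd, low, high = expected_count_and_interval(12 * 26_306)
print(f"mean={mean:.0f} sd={sd:.3f} interval={low:.0f} to {high:.0f}")
```

The same function works for the simulated 12,000 and 120,000 throws; just change the argument.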

Publications

Academics live or die by publications and by citations of their publications. Labby's work has rightly been widely cited on the internet. I keep hoping that some academic will be inspired by Labby and use modern robotic technology and image recognition to do huge (million-plus) classical experiments, like tossing coins or selecting balls from an urn. It seems like an easy win to be widely cited!

Sunday, November 29, 2020

Am I diseased? An introduction to Bayes theorem

What is Bayes' theorem and why is it so important?

Bayes' theorem is one of the key ideas of modern data science; it's enabling more accurate forecasting, it's leading to shorter A/B tests, and it's fundamentally changing statistical practices. In the last twenty years, Bayes' theorem has gone from being a cute probability idea to becoming central to many disciplines. Despite its huge impact, it's a simple statement of probabilities: what is the probability of an event occurring given some other event has occurred? How can something almost trivial be so revolutionary? Why all this change now? In this blog post, I'm going to give you a brief introduction to Bayes' theorem and show you why it's so powerful.

(Bayes theorem. Source: Wikimedia Commons. Author: Matt Buck. License: Creative Commons.)

A disease example without explicitly using Bayes' theorem

To get going, I want to give you a motivating example that shows you the need for Bayes' theorem. I'm using this problem to introduce the language we'll need. I'll be using basic probability theory to solve this problem and you can find all the theory you need in my previous blog post on probability. This example is adapted from Wayne W. LaMorte's page at BU; he has some great material on probability and it's well worth your time browsing his pages.

Imagine there's a town of 10,000 people. 1% of the town's population has a disease. Fortunately, there's a very good test for the disease:

If you have the disease, the test will give a positive result 99% of the time (sensitivity).
If you don't have the disease, the test will give a negative result 99% of the time (specificity).

You go into the clinic one day and take the test. You get a positive result. What's the probability you have the disease? Before you go on, think about your answer and the why behind it.

Let's start with some notation.

D+ and D- represent having the disease and not having the disease
T+ and T- represent testing positive and testing negative
P(D+) represents the probability of having the disease (with similar meanings for P(D-), P(T+), P(T-))
P(T+ | D+) is the probability of testing positive given that you have the disease.

We can write out what we know so far:

P(D+) = 0.01
P(T+ | D+) = 0.99
P(T- | D-) = 0.99

We want to know P(D+ | T+). I'm going to build a decision tree to calculate what I need.

There are 10,000 people in the town, and 1% of them have the disease. We can draw this in a tree diagram like so.

For each of the branches, D+ and D-, we can draw branches that show the test results T+ and T-:

For example, we know 100 people have the disease, of whom 99% will test positive, which means 1% will test negative. Similarly, for those who do not have the disease, (9,900), 99% will test negative (9,801), and 1% will test positive (99).

Out of 198 people who tested positive for the disease (99 who have it plus 99 who don't, or in probability terms P(T+) = P(T+ | D+)P(D+) + P(T+ | D-)P(D-) = 0.0198), 99 people have it, so P(D+ | T+) = 99/198. In other words, if I test positive for the disease, I have a 50% chance of actually having it.
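The tree can be written out as simple head counts. Here's a short sketch that follows the same branches; the variable names are mine.

```python
POPULATION = 10_000
PREVALENCE = 0.01   # P(D+)
SENSITIVITY = 0.99  # P(T+ | D+)
SPECIFICITY = 0.99  # P(T- | D-)

# First split: who has the disease?
diseased = POPULATION * PREVALENCE
healthy = POPULATION - diseased

# Second split: how does each group test?
true_positives = diseased * SENSITIVITY
false_negatives = diseased - true_positives
false_positives = healthy * (1 - SPECIFICITY)
true_negatives = healthy - false_positives

# Of everyone who tested positive, what fraction actually has the disease?
p_disease_given_positive = true_positives / (true_positives + false_positives)

print(round(true_positives), round(false_positives),
      round(p_disease_given_positive, 3))
```

The two positive groups are the same size (99 each), which is why the answer is 50%.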

There are two takeaways from all of this:

Wow! Really, only a 50% probability! I thought it would be much higher! (This is called the base rate fallacy).
This is a really tedious process and probably doesn't scale. Can we do better? (Yes: Bayes' theorem.)

Who was Bayes?

Thomas Bayes (1702-1761) was an English non-conformist minister (meaning a Protestant minister not part of the established Church of England). His religious duties left him time for mathematical exploration, which he did for his own pleasure and amusement; he never published in his lifetime in his own name. After his death, his friend and executor, Richard Price, went through his papers and found an interesting result, which we now call Bayes' theorem. Price presented it at the Royal Society and the result was shared with the mathematical community.

(Plaque commemorating Thomas Bayes. Source: Wikimedia Commons Author:Simon Harriyott License: Creative Commons.)

For those of you who live in London, or visit London, you can visit the Thomas Bayes memorial in the historic Bunhill Fields burial ground where Bayes is buried. For the true probability pilgrim, it might also be worth visiting Richard Price's grave, which is only a short distance away.

Bayes' theorem

The derivation of Bayes' theorem is almost trivial. From basic probability theory:

\[P(A \cap B) = P(A) P(B | A)\]

\[P(A \cap B) = P(B \cap A)\]

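Applying the first equation with A and B swapped gives $P(B \cap A) = P(B) P(A | B)$. Equating the two expressions and re-arranging, we get the infamous theorem: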

\[P(A | B) = \frac{P(B | A) P(A)}{P(B)}\]

Although this is the most compact version of the theorem, it's more usefully written as:

\[P(A | B) = \frac{P(B | A) P(A)}{P(B \cap A) + P(B \cap \bar A)} = \frac{P(B | A)P(A)}{P(B | A)P(A) + P(B | \bar A) P( \bar A)}\]

where $\bar A$ means not A (remember $1 = P(A) + P(\bar A)$). You can get this second form of Bayes using the law of total probability and the multiplication rule (see my previous blog post).

So what does it all mean and why is there so much excitement over something so trivial?

What does Bayes' theorem mean?

The core idea of Bayesian statistics is that we update our prior beliefs as new data becomes available - we go from the prior to the posterior. This process is often iterative and is called the diachronic interpretation of Bayes theorem. It usually requires some computation; something that's reasonable to do given today's computing power and the free availability of numeric computing languages. This form of Bayes is often written:

\[P(H | D) = \frac{P(D | H) P(H)}{P(D)}\]

with these definitions:

P(H) - the probability of the hypothesis before the new data - often called the prior
P(H | D) - the probability of the hypothesis after the data - the posterior
P(D | H) - the probability of the data under the hypothesis, the likelihood
P(D) - the probability of the data, it's called the normalizing constant

A good example of the use of Bayes' theorem is its use to better quantify the health risk an individual faces from a disease. Let's say the risk of suffering a heart attack in any year is P(HA). This is the risk for the population as a whole (the prior). If someone smokes, the probability becomes P(HA | S), which is the posterior, and which may be considerably different from P(HA).

Let's use some examples to figure out how Bayes works in practice.

The disease example using Bayes

Let's start from this version of Bayes:

\[P(A | B) = \frac{P(B | A)P(A)}{P(B | A)P(A) + P(B | \bar A) P( \bar A)}\]

and use the notation from our disease example:

\[P(D+ | T+) = \frac{P(T+ | D+)P(D+)}{P(T+ | D+)P(D+) + P(T+ | D-) P( D-)}\]

Here's what we know from our previous disease example:

P(D+) = 0.01 and by implication P(D-) = 0.99
P(T+ | D+) = 0.99
P(T- | D-) = 0.99 and by implication P(T+ | D-) = 0.01

Plugging in the numbers:

\[P(D+ | T+) = \frac{0.99\times0.01}{0.99\times0.01 + 0.01\times0.99} = 0.5\]

The decision tree is easier for a human to understand, but if there are a large number of conditions, it becomes much harder to use. For a computer on the other hand, the Bayes solution is straightforward to code and it's expandable for a large number of conditions.
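Here's what the Bayes version looks like as a function. It's a minimal sketch with names I've made up; it implements the expanded form of the theorem and then shows how the answer moves as the prevalence of the disease changes.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(D+ | T+) from P(D+), P(T+ | D+), and P(T- | D-)."""
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1 - specificity
    numerator = p_pos_given_disease * prior
    denominator = numerator + p_pos_given_healthy * (1 - prior)
    return numerator / denominator

town = posterior_given_positive(prior=0.01, sensitivity=0.99, specificity=0.99)
print(f"town example: {town:.3f}")

# Same test, different base rates.
for prevalence in (0.001, 0.01, 0.1):
    p = posterior_given_positive(prevalence, 0.99, 0.99)
    print(f"prevalence {prevalence:>5}: P(D+ | T+) = {p:.3f}")
```

The test hasn't changed in any of these cases, only the base rate, which is the whole point of the base rate fallacy.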

Predicting US presidential election results

I've blogged a lot about this, but not about using Bayesian methods. The basic concepts are fairly simple.

To predict a winner, you need to model the electoral college, which implies a state-by-state forecast.
For each state, you know who won last time, so you have a prior in the Bayesian sense.
In competitive states, there are a number of opinion polls that provide evidence of voter intention; this is the data in Bayes-speak.

In practice, you start with a state-by-state prior based on previous elections or fundamentals, or something else. As opinion polls are published, you calculate a posterior probability for each of the parties to win the state election. Of course, you do this with Bayes theorem. As more polls come in, you update your model and the influence of your prior becomes less and less. In some versions of this type of modeling work, models take into account national polling trends too.
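To give a flavor of the updating process, here's a toy sketch for a single state. It is not Linzer's model or anything close to a real forecast. I'm assuming a two-party race, a Beta prior on the party's vote share built from a made-up previous result of 52%, and made-up polls that I treat as independent random samples. The poll numbers, the prior strength, and the function name are all hypothetical.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

def prob_party_wins(alpha, beta, n_draws=20_000):
    """Monte Carlo estimate of P(vote share > 0.5) under a Beta(alpha, beta)."""
    wins = sum(random.betavariate(alpha, beta) > 0.5 for _ in range(n_draws))
    return wins / n_draws

# Prior: the party won 52% last time; weight it like a poll of 100 people.
prior_alpha, prior_beta = 52, 48

# Hypothetical polls: (respondents for the party, respondents against).
polls = [(480, 520), (470, 530), (490, 510)]

prior_win = prob_party_wins(prior_alpha, prior_beta)

# Beta-binomial update: add the poll counts to the prior counts.
post_alpha = prior_alpha + sum(for_party for for_party, _ in polls)
post_beta = prior_beta + sum(against for _, against in polls)
posterior_win = prob_party_wins(post_alpha, post_beta)

print(f"P(win) before polls: {prior_win:.2f}, after polls: {posterior_win:.2f}")
```

Notice how 3,000 poll responses swamp a prior worth 100 responses; that's the sense in which the influence of the prior becomes less and less as polls come in.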

The landmark paper describing this type of modeling is by Linzer.

Using Bayes' theorem to prove the existence of God

Over history, there have been many attempts to prove the existence of God using scientific or mathematical methods. All of them have foundered for one reason or another. Interestingly, one of the first uses of Bayes' theorem was to try and prove the existence of God by proving miracles can happen. The argument was put forward by Richard Price himself. I'm going to repeat his analysis using modern notation, based on an explanation from Cornell University.

Price's argument is based on tides. We expect tides to happen every day, but if a tide doesn't happen, that would be a miracle. If T is the consistency of tides, and M is a miracle (no tide), then we can use Bayes theorem as:

\[P(M | T) = \frac{P(T | M) P(M)}{P(T | M) P(M) + P(T | \bar M) P(\bar M)}\]

Price assumed the probability of miracles existing was the same as the probability of miracles not existing (!), so $P(M) = P(\bar M)$. If we plug this into the equation above and simplify, we get:

\[P(M | T) = \frac{P(T | M)}{P(T | M) + P(T | \bar M)}\]

He further assumed that if miracles exist, they would be very rare (or we would see them all the time), so:

\[P(T | \bar M) \gg P(T | M)\]

Specifically, he assumed that $P(T | M) = 10^{-6}$ - in other words, if a miracle exists, it would happen 1 time in 1 million. He also assumed that if there were no miracles, tides would always happen, so $P(T | \bar M) = 1$. The upshot of all this is that:

\[P(M | T) = \frac{10^{-6}}{10^{-6} + 1} \approx 0.000001\]

or, there's a 1 in a million chance of a miracle happening.

There are more holes in this argument than in a teabag, but it is an interesting use of Bayes' theorem and does give you some indication of how it might be used to solve other problems.

Monty Hall and Bayes

The Monty Hall problem has tripped people up for decades (see my previous post on the problem). Using Bayes' theorem, we can rigorously solve it.

Here's the problem. You're on a game show hosted by Monty Hall and your goal is to win the car. He shows you three doors and asks you to choose one. Behind two of the doors are goats and behind one of the doors is a car. Once you've chosen your door, Monty opens one of the other doors to show you what's behind it. He always chooses a door with a goat behind it. Next, he asks you the key question: "do you want to change doors?". Should you change doors and why?

I'm going to use the diachronic interpretation of Bayes theorem to figure out what happens if we don't change:

\[P(H | D) = \frac{P(D | H) P(H)}{P(D)} = \frac{P(D | H) P(H)}{P(D | H)P(H) + P(D | \bar H) P( \bar H)}\]

$P(H)$ is the probability our initial choice of door has a car behind it, which is $\frac{1}{3}$.
$ P( \bar H) = 1- P(H) = \frac{2}{3} $
$P(D | H) = 1$ - this is the probability Monty will show me a door with a goat given that I have chosen the door with a car. It's always 1 because Monty always shows me a door with a goat.
$P(D | \bar H) = 1$ - this is the probability Monty will show me a door with a goat given that I have chosen a door with a goat. It's always 1 because Monty always shows me a door with a goat.

Plugging these numbers in:

\[P(H | D) = \frac{1 \times \frac{1}{3}}{1 \times \frac{1}{3} + 1 \times \frac{2}{3}} = \frac{1}{3}\]

If we don't change, then the probability of winning is the same as if Monty hadn't opened the other door. But there are now only two unopened doors, ours and one other, and $P(\bar H) + P(H) = 1$, so the probability the car is behind the other door is $P(\bar H) = 1 - \frac{1}{3}$. In turn, this means our winning probability if we switch is $\frac{2}{3}$, so our best strategy is switching.
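If the algebra doesn't convince you, a simulation might. Here's a short sketch that plays the game many times with each strategy; the seed and the number of games are arbitrary.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def play(switch):
    """Play one game; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # Monty opens a door that is neither the contestant's pick nor the car.
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining unopened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

N_GAMES = 100_000
stay_rate = sum(play(switch=False) for _ in range(N_GAMES)) / N_GAMES
switch_rate = sum(play(switch=True) for _ in range(N_GAMES)) / N_GAMES

print(f"stay: {stay_rate:.3f}  switch: {switch_rate:.3f}")
```

The win rates should land close to $\frac{1}{3}$ for staying and $\frac{2}{3}$ for switching.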

Searching for crashed planes and shipwrecks

On 1st June 2009, Air France Flight AF 447 crashed into the Atlantic. Although the flight had been tracked, the underwater search for the plane was complex. The initial search used Bayesian inference to try and locate where on the ocean floor the plane might be. It used data from previous crashes that assumed the underwater locator beacon was working. Sadly, the initial search didn't find the plane.

In 2011, a new team re-examined the data, with two crucial differences. Firstly, they had data from the first search, and secondly, they assumed the underwater locator beacon had failed. Again using Bayesian inference, they pointed to an area of ocean that had already been searched. The ocean was searched again (with the assumption the underwater beacon had failed), and this time the plane was found.
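Why would Bayesian inference point back to an area that had already been searched? Here's a toy sketch of the idea. It's not the real search model: the three areas, their prior probabilities, and the detection probabilities are all invented for illustration. The key point is that an unsuccessful search only rules out an area to the extent the search would have found the wreck had it been there.

```python
def update_after_failed_search(prior, searched, p_detect):
    """Posterior over areas after searching one area and finding nothing.

    prior    - list of prior probabilities that the wreck is in each area
    searched - index of the area that was searched
    p_detect - probability the search finds the wreck if it is in that area
    """
    # Likelihood of 'nothing found': (1 - p_detect) in the searched area, 1 elsewhere.
    unnormalised = [p * (1 - p_detect) if i == searched else p
                    for i, p in enumerate(prior)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

prior = [0.5, 0.3, 0.2]  # area 0 is the most likely location to begin with

# If the beacon is assumed to work, a search is very likely to detect the wreck.
beacon_working = update_after_failed_search(prior, searched=0, p_detect=0.9)

# If the beacon is assumed to have failed, the same search could easily miss it.
beacon_failed = update_after_failed_search(prior, searched=0, p_detect=0.2)

print([round(p, 3) for p in beacon_working])
print([round(p, 3) for p in beacon_failed])
```

With a working beacon, the searched area drops to about a 9% chance and attention moves elsewhere. With a failed beacon, the searched area still has about a 44% chance and remains the best place to look.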

You can read more about this story in the MIT Technology Review and for more in-depth details, you can read the paper by the team that did the analysis.

It turns out, there's quite a long history of analysts using Bayes theorem to locate missing ships. In this 1971 paper, Richardson and Stone show how it was used to locate the wreckage of the USS Scorpion. Since then, a number of high-profile wrecks have been located using similar methods.

Sadly, even Bayes' theorem hasn't led to anyone finding flight MH370.

Other examples of Bayes' theorem

Bayes has been applied in many, many disciplines. I'm not going to give you an exhaustive list, but I will give you some of the more 'fun' ones.

Why now?

Using Bayes theorem can involve a lot of fairly tedious arithmetic. If the problem requires many iterations, there are lots of tedious calculations. This held up the adoption of Bayesian methods until three things happened:

Cheap computing.
The free and easy availability of mathematical computing languages.
Widespread skill to program in these languages.

By the late 1980s, computing power was sufficiently cheap to make Bayesian methods viable, and of course, computing has only gotten cheaper since then. Good quality mathematical languages were available by the late 1980s too (e.g. Fortran, MATLAB), but by the 2010s, Python and R had all the necessary functionality and were freely and easily available. Both Python and R usage had been growing for a while, but by the 2010s, there was a very large pool of people who were fluent in them.

As they say in murder mysteries, by the 2010s, Bayesian methods had the means, the motive, and the opportunity.

Bayes and the remaking of statistics

Traditional (non-Bayesian) statistics is usually called frequentist statistics. It has a long history and has been very successful, but it has problems. In the last 50 years, Bayesian analysis has become more successful and is now challenging frequentist statistics.

I'm not going to provide an in-depth critique of frequentist statistics here, but I will give you a high-level summary of some of the problems.

p-values and significance levels are prone to misunderstandings - and the choice of significance levels is arbitrary
Much of the language surrounding statistical tests is complex and rests on convention rather than underlying theory
The null hypothesis test is frequently misunderstood and misinterpreted
Prior information is mostly ignored.

Bayesian methods help put statistics on a firmer intellectual foundation, but the price is changing well-understood and working frequentist statistics. In my opinion, over the next twenty years, we'll see Bayesian methods filter down to undergraduate level and gradually replace the frequentist approach. But for right now, the frequentists rule.

Conclusion

At its heart, Bayes' theorem is almost trivial, but it's come to represent a philosophy and approach to statistical analysis that modern computing has enabled; it's about updating your beliefs with new information. A welcome side-effect is that it's changing statistical practice and putting it on a firmer theoretical foundation. Widespread change to Bayesian methods will take time, however, especially because frequentist statistics are so successful.

Reading more

Computer age statistical inference

Think Bayes

Bayesian Data Analysis