Sunday, January 1, 2023

My advice to students looking for a job

I can't talk to you all

Every week, I get several contacts on LinkedIn from students looking for their first job in Data Science.  I'm very sympathetic, but I just don't have the time to speak to people individually. I do want to help though, which is why I put together this blog post. I'm going to tell you what hiring managers are looking for, the why behind the questions you might be asked, and what you can do to stand out as a candidate.

(Chiplanay, CC BY-SA 4.0, via Wikimedia Commons)

Baseline expectations

I don't expect new graduates to have industry knowledge and context, but I do expect them to have a basic toolkit of methods and to know when to apply them and when not to. In my view, the purpose of completing a postgraduate degree is to give someone that toolkit; I'm not going to hire a candidate who needs basic training when they've just completed a couple of years of study in the field.  

Having said that, I don't expect perfect answers. I've had people who'd forgotten the word 'median' but could tell me what they meant, and that's OK; everyone forgets things under pressure. I'm happy if people can tell me the principles even if they've forgotten the name of the method.

The goal of my interview process is to find candidates who'll be productive quickly. Your goal as a candidate is to convince me that you can get things done quickly, right, and well.

Classroom vs. the real world

In business, our problems are often poorly framed and ambiguous; it's very rare that problems look like classroom problems. I ask questions that are typical of the problems I see in the real world.

I usually start with simple questions about how you analyze a small data set with lots of outliers. It's a gift of a question; anyone who's done analysis themselves will have come across the problem and will know the answer, it's easy. Many candidates give the wrong answers, with some giving crazily complex answers. Of course, sales and other business data often have small sample sizes and large outliers. 

My follow-up is usually something more complex. I sometimes ask about detecting bias in a data set. This requires some knowledge of probability distributions, confidence intervals, and statistical testing. Bear in mind, I'm not asking people to do calculations, I'm asking them how they would approach the problem and the methods they would use (again, it's OK if they forget the names of the methods). This problem is relevant because we sometimes need to detect whether there's an effect or not, for example, can we really increase sales through sales training? Once again, some candidates flounder and have no idea how to approach the problem.

Testing knowledge and insight

I sometimes ask how you would tell if two large, normally distributed data sets are different. This could be an internet A/B test or something similar. Some candidates ask how large the data set is, which is a good question; the really good candidates tell me why their question is important and the basis of it. Most candidates tell me very confidently that they would use the Student-t test, which isn't right but is a sort-of OK answer. I've happily accepted an answer along the lines of "I can't remember the exact name of the test, but it's not the Student-t. The test is for larger data sets and involves the calculation of confidence intervals". I don't expect candidates to have word-perfect answers, but I do expect them to tell me something about the methods they would use.

Often in business, we have problems where the data isn't normally distributed so I expect candidates to have some knowledge of non-normal distributions. I've asked candidates about non-normal distributions and what kind of processes give them. Sadly, a number of candidates just look blank. I've even had candidates who've just finished Master's degrees in statistics tell me they've only ever studied the normal distribution.

I'm looking for people who have a method toolkit but also have an understanding of the why underneath, for example, knowing why the Student-t test isn't a great answer for a large data set.

Data science != machine learning

I've found most candidates are happy to talk about the machine learning methods they've used, but when I've asked candidates why they used the methods they did, there's often no good answer. XGBoost and random forest are quite different and a candidate really should have some understanding of when and why one method might be better than another.

I like to ask candidates under what circumstances machine learning is the wrong solution and there are lots of good answers to this question. Sadly, many candidates can't give any answer, and some candidates can't even tell me when machine learning is a good approach.

Machine learning solves real-world problems but has real-world issues too. I expect candidates with postgraduate degrees, especially those with PhDs, to know something about real-world social issues. There have been terrible cases where machine learning-based systems have given racist results. An OK candidate should know that things like this have happened, a good candidate should know why, and a great candidate should know what that implies for business. 

Of course, not every data science problem is a machine learning problem and I expect all candidates at all levels to appreciate that and to know what that means.

Do you really have the skills you claim?

This is an issue for candidates with several years of business experience, but recent graduates have to watch out for it too. I've found that many people really don't have the skills they claim to have or have wildly overstated their skill level. I've had candidates claim extensive experience in running and analyzing A/B tests but have no idea about statistical power and very limited idea of statistical testing. I've also had candidates massively overstate their Python and SQL skills. Don't apply for technical positions claiming technical skills you really don't have; you should have a realistic assessment of your skill level.

Confidence != ability

This is more of a problem with experienced candidates but I've seen it with soon-to-be graduates too. I've interviewed people who come across as very slick and very confident but who've given me wildly wrong answers with total confidence. This is a dangerous combination for the workplace. It's one thing to be wrong, but it's far, far worse to be wrong and be convincing.

To be fair, sometimes overconfidence comes from a lack of experience and a lack of exposure to real business problems. I've spoken to some recent (first-degree) graduates who seemed to believe they had all the skills needed to work on advanced machine learning problems, but they had woefully inadequate analysis skills and a total lack of experience. It's a forgivable reason for over-confidence, but I don't want to be the person paying for their naïveté.

Don't Google the answer

Don't tell me you would Google how to solve the problem. That approach won't work. A Google search won't tell you that not all count data is Poisson distributed or why variable importance is a useful concept. Even worse, don't try and Google the answer during the interview.

Don't be rude, naive, or stupid

A receptionist once told me that an interview candidate had been rude to her. It was an instant no for that candidate at that point. If you think other people in the workplace are beneath you, I don't want you on my team.

I remember sitting in lecture theaters at the start of the semester where students would ask question after question about the exams and what would be in them. I've seen that same behavior at job interviews where some candidates have asked an endless stream of questions about the process and what they'll be asked at what stage. They haven't realized that the world of work is not the world of education. None of the people who asked questions like this made it past the first round or two; they just weren't ready for the workplace. It's fine to ask one or two questions about the process, but any more than that shows naïveté.

I was recruiting data scientists fresh out of Ph.D. programs and one candidate, who had zero industrial experience, kept asking how quickly they could be promoted to a management position. They didn't ask what we were looking for in a manager, or the size of the team, or who was on it, but they were very keen on telling me they expected to be managing people within a few months. Guess who we didn't hire?

Be prepared

Be prepared for technical questions at any stage of the interview process. You might even get technical questions during HR screening. If someone non-technical asks you a technical question, be careful to answer in a way they can understand. 

Don't be surprised if questions are ambiguously worded. Most business problems are unclear and you'll be expected to deal with poorly defined problems. It's fine to ask for more details or for clarification.

If you forget the name of a method, don't worry about it. Talk through the principle and show you understand the underlying ideas. It's important to show you understand what needs to be done.

If you can, talk about what you've done and why you did it. For example, if you've done a machine learning project, use that as an example, making sure you explain why you made the project choices you did. Work your experience into your answers and give examples as much as you can.

Be prepared to talk about any technical skills you claim or any area of expertise. Almost all the people interviewing you will be more senior than you and will have done some of the same things you have. They may well be experts.

If you haven't interviewed in many places, or if you're out of practice, role-playing interviews should help. Don't choose people who'll go easy on you; instead, choose people who'll give you a tough interview, ideally tougher than real-world interviews.

Most interviews are now done over Zoom. Make sure your camera is on and you're in a good place to do the interview.

Stand out in a good way

When you answer technical questions, be sure to explain why. If you can't answer the question, tell me what you do know. Don't just tell me you would use the Wilson score interval, tell me why you would use it. Don't just tell me you don't know how to calculate sample sizes, tell me how you would go about finding out and what you know about the process.

I love it when a candidate has a GitHub page showing their work. It's fine if it's mostly class projects, in fact, that's what I would expect. But if you do have a GitHub page, make sure there's something substantive there and that your work is high quality. I always check GitHub pages and I've turned down candidates if what I found was poor.

Very, very few candidates have done this, but if you have a writing sample to show, that would be great. A course essay would do, but of course, a great one. You could have your sample on your GitHub page. A writing sample shows you can communicate clearly in written English, which is an essential skill.

If you're a good speaker, try and get a video of you speaking up on YouTube and link it from your GitHub page (don't do this if you're bad or just average). Of course, this shows your ability to speak well.

For heaven's sake, ask good questions. Every interviewer should leave you time to ask questions and you should have great questions to ask. Remember, asking good questions is expected.

What hiring managers want

We need people who:

  • have the necessary skills
  • will be productive quickly
  • need a minimal amount of hand-holding
  • know the limitations of their knowledge and who we can trust not to make stupid mistakes
  • can thrive in an environment where problems are poorly defined
  • can work with the rest of the team.

Good luck.

Monday, November 28, 2022

Is this coin biased?

Tossing and turning

A few months ago, someone commented on one of my blog posts and asked how you work out if a coin is biased or not. I've been thinking about the problem since then. It's not a difficult one, but it does bring up some core notions in probability theory and statistics which are very relevant to understanding how A/B testing works, or indeed any kind of statistical test. I'm going to talk you through how you figure out if a coin is biased, including an explanation of some of the basic ideas of statistical tests.

The trial

A single coin toss is an example of something called a Bernoulli trial, which is any kind of binary decision you can express as a success or failure (e.g. heads or tails). For some reason, most probability texts refer to heads as a success. 

We can work out what the probability is of getting different numbers of heads from a number of tosses, or more formally, what's the probability \(P(k)\) of getting \(k\) heads from \(n\) tosses, where \(0 \le k \le n\)? By hand, we can do it for three tosses:

 Number of heads (k)   Combinations (n = 3)   Count   Probability
 0                     TTT                    1       1/8
 1                     HTT, THT, TTH          3       3/8
 2                     THH, HTH, HHT          3       3/8
 3                     HHH                    1       1/8

But what about 1,000 or 1,000,000 tosses - we can't do this many by hand, so what can we do? As you might expect, there's a formula you can use: 
\[P(k) = \frac{n!} {k!(n-k)!} p^k (1-p)^{n-k}\]
\(p\) is the probability of success in any trial, for example, getting a head. For an unbiased coin \(p=0.5\); for a coin that's biased 70% heads \(p=0.7\). 
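
Here's a minimal Python sketch of the formula, using only the standard library. The function name binomial_pmf is my own choice for illustration. For three tosses of a fair coin, it reproduces the probabilities worked out by hand above.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli trials."""
    # comb(n, k) is n! / (k! (n-k)!)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Three tosses of a fair coin: probability of 0, 1, 2, and 3 heads.
for k in range(4):
    print(k, binomial_pmf(k, 3, 0.5))
```

The same function works for a biased coin; just change p, for example to 0.7.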

If we plot this function for an unbiased coin (\(p=0.5\)), where \(n=100\), and \(0 \le k \le n\), we see this probability distribution:

This is called a binomial distribution and it looks a lot like the normal distribution for large (\(> 30\)) values of \(n\). 

I'm going to re-label the x-axis as a score equal to the fraction of heads: 0 means all tails, 0.5 means \(\frac{1}{2}\) heads, and 1 means all heads. With this slight change, we can more easily compare the shape of the distribution for different values of \(n\). 

I've created two charts below for an unbiased coin (\(p=0.5\)), one with \(n=20\) and one with \(n=40\). Obviously, the \(n=40\) chart is narrower, which is easier to see using the score as the x-axis. 

As an illustration of what these charts mean, I've colored all scores 0.7 and higher as red. You can see the red area is bigger for \(n=20\) than \(n=40\). Bear in mind, the red area represents the probability of a score of 0.7 or higher. In other words, if you toss a fair coin 20 times, you have a 0.058 chance of seeing a score of 0.7 or more, but if you toss a fair coin 40 times, the probability of seeing a score of 0.7 or more drops to 0.008.
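
If you want to check the red areas yourself, here's a small sketch that adds up the binomial probabilities for a fair coin. The function name is hypothetical, and I pass the head counts directly (14 of 20 and 28 of 40 both give a score of 0.7) to avoid floating-point rounding when converting scores to counts.

```python
from math import comb

def prob_at_least(min_heads, tosses):
    """Probability of seeing min_heads or more heads from a fair coin."""
    # Every sequence of tosses is equally likely for a fair coin,
    # so count the qualifying sequences and divide by 2**tosses.
    favorable = sum(comb(tosses, k) for k in range(min_heads, tosses + 1))
    return favorable / 2**tosses

print(prob_at_least(14, 20))  # score of 0.7 or more from 20 tosses
print(prob_at_least(28, 40))  # score of 0.7 or more from 40 tosses
```

The two results should be close to the 0.058 and 0.008 quoted above.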


These charts tell us something useful: as we increase the number of tosses, the curve gets narrower, meaning the probability of getting results further away from \(0.5\) gets smaller. If we saw a score of 0.7 for 20 tosses, we might not be able to say the coin was biased, but if we got a score of 0.7 after 40 tosses, we know this score is very unlikely so the coin is more likely to be biased.

Thresholds

Let me re-state some facts:

  • For any coin (biased or unbiased) any score from 0 to 1 is possible for any number of tosses.
  • Some results are less likely than others; e.g. for an unbiased coin and 40 tosses, there's only a 0.008 chance of seeing a score of 0.7 or more.

We can use probability thresholds to decide between biased and non-biased coins. We're going to use a threshold (usually called the confidence level) of 95% to decide if the coin is biased or not. In the chart below, the red areas represent 5% probability, and the blue areas 95% probability.

Here's the idea to work out if the coin is biased. Set a significance level, usually 0.05 (which corresponds to 95% confidence). Throw the coin \(n\) times, record the number of heads, and work out a score. Draw the theoretical probability chart for a fair coin and that number of throws (like the one I've drawn above) and color in 95% of the probabilities blue and 5% red. If the experimental score lands in the red zones, we'll consider the coin to be biased; if it lands in the blue zone, we'll consider it unbiased.

This is probabilistic decision-making. Using a threshold of 0.05 means we'll wrongly say a fair coin is biased 5% of the time. Can we make the threshold stricter? Could we use 0.01, for instance? Yes, we could, but the cost is increasing the number of trials.
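
To make the red and blue zones concrete, here's a rough sketch that finds where the red zones start for a fair coin. It's my own illustration, not a standard library routine: it assumes a fair coin and puts at most half of the 5% in each tail, and the function name is hypothetical.

```python
from math import comb

def rejection_cutoffs(tosses, alpha=0.05):
    """Return (low, high) for a fair coin.

    A head count <= low or >= high lands in a red zone. Each tail
    holds at most alpha / 2 of the total probability.
    """
    total = 2**tosses
    cumulative = 0
    low = -1  # -1 means no head count is extreme enough
    for k in range(tosses + 1):
        cumulative += comb(tosses, k)
        if cumulative / total > alpha / 2:
            break
        low = k
    # The fair-coin distribution is symmetric, so mirror the lower cutoff.
    return low, tosses - low

low, high = rejection_cutoffs(200)
print(low, high)
```

A head count between the two cutoffs is in the blue zone. Using a stricter alpha, such as 0.01, pushes the cutoffs further apart.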

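As you might expect, there are shortcuts and we don't actually have to draw out the chart. In Python, you can use the binom_test function in the scipy.stats package. (binom_test was deprecated in SciPy 1.10 and removed in SciPy 1.12; in newer versions, use binomtest, which takes k rather than x for the number of successes and returns a result object with a pvalue attribute.)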

To simplify, binom_test has three arguments:

  • x - the number of successes
  • n - the number of samples
  • p - the hypothesized probability of success
It returns a p-value which we can use to make a decision.

Let's see how this works with a threshold of 0.05. Let's take the case where we have 200 coin tosses and 140 (70%) of them come up heads. We're hypothesizing that the coin is fair, so \(p=0.5\).

from scipy import stats
# binom_test was removed in SciPy 1.12; binomtest (SciPy 1.7+) replaces it
print(stats.binomtest(k=140, n=200, p=0.5).pvalue)

The p-value we get is about 1.5e-08, which is way less than our threshold of 0.05 (we're in the red area of the chart above). We would then reject the idea the coin is fair.

What if we got 112 heads instead?

from scipy import stats
# binom_test was removed in SciPy 1.12; binomtest (SciPy 1.7+) replaces it
print(stats.binomtest(k=112, n=200, p=0.5).pvalue)

This time, the p-value is about 0.104, which is greater than our threshold of 0.05 (we're in the blue area of the chart), so the result is consistent with a fair coin (we fail to reject the null).

What if my results are not significant? How many tosses?

Let's imagine you have reason to believe the coin is biased. You throw it 200 times and you see 112 heads. binom_test tells you you can't conclude the coin is biased. So what do you do next?

The answer is simple: toss the coin more times.

The formula for the sample size, \(n\), is:

\[n = \frac{p(1-p)} {\sigma^2}\]

where \(\sigma\) is the standard error. 

Here's how this works in practice. Let's assume we think our coin is just a little biased, to 0.55, and we want the standard error to be \(\pm 0.04\). We would need 155 tosses (the formula gives 154.7, which we round up). What if we want more certainty, say \(\pm 0.005\)? Then the number of tosses goes up to 9,900. In general, the bigger the bias, the fewer tosses we need, and the more certainty we want, the more tosses we need.
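
Here's the sample size formula as a short sketch. The function name is my own, and I round up to a whole toss so the standard error is no bigger than the one asked for.

```python
from math import ceil

def tosses_needed(p, standard_error):
    """Sample size from n = p(1-p) / sigma^2, rounded up to a whole toss."""
    n = p * (1 - p) / standard_error**2
    # Round away floating-point noise before rounding up.
    return ceil(round(n, 6))

print(tosses_needed(0.55, 0.04))
print(tosses_needed(0.55, 0.005))
```

Try a bigger bias, such as p = 0.7, with the same standard error and you'll see the number of tosses fall.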

If I think my coin is biased, what's my best estimate of the bias?

Let's imagine I toss the coin 1,000 times and see 550 heads. binom_test tells me the result is significant and it's likely my coin is biased, but what's my estimate of the bias? This is simple, it's actually just the mean, so 0.55. Using the statistics of proportions, I can actually put a 95% confidence interval around my estimate of the bias of the coin. Through math I won't show here, using the data we have, I can estimate the coin is biased 0.55 ± 0.03.
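
For the curious, here's one way to get a figure like this. It's a sketch that assumes the usual normal approximation for a proportion (estimate ± 1.96 standard errors for 95% confidence); that's my assumption about the method, and the function name is hypothetical.

```python
from math import sqrt

def proportion_interval(heads, tosses, z=1.96):
    """Return (estimate, margin) using the normal approximation.

    z=1.96 corresponds to a 95% confidence interval.
    """
    estimate = heads / tosses
    standard_error = sqrt(estimate * (1 - estimate) / tosses)
    return estimate, z * standard_error

estimate, margin = proportion_interval(550, 1000)
print(f"{estimate:.2f} ± {margin:.2f}")
```

The approximation is reasonable here because the number of tosses is large and the estimate isn't close to 0 or 1.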

Is my coin biased?

This is a nice theoretical discussion, but how might you go about deciding if a coin is biased? Here's a step-by-step process.

  1. Decide on the level of certainty you want in your results. 95% is a good measure.
  2. Decide the minimum level of bias you want to detect. If the coin should return heads 50% of the time, what level of bias can you live with? If it's biased to 60%, is this OK? What about biased to 55% or 50.5%?
  3. Calculate the number of tosses you need.
  4. Toss your coin.
  5. Use binom_test to figure out if the coin deviates significantly from 0.5.
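
If you don't have SciPy to hand, steps 4 and 5 can be sketched with the standard library. This is my own simplified version, not the SciPy implementation: it only works for the fair-coin hypothesis (\(p=0.5\)), where the distribution is symmetric and the two-sided p-value is twice the probability of a result at least as lopsided as the one observed. The function names are hypothetical.

```python
from math import comb

def fair_coin_p_value(heads, tosses):
    """Two-sided exact p-value for the hypothesis that the coin is fair."""
    # Measure how lopsided the result is, whichever side it falls on.
    extreme = max(heads, tosses - heads)
    tail = sum(comb(tosses, k) for k in range(extreme, tosses + 1)) / 2**tosses
    return min(1.0, 2 * tail)

def looks_biased(heads, tosses, alpha=0.05):
    """Reject the fair-coin hypothesis if the p-value is below alpha."""
    return fair_coin_p_value(heads, tosses) < alpha

print(looks_biased(140, 200))
print(looks_biased(112, 200))
```

For a hypothesized probability other than 0.5, the distribution isn't symmetric and this shortcut doesn't apply; use the SciPy function instead.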


Wednesday, November 23, 2022

The London weighting

London is an oddity

I was born and grew up in London but it was only after I left that I started to realize how much of an oddity it is; it's almost as if it's a different country from the rest of the UK. I thought other capital cities would have a similar disjointed relationship with their host countries, and I was partly right, but I was also mostly wrong. Let me explain why London is such an international oddity.

Zipf's law

Zipf's law refers to the statistical distribution of observations found in some types of data, for example, word frequency in human languages. It isn't a law in the sense of a scientific 'law', it's a distribution. 

In simple terms, for measurements that follow Zipf's law, the first item is twice the second item, three times the third, and so on. For example, in English, the word 'the' is the most frequent word and it occurs twice as often as the next most common word ('of') [https://www.cs.cmu.edu/~cburch/words/top.html]. 

I found some readable articles on Zipf's law here: [http://www.casa.ucl.ac.uk/mike-michigan-april1/mike's%20stuff/attach/Gabaix.pdf, https://gizmodo.com/the-mysterious-law-that-governs-the-size-of-your-city-1479244159].

It turns out that a number of real-world measurements follow Zipf's law, including city sizes. 
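
Here's a small sketch of what an ideal Zipf distribution looks like for city sizes. The starting population of 1,000,000 is a made-up number for illustration and the function name is my own. On a log-log scale, the points lie on a straight line with a slope of -1.

```python
from math import log

def zipf_sizes(largest, count):
    """Ideal Zipf sizes: the item at rank r is 1/r the size of the largest."""
    return [largest / rank for rank in range(1, count + 1)]

# Hypothetical country whose largest city has 1,000,000 people.
sizes = zipf_sizes(1_000_000, 4)
print(sizes)

# Slope of ln(size) against ln(rank), measured between rank 1 and rank 4.
slope = (log(sizes[3]) - log(sizes[0])) / (log(4) - log(1))
print(slope)
```

Real city data never fits this perfectly; the interesting question is how far a country departs from the line.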

The US and elsewhere

Here's what city size looks like in the US. This is a plot of ln(Rank) vs ln(Population) with the biggest city (New York) being bottom right (ln meaning natural logarithm). 


It's close to an ideal Zipf law distribution.

You can see the same pattern in cities in other countries around the world [https://arxiv.org/pdf/1402.2965.pdf].

One of the interesting features of the Zipf city distribution is that it's mostly persistent over time [http://www.casa.ucl.ac.uk/mike-michigan-april1/mike's%20stuff/attach/Gabaix.pdf]. Although the relative size of a few cities may change, for most of the cities in a country, the relationship remains the same. Think about what this means for a minute; if the largest city has a population of 1,000,000 and the second largest has a population of 500,000, then if the population increases by 150,000 we would expect the largest city to increase to 1,100,000 and the second to increase to 550,000; most of the increase goes to the bigger city [https://www.janeeckhout.com/wp-content/uploads/06.pdf]. The population increase is not evenly spread.
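
The arithmetic in this example is just sharing out the increase in proportion to each city's current size. Here's a minimal sketch; the function name is my own and the numbers are the ones from the example above.

```python
def share_growth(populations, increase):
    """Share a population increase in proportion to current city sizes."""
    total = sum(populations)
    # Integer arithmetic keeps the results exact for these round numbers.
    return [size + increase * size // total for size in populations]

print(share_growth([1_000_000, 500_000], 150_000))
```

Because the ratios between cities are unchanged, the Zipf relationship still holds after the increase.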

A notable aside is how the press manages to miss the point when census data is released. If the population increases, most of the increase will go to the bigger cities. The story ought to be that bigger cities are getting bigger (and what that means). Instead, the press usually focuses on smaller cities that are growing or shrinking more than the average growth rate.

The UK and the London weighting

There's a big exception to the Zipf law relationship. London is much bigger than you would expect it to be. Here's the Zipf law relationship for UK cities with London in red.

London is twice the size you would expect it to be.

There are many theories about why London is so big. Some authors flip the question around and ask why Britain's second cities aren't larger, but that doesn't help explain why [http://spatial-economics.blogspot.com/2012/10/are-britains-second-tier-cities-too.html]. Here are some theories I've seen:

  • The UK is an overly centralized country.
  • London was an imperial city for a long time and that drove London's growth. The comparison group should have been imperial cities, and now the empire has gone, London is left as an oddity.
  • London (was) in an economic zone that included the major cities of western Europe, so the comparison group isn't the UK, it's western Europe.

I think there's an element of truth in all of them. Certainly, UK governments (of all parties) have often prioritized spending on London; for example, there are no large-scale public construction projects anything like the Elizabeth Line elsewhere in the UK. Culture and the arts are concentrated in London too; think of any large cultural organization in the UK (British Museum, National Theatre Company, Victoria & Albert...) and guess where they'll be located - and they're all government-funded. Of course, cause and effect are deeply intertwined here: London gets more spending because it's big and important, therefore it stays big and important.

What are the implications?

London's size difference from other UK cities drives qualitative and quantitative differences. It's not the only UK city with a subway network but it's by far the largest (more than twice as big as the next networks). It has more airports serving it than any other UK city. Its system of governance is different. Its politics are different. The fraction of people born overseas is different. And so on. Without change, these differences will continue and London will continue to look and feel very different from the rest of the UK; the country will be two countries in one.

As a child, I was right to pick up on the feeling that London was different; it really does feel like a different country. It's only as an adult that I've understood why. I've also realized that the UK's second-tier cities are falling behind, and that's a real problem. The UK is over-centralized and that's harming everyone who doesn't live in London.

London is considered a first-tier or world city [https://mori-m-foundation.or.jp/english/ius2/gpci2/index.shtml] and the challenge for UK governments is to bring the UK's other cities up while not dragging London down.

Wednesday, November 16, 2022

London and New York - different but similar

World cities but different geographies

In German, a world city (or "weltstadt") is a large, sophisticated, cosmopolitan city. There are only a handful of them and the list includes New York and London. Although there are obvious differences between these two cities, there are many, many similarities; no wonder they're sister or twin cities.

I was reading a National Geographic article about geographic misconceptions and it set me thinking about some of the less obvious, but profound differences between London and New York. 

Let's dive into them.

If London was in North America...

Cities north and south of New York

Let's line up some of the major North American cities in terms of miles north or south of New York. I'm going to ignore miles east or west and just focus on north and south. Here's the line on the left. 

As you can see, Quebec City is 421 miles north of New York and Charlotte is 379 miles south.

On this line, where do you think London would appear? How many miles north or south of New York is London? Take a guess before scrolling down and peeking.

Here's the answer: 745 miles north.

That's right, London is way further north than Quebec City. London is actually slightly further north than Calgary. In fact, the UK as a whole is entirely north of the contiguous United States. 
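
If you want to check distances like these, here's a rough sketch. The latitudes are approximate city-center values I've supplied (they're not from the chart), and I'm assuming about 69 miles per degree of latitude, so expect the results to be within a few miles of the figures above rather than exact.

```python
# Assumed conversion: roughly 69 miles per degree of latitude.
MILES_PER_DEGREE = 69.0

# Approximate city-center latitudes in degrees north (my assumptions).
LATITUDES = {
    "New York": 40.7128,
    "London": 51.5074,
    "Quebec City": 46.8139,
    "Charlotte": 35.2271,
    "Calgary": 51.0447,
}

def miles_north_of_new_york(city):
    """Miles north of New York; negative values mean south."""
    return (LATITUDES[city] - LATITUDES["New York"]) * MILES_PER_DEGREE

for city in ("Quebec City", "Charlotte", "London", "Calgary"):
    print(city, round(miles_north_of_new_york(city)))
```

This ignores east-west distance entirely, just like the line in the chart.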

745 miles is a long way north and it has some consequences.


Daylight saving

Let's look at sunrise and sunset times and how they vary through the year. In the chart below, I've plotted sunrise and sunset times by month, removing daylight savings time shifts. 

To show the differences a bit more starkly, let's take sunrise and sunset on solstice days:

 Date         City       Sunrise      Sunset       Daylight time
 2022-06-21   London     4:43:09 AM   9:21:41 PM   16h 38m 32s
 2022-06-21   New York   5:25:09 AM   8:30:53 PM   15h 5m 44s
 2022-12-21   London     8:03:52 AM   3:53:45 PM   7h 49m 53s
 2022-12-21   New York   7:16:52 AM   4:32:12 PM   9h 15m 20s
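
The daylight column is just sunset minus sunrise. Here's a short sketch that recomputes it from the times in the table, written in 24-hour form; the dictionary layout is my own choice.

```python
from datetime import datetime

# (sunrise, sunset) in local time, taken from the table above.
SUN_TIMES = {
    ("2022-06-21", "London"): (datetime(2022, 6, 21, 4, 43, 9), datetime(2022, 6, 21, 21, 21, 41)),
    ("2022-06-21", "New York"): (datetime(2022, 6, 21, 5, 25, 9), datetime(2022, 6, 21, 20, 30, 53)),
    ("2022-12-21", "London"): (datetime(2022, 12, 21, 8, 3, 52), datetime(2022, 12, 21, 15, 53, 45)),
    ("2022-12-21", "New York"): (datetime(2022, 12, 21, 7, 16, 52), datetime(2022, 12, 21, 16, 32, 12)),
}

# Subtracting two datetimes gives a timedelta, which is the daylight time.
daylight = {key: sunset - sunrise for key, (sunrise, sunset) in SUN_TIMES.items()}

for (date, city), duration in daylight.items():
    print(date, city, duration)

# How much more summer daylight London gets than New York.
print(daylight[("2022-06-21", "London")] - daylight[("2022-06-21", "New York")])
```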

That's a big difference. In London in the summer, you can party in daylight until 9pm, by which time in New York, it's gone dark. Conversely, in London in winter, the city is dark by 4pm, while New Yorkers can still enjoy the winter sunshine as they do their Christmas shopping.

On the face of it, it would seem like it's better to spend your summers in London and your winters in New York. If London is so far north of New York, surely New York winters must be better?

Blowing hot and cold

I've plotted the average monthly high and low temperatures for London and New York. London has cooler summers but warmer winters. Is this what you expected?

In winter, Londoners might still enjoy a drink outside, but in New York, this isn't going to happen. People in London don't wear hats in winter, but in New York they do. New Yorkers know how to deal with snow, Londoners don't. In the summer, New Yorkers use A/C to cool down, but Londoners don't even know how to spell it because it rarely gets hot enough to need it.

London's climate, and in fact much of Europe's, is driven by the Gulf Stream. This keeps the UK much warmer than you would expect from its latitude. Of course, the fact the UK is a small island surrounded by lots of water helps moderate the climate too.

The moderate London climate is probably the main reason why people think London and New York are much closer on the north-south axis than they really are.

Climate as an engine of culture

On the face of it, you would think cities with different climates would have different cultures, but New York and London show that's not always the case. These two cities are hundreds of miles apart (north/south) and have noticeably different climates, but they're culturally similar, and obviously, they're both 'world cities'. Perhaps the best we can say about climate is that it drives some features of city life but not the most fundamental ones.

Monday, August 22, 2022

How the post-hoc fallacy can cost businesses millions

The post-hoc fallacy

Over my career, I’ve seen companies make avoidable business mistakes that’ve cost them significant time and money. Some of these mistakes happened because people have misunderstood “the rules of evidence” and they’ve made a classic post-hoc blunder. This error is insidious because it comes in different forms and it can seem like the error is the right thing to do.

In this blog post, I’ll show you how the post-hoc error can manifest itself in business, give you a little background on it, and show you some real-world examples. Finally, I’ll show you how you can protect yourself.

A fictitious example to get us started 

Imagine you’re an engineer working for a company that makes conveyor belts used in warehouses. A conveyor belt break is both very dangerous and very costly; it can take hours to replace, during which time at least part of the warehouse is offline. Your company thoroughly instruments the belts and there’s a vast amount of data on belt temperature, tension, speed, and so on.

Your Japanese distributor tells you they’ve noticed a pattern. They’ve analyzed 319 breaks and found that in 90% of cases, there’s a temperature spike within 15 minutes of the break. They’ve sent you the individual readings which look something like the chart below. 

The distributor is demanding that you institute a new control that stops the belt if a temperature spike is detected and prompts the customer to replace the belt.

Do you think the Japanese distributor has a compelling case? What should you do next? 

The post-hoc fallacy

The full name of this fallacy is “post hoc ergo propter hoc”, which is thankfully usually shortened to "post-hoc". The fallacy goes like this:

  • Event X happens then event Y happens
  • Therefore, X caused Y. 

The oldest example is the rooster crowing before dawn: “the rooster crows before dawn, therefore the rooster’s crow causes the dawn”. Obviously, this is a fallacy and it’s easy to spot.

Here's a statement using the same logic to show you that things aren’t simple: “I put fertilizer on my garden, three weeks later my garden grew, therefore the fertilizer caused my garden to grow”. Is this statement an error?

As we’ll see, statements of the form:

  • Event X happens then event Y happens
  • Therefore, X caused Y.

aren’t enough in themselves to provide proof.

Classic post-hoc errors 

Hans Zinsser in his book “Rats, lice, and history” tells the story of how in medieval times, lice were considered a sign of health in humans. When lice left a person, the person became sick or died, so the obvious implication is that lice are necessary for health. In reality, of course, lice require a live body to feed on and a sick or dead person doesn’t provide a good meal.

In modern times, something similar happened with violent video games. The popular press set up a hue and cry that playing violent video games led to violent real-life behavior in teenagers, the logic being that almost every violent offender had played violent video games. In reality, a huge fraction of the teenage population has played violent video games. More careful studies showed no effect.

Perhaps the highest profile post-hoc fallacy in modern times is vaccines and autism.  The claim is that a child received a vaccine, and later on, the child was diagnosed with autism, therefore the vaccine caused autism. As we know, the original claims of a link were deeply flawed at best.

Causes of errors

Confounders 

A confounder is something, other than the effect you’re studying, that could cause the results you’re seeing. The classic example is storks in Copenhagen after the second world war. In the 12-year period after the second world war, the number of storks seen near Copenhagen increased sharply, as did the number of (human) babies. Do we conclude storks cause babies? The cause of the increase in the stork population, and the number of babies, was the end of the second world war so the confounder here was war recovery.

In the autism case, confounders are everywhere. Both vaccinations and autism increased at the same time, but lots of other things changed at the same time too:

  • Medical diagnosis improved
  • Pollution increased
  • The use of chemical cleaning products in the home increased
  • Household income went up (but not for everyone, some saw a decrease)
  • Car ownership went up.

Without further evidence, we can’t say what caused the increase in autism. Once again, it’s not enough of itself to say “X before Y therefore X causes Y”. 

Confounders can be very, very hard to find.
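
To see how a confounder can manufacture a correlation, here's a minimal sketch in Python. The data is synthetic and the numbers are made up for illustration; they are not the Copenhagen figures. Both series depend on a shared driver (a hypothetical `recovery` trend) and neither depends on the other, yet they end up strongly correlated.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical confounder: a steady recovery trend over 12 years.
recovery = list(range(12))

# Synthetic series: each depends on the confounder plus its own noise.
# Neither series appears in the formula for the other.
storks = [100 + 10 * year + rng.gauss(0, 5) for year in recovery]
babies = [1000 + 50 * year + rng.gauss(0, 25) for year in recovery]

corr = pearson(storks, babies)
print(f"correlation between storks and babies: {corr:.2f}")
```

The correlation is high only because both series follow the same underlying trend. In real data, you rarely get to see the confounder this clearly.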

Biased data

The underlying data can be so biased that it renders subsequent analysis unreliable. A good example is US presidential election opinion polling in 2016 and 2020; these polls under-represented Trump’s support, either because the pollsters didn’t sample the right voters or because Trump voters refused to be included in polls. Whatever the cause, the pollsters’ sampling was biased, which meant that many polls didn't accurately forecast the result.

For our conveyor belt example, the data might be just Japanese installations, or it might be Asian installations, or it might be worldwide. It might even be just data on broken belts, which introduces a form of bias called survivor bias. We need to know how the data was collected.

Thinking correlation = causation

Years ago, I had a physics professor who tried to beat into us students the mantra “correlation is not causation” and he was right. I’ve written about correlation and causation before, so I’m not going to say too much here. For causation to exist, there must be correlation, but correlation of itself does not imply causation.

To really convince yourself that correlation != causation, head on over to the spurious correlations website where you’ll find lots of examples of correlations that plainly don’t have an X causes Y relationship. What causes the correlation? Confounders. For example, population growth will lead to increases in both computer science doctorates and arcade revenue. 

Protections

Given all this, how can you protect yourself against the post-hoc fallacy? There are a number of methods designed to remove the effects of confounders and other causes of error. 

Counterexamples 

Perhaps the easiest way of fighting against post-hoc errors is to find counterexamples. If you think the rooster crowing causes dawn, then a good test is to shoot the rooster; if the rooster doesn’t crow and the dawn still happens, then the rooster can’t cause dawn. In the human lice example, finding a population of humans who were healthy and did not have lice would disprove the link between health and lice.

Control groups

Control groups are very similar to counterexamples. The idea is that you split the population you’re studying into two groups with similar membership. One group is exposed to a treatment (the treatment group) and the other group (the control group) is not. Because the two groups are similar, any difference between the groups that’s larger than chance would produce can be attributed to the treatment. 

I talked earlier about a fertilizer example: “I put fertilizer on my garden, three weeks later my garden grew, therefore the fertilizer caused my garden to grow”. The way to prove the fertilizer works is to split my garden into two equivalent areas; one area gets the fertilizer (the treatment group) and the other (the control group) does not. This type of agricultural test was the predecessor of modern randomized controlled trials and it’s how statistical testing procedures were developed.

RCTs (A/B testing)

How do you choose membership of the control and treatment groups? Naively, you would think the best method is to carefully select membership to make the two groups the same. In practice, this is a bad idea because there’s always some key factor you’ve overlooked and you end up introducing bias. It turns out random group assignment is a much, much better way.

A randomized controlled trial (RCT) randomly allocates people to either a control group or a treatment group. The treatment group gets the treatment, and the control group doesn’t.
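
Random allocation itself is simple to do. Here's a minimal sketch in Python, assuming you have a list of unit identifiers (people, gardens, or conveyor belts); the function name `assign_groups` and the example IDs are hypothetical.

```python
import random


def assign_groups(unit_ids, seed=None):
    """Randomly split units into treatment and control groups.

    The groups are equal in size, or differ by one if the count is odd.
    """
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)  # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical example: ten belts identified by number.
treatment, control = assign_groups(range(10), seed=42)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

The point of the shuffle is that no human judgment decides who goes where, so any factor you've overlooked is spread across both groups by chance rather than by your choices.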

Natural experiments

It’s not always possible to randomly allocate people to control and treatment groups. For example, you can’t randomly allocate people to good weather or bad weather. But in some cases, researchers can examine the impact of a change if group allocation is decided by some external event or authority. For example, a weather pattern might dump large amounts of snow on one town but pass by a similar nearby town. One US state might pass legislation while a neighboring state might not. This is called a natural experiment and there’s a great deal of literature on how to analyze them.

Matching, cohorts, and difference-in-difference

If random assignment isn’t possible, or the treatment event happened in the past, there are other analysis techniques you can use. These fall into the category of quasi-experimental methods and I’m only going to talk through one of them: difference-in-difference.  

Difference-in-difference typically has four parts:

  • Split the population into a treatment group (that received the treatment) and a control group (that did not).
  • Split the control and treatment groups into multiple cohorts (stratification). For example, we could split by income levels, health indicators, or weight bands. Typically, we choose multiple factors to stratify the data.
  • Match cohorts between the control and treatment groups.
  • Observe how the system evolves over time, before and after the treatment event.

Assignment of the test population to cohorts is sometimes based on random selection from the test population if the population is big enough.

Previously, I said random assignment to groups out-performs deliberate assignment and it’s true. The stratification and cohort membership selection process in difference-in-difference is trying to make up for the fact we can’t use random selection. Quasi-experimental methods are vulnerable to confounders and bias; it’s the reason why RCTs are preferred.
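
The arithmetic at the core of difference-in-difference is straightforward. Here's a minimal sketch in Python for a single matched pair of cohorts, using made-up numbers purely for illustration. It assumes the two cohorts would have followed the same trend without the treatment (the "parallel trends" assumption), which is exactly what the matching step tries to make plausible.

```python
from statistics import mean


def difference_in_difference(treat_before, treat_after,
                             control_before, control_after):
    """Change in the treatment cohort minus change in the control cohort."""
    treat_change = mean(treat_after) - mean(treat_before)
    control_change = mean(control_after) - mean(control_before)
    return treat_change - control_change


# Hypothetical outcome measurements for one matched pair of cohorts.
treat_before = [10.0, 12.0, 11.0]
treat_after = [15.0, 17.0, 16.0]
control_before = [9.0, 11.0, 10.0]
control_after = [11.0, 13.0, 12.0]

effect = difference_in_difference(
    treat_before, treat_after, control_before, control_after
)
print(f"estimated treatment effect: {effect:.1f}")
```

The control cohort's change estimates what would have happened anyway; subtracting it leaves the part of the change you can tentatively attribute to the treatment. A real analysis would repeat this across cohorts and put error bars on the result.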

Our fictitious example revisited 

The Japanese distributor hasn't found cause and effect. They’ve found an interesting relationship that might indicate cause. They’ve found the starting point for investigation, not a reason to take action. Here are some good next steps.

What data did they collect and how did they collect it? Was it all the data, or was it a sample (e.g. Japan only, breakages only, etc.)? By understanding how the data was collected or sampled, we can understand possible alternative causes of belt breaks.

Search for counterexamples. How many temperature spikes didn’t lead to breaks? They found 287 cases where there was a break after a temperature spike, but how many temperature spikes were there? If there were 293 temperature spikes, it would be strong evidence that temperature spikes were worth investigating. If there were 5,912 temperature spikes, it would suggest that temperature wasn’t a good indicator.
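
The calculation behind this is just a ratio. Here's a quick sketch in Python using the two scenarios above; the function name is my own.

```python
def fraction_followed_by_break(spikes_with_break, total_spikes):
    """Fraction of temperature spikes that were followed by a belt break."""
    return spikes_with_break / total_spikes


breaks_after_spike = 287  # from the distributor's analysis

# The two what-if totals discussed in the text.
few_spikes = fraction_followed_by_break(breaks_after_spike, 293)
many_spikes = fraction_followed_by_break(breaks_after_spike, 5912)

print(f"293 spikes:   {few_spikes:.1%} were followed by a break")
print(f"5,912 spikes: {many_spikes:.1%} were followed by a break")
```

In the first scenario, about 98% of spikes precede a break; in the second, fewer than 5% do, so stopping the belt on every spike would mostly stop healthy belts.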

Look for confounders. Are there other factors that could explain the result (for example, the age of the belt)?

Attempt a quasi-experimental analysis using a technique like difference-in-difference.

If this sounds like a lot of work requiring people with good statistics skills, that’s because it is. The alternative is to either ignore the Japanese distributor’s analysis (which might be correct) or implement a solution (to a problem that might not exist). In either case, the cost of a mistake is likely far greater than the cost of the analysis.

Proving causality

Proving cause and effect is a fraught area. It’s a witches’ brew of difficult statistics, philosophy, and politics. The statistics are hard, meaning that few people in an organization can really understand the strengths and weaknesses of an analysis. Philosophically, proving cause and effect is extremely hard and we’re left with probabilities of correctness, not the certainty businesses want. Politics is the insidious part; if the decision-makers don’t understand statistics and don’t understand the philosophy of causality, then the risk is decisions made on feelings not facts. This can lead to some very, very costly mistakes.

The post-hoc error is just one type of error you encounter in business decision-making. Regrettably, there are many other kinds of errors.

Monday, August 15, 2022

A small revolution happened when I wasn't looking

The revolution is complete but I didn't notice

The other day I realized a market segment revolution had happened and I hadn’t noticed. There’d been a fundamental shift in the underlying technology and the change was nearly complete, to the point where very few new devices are based on the old technology. It's a classic case of technology disruption.

Batteries included

I was chopping up an old tree stump with an ax when a neighbor came over with his new chainsaw and offered to help. I gratefully accepted and he sliced up my large tree stump very quickly. Afterward, we got chatting about his new chainsaw; it was battery-powered.

(Not my tree stump, but it looked like this: allen watkin from London, UK, CC BY-SA 2.0, via Wikimedia Commons)

Frankly, I was astonished that a battery-powered chainsaw could chop up a tree stump this big and I said so. He told me the battery was good for more cutting if I had other trees to cut. He also told me he used the same batteries to power his lawn mower and he could cut his whole lawn (suburban New England) on one charge. I was taken aback; the last time I looked, battery-powered devices were a joke.

No more gasoline internal combustion engines

The next time I went to Home Depot, I had a look at their lawnmowers and garden equipment. Almost all the lawnmowers were battery-powered, including ride-on mowers. Almost all the hedge cutters, trimmers, and blowers were battery-powered too. In the last few years, garden equipment that was only ever gasoline-powered has become almost entirely battery-powered. 

The benefits are obvious: no storing gasoline, no pull starts, no winter maintenance, and so on. The only drawback I could see was battery price and power, but battery prices have fallen substantially at the same time as battery capacity has gone up. We crossed a usability threshold a while back and the benefits of battery power have led manufacturers to make the switch.

Brushless is the business

Two technologies have made this change possible: brushless motors and improved batteries. Everyone knows battery technology has improved, but brushless technology gets far less attention. Brushless motors are far more energy efficient than brushed motors, which means longer operation and/or more usable power for the same energy cost. They’ve been around for years but they rely on electronic control circuitry to work, which made them too expensive for all but specialist applications. However, the cost of electronics has tumbled, which means cheaper brushless motors became possible. The garden equipment I saw all uses brushless motors, as do modern power tools, lawnmowers, and even snowblowers (see next section). It’s the combination of modern batteries and brushless motors that's led to a small revolution.

There's no business like snow business

For home and garden devices, the ultimate test for battery power is a snowblower. For those of you who don’t know, these are a bit bigger than a lawnmower, they’re very heavy, and they have a powerful gasoline engine. To clear a big New England snow dump, you’ll need to use a big snowblower and maybe a gallon or more of gasoline. Here’s a picture of one in use. 

(Image from https://www.wnins.com/resources/personal/features/snowblowersafety.shtml)

Snowblowers consume a lot of power. Is it even possible to have a battery-powered snowblower? Astonishingly, the answer is yes. There are at least two powerful battery-powered snowblowers on the market. You can see a video of one here.

These new snowblowers are a lot lighter than their gasoline cousins, they don’t need you to store gasoline, and they don’t require a pull start or an electric starter. The bigger two-stage snowblowers (which you need in New England) use two big brushless motors and 80V batteries. 

There are downsides though: batteries only last about 40 minutes clearing heavy snow and battery snowblowers are about 20-25% more expensive. This feels like an early adopter market right now, but in a few years, battery snowblowers will probably be the market standard. 

The revolution will not be televised

Batteries have taken over the garden equipment world. The revolution has succeeded but no one is talking about it.

There are a couple of lessons here and some pointers for the future.

It’s not just about better batteries. This garden revolution relied on brushless motor technology too. If we think of what's next for battery power or alternative energy, we need to think about enabling technologies. For example, solar panels are sometimes coupled with inverters, so advances in inverter technology are key.  

Manufacturers had an innovation pathway that made the problem more tractable. Home and garden devices have a range of power requirements. Electric screwdrivers and drills don’t need that much power, blowers and strimmers need more, lawnmowers still more, and snowblowers most of all. Manufacturers could solve the problems of lower power devices before moving up the ‘power’ chain. This is similar to Clayton Christensen’s “innovator’s dilemma” model of disruption.

Battery garden devices will put high-powered batteries in people’s homes, but they’ll be lying idle most of the time. What about using these powerful batteries to smooth out spikes in power demand or provide emergency power? What about charging the batteries at night when power is cheap and using the batteries during the day when power is more expensive? The problem is the step change needed in home electricity management, but maybe some incremental steps are possible. 

Other battery uses become possible too, for example, bigger motorized children’s toys, outdoor power away from electricity supplies, or even battery-powered boats. If powerful batteries are there, innovators will find a use for them.

Perhaps the next steps in home energy technology won’t be led by battery technology imported from cars but by battery technology imported from humble garden tools.

Sunday, July 24, 2022

Understanding Asia better

Misunderstanding Asia

Growing up in the UK, I never really understood Asia well. I heard the usual mix of opinions: that ‘they’ had developed their economies by adopting the free market, that there was something special about Asian societies that favored prosperity, and of course, that 'they' cheated and stole intellectual property. 

(Dado, Public domain, via Wikimedia Commons)

Years ago, I visited South Korea, China, Japan, and Taiwan. Immediately, I realized that what I’d read and understood was mostly wrong or at best very distorted. Even worse, the popular narratives in the west were pretty useless for understanding what I saw and heard.

Recently, I read a very illuminating book, “How Asia Works” by Joe Studwell.  Studwell provides a much better model for understanding Asia than anything I’d read before and I’m going to provide a quick overview of Studwell’s ideas here. I recommend you read his book.

How Asia Works

Studwell divides Asia into two broad groups: the successful trinity of Taiwan, South Korea, and Japan, and everyone else. China is of course a special case, but similar in many ways to the successful trinity. He immediately does away with geography and culture as factors explaining why the trinity was successful and others were not. Instead, he focuses on the development policies they followed and how they executed them.

In his view, there are three key drivers responsible for the rise of Japan, South Korea, and Taiwan: agriculture, industry, and finance. Behind these three drivers, there were crucial policies that enabled these countries to rise, but perhaps more important than the policies was the disciplined execution behind them. 

To set the scene, at the start of his narrative, all the countries were relatively poor with little industry. Each of them had a large population and each had the desire to develop and improve the lives of their people.

(Studwell's book.)

Studwell’s key insight into agriculture is the difference between productivity and efficiency. We can define productivity as the human consumable output per hectare and efficiency as the human consumable output per hour of human effort. Gardens are typically much more productive than farms at the cost of being more effort-intensive (less efficient). This is because gardeners plant their crops closer together and make better use of limited space, the price of which is the substantial human effort to maintain and harvest crops. Poor countries typically have lots of people they need to feed and little foreign exchange to pay for imported food. It makes a great deal of sense therefore to use their labor in highly productive agriculture, which usually means smallholdings. 

This is Studwell’s first key policy insight. Encouraging smallhold farming requires land reform. When people work for themselves and their families, they’re much more motivated to produce than when they’re tenants on someone else’s property. All four countries, Japan, South Korea, Taiwan, and China, pursued land reform, which involved redistributing land to smallhold farmers.  In all cases, landlords took a beating and saw little compensation for losing their lands. In each case, the countries were disciplined and prevented landlords from re-establishing control. All four countries saw agricultural productivity rise sharply. Rising agricultural productivity meant reduced food imports, agricultural exports to generate foreign exchange, and a surplus that was used to create demand for industrial output.

Other countries in Asia tried land reform but allowed landlords backdoors to rebuild their property portfolios. Although productivity rose in these countries, it wasn’t anything like the rise in Japan, Taiwan, South Korea, and China. It seems like a disciplined approach to land reform is key. 

Land reform also sheds some light on why the Soviet and Chinese experimentation with collective farming was a disaster; it destroyed the incentive for people to produce more for their families. Collectivization wiped out China’s agricultural productivity gains. 

It’s also the first area where Studwell’s ideas depart from standard economics. Western free-market economics stresses property rights. Forcing landlords to sell their land at low prices is very much counter to key free-market thought.

Industry is the next step. Studwell makes the same observation that everyone else does; textiles are the usual industrialization starting point because the skill set needed is relatively low. After textiles come other low-skilled products with countries working their way up the value chain to cars and semiconductors. The successful countries placed high import tariffs to protect their infant industries from more advanced foreign competition (again, deviation from free-market doctrine). They made capital available at low interest rates to encourage company formation and growth, but crucially, they created a highly competitive internal market with companies forced to compete against each other (but not against foreign competition). The key policy was a disciplined focus on exports. South Korea tied investment capital access to foreign export targets; if your company hit its export targets you could get money, if it didn’t, you wouldn’t get money. This ensured export-led growth. Bear in mind that well-developed export markets usually have higher standards than developing domestic markets, so this policy forces manufacturers to meet higher foreign standards right from the start. He gives the example of a car produced by Malaysia’s Proton that lacked airbags and other safety features required for foreign markets, meaning the car could only be sold in Malaysia, limiting sales. Cars require imported parts, so a car produced for domestic consumption only means a hit to foreign currency reserves.


(Assembly line at Hyundai Motor Company’s car factory in Ulsan, South Korea. User: Anonyme, CC BY-SA 3.0, via Wikimedia Commons)

Studwell made a comment that hit me between the eyes and woke me up. A highly competitive domestic market coupled with disciplined export-focused finance led to companies failing. Governments didn’t step in to prop up failing companies, rather they allowed the survivors to pick apart the carcasses of the dead companies. It’s not about governments picking winners, it’s about governments culling losers, but using a version of the free market to do so. Over the years in the UK, I’ve seen various attempts to build national champions in different segments; I can remember talk of “wasteful competition”, “world beaters”, and other rhetoric. It seems the successful Asian countries had a much better Darwinian survival-of-the-fittest approach. It’s cage-match economics, but it works.

The last part of Studwell’s trinity was finance; a disciplined approach to finance is what holds the entire thing together. In the agricultural stage, the goal is to finance smallhold farmers to enable them to buy fertilizer and the equipment they need to develop their farms. In the manufacturing stage, finance was tied to exports with very few exceptions. Disciplined finance becomes an extension of government development policy; countries that didn’t follow a disciplined path did not see the same level of investment. He points out that several countries used foreign investment to finance luxury real-estate developments that promised high short-term returns. Unfortunately, these types of projects don’t generate much foreign exchange and don’t offer long-term employment. The point is simple: don’t chase the highest returns, use finance to support strategic development initiatives. Once again, this runs counter to much free-market economic thought.

Studwell’s model explains much of what I heard in Asia. For example, it explains why joint ventures are usually structured in the way they are. It also helps explain why South Korea, Japan, and Taiwan used currency controls for as long as they did. Conversely, it explains why development in other parts of Asia was so stunted. 

Where next?

For me, one of the benefits of reading the book was helping me shake off the intellectual straitjacket of western free-market economics. Successful Asian countries embraced some key free market ideas (“culling losers”) but rejected “the invisible hand” laissez-faire idea; governments very actively intervened in markets. It seems that development in the real world is not about intellectual purity but about what works.

The obvious questions for me are: where next for the successful countries, will they continue with activist government intervention, and conversely, will the unsuccessful countries learn lessons from the winners? It left me thinking more broadly about the west: if we accept the premise that governments should intervene in markets, how could we improve life for people in the west?