Tuesday, October 14, 2025

Competency porn

Over the last few weeks, I’ve increasingly heard the term “competency porn” used to describe movies or books. It’s a handy term, but I’m not sure I agree with how it’s been used. I’m going to give a little history of the term, give some examples, and tell you where I disagree with what’s been said online.

Leverage creator John Rogers created the phrase around 2009. He used it to describe an audience’s thrill at seeing (human) characters using specialist and well-developed skills to resolve some difficult situation. The situation might get very tough and evolve in ways the characters don’t expect, but they’re in control and it’s their calm use of their skills that saves the day. There are two main genres of competency porn: medical dramas and gangster/heist dramas.

A good example of medical competency porn is House. The titular character is a very talented (but deeply flawed) human being who uses his exceptional diagnostic skills to save lives. Things are never easy and there are plenty of diagnostic dead ends, but the show’s appeal lies in House’s ability to think through the situation and create opportunities for healing. Although the show mostly focuses on House, it's plain there's a (sometimes reluctant) team behind him.

(Canva)

Perhaps the best example of competency porn is the heist or gangster movie/TV series. In the heist movie, we see a group of highly-skilled (but flawed) individuals come together to overcome a series of challenges to steal something (a good example being Ocean’s Eleven). Of course, there are problems to solve along the way, but they never lose control of the situation and they overcome troubles through inventiveness born of their skills. The pleasure lies in watching the interaction between skilled people working together to execute a detailed plan under difficult circumstances.

(Gemini)

Competency porn characters are never “Mary Sue” types, meaning a character who has no character flaws or weaknesses. Famously, House is a seriously flawed individual, and most gangster characters have some weaknesses or problems. 

For me, superheroes can’t be competency porn. Their special powers mean they're super-human and the risk of failure is less. Their powers mean I empathize with them less; I could maybe be a safe cracker if I practiced for years, but there’s no way I could learn to fly, no matter how many buildings I jumped off. For the same reasons, I don’t think Dr Who is competency porn; famously, Dr Who isn’t human and has abilities and knowledge a human doesn’t have. Of course, superheroes always have a little “Mary Sue” tinge too.

By contrast, Law and Order is a great example of competency porn. The characters are human, highly-trained and experienced and they use their abilities to arrest and convict criminals. Almost all the time, they’re in control, and of course, they have personality flaws and weaknesses. 

Controversially, I don’t think Alien is a competency porn movie. The human characters are not in control of the situation and most of them don’t have relevant specialist skills. In fact, it's their ineptness and poor judgement that puts them at risk. There’s not a lot of calmness in the movie either. For the same reasons, horror movies can’t be competency porn.

Star Trek is usually competency porn. The characters are mostly well-trained, highly-skilled, and in control. They have a mission to accomplish, which they do through team work and the use of their complementary skill sets.

The online consensus is that Arthur C. Clarke’s Rendezvous with Rama is competency porn. It fits the definition: the characters are all human with specialist skills and they overcome challenges calmly. But for me, the characters are a little “Mary Sue” and there’s a whiff of superhero about one or two of them. Another problem for me is the pay-off. In the heist movie, the gangsters steal the money; in medical dramas, the doctor cures the patient; and in Law and Order, the criminal goes to jail. But in Rendezvous with Rama, there is no pay-off: the crew leave Rama and that’s it. Other than exploration and preventing Rama being bombed, there’s no real sense the characters have achieved anything lasting.

The pleasure in competency porn is seeing a group of highly-skilled and in-control people collectively pull off something that would otherwise seem impossible. They’re not super-human in any way, so we can dream that we too could act as they do and win as they do.

Friday, October 10, 2025

Regression to the mean

An unfortunate phrase with unfortunate consequences

"Regression to the mean" is a simple idea that has profound consequences. It's led people astray for decades, if not centuries. I'm going to explain what it is, the consequences of not understanding it, and what you can do to protect yourself and your organization.

Let's give a simple definition for now: it's the tendency, when sampling data, for more extreme values to be followed by values closer to the mean. Here's an example: if I give the same children IQ tests over time, I'll see very high scores followed by more average scores, and some very low scores followed by more average scores. It doesn't mean the children are improving or getting worse; it's just regression to the mean. The problems occur when people attach a deeper meaning, as we'll see.

(Francis Galton, popularizer of "Regression to the mean")

What it means - simple examples

I'm going to start with an easy example that everyone should be familiar with, a simple game with a pack of cards.

  • Take a standard pack of playing cards and label the cards in each suit 1 to 13 (Ace is 1, 2 is 2, Jack is 11, etc.). The mean card value is 7.
  • Draw a card at random. 
  • Imagine it's a Queen (12). Now, replace the card and draw another card. Is it likely the card will have a lower value or a higher value? 
    • The probability that it will have a lower value is 11/13.
  • Now imagine you drew an ace (1), replace the card and draw again. 
    • The probability of drawing another ace is 1/13.
    • The probability of drawing a 2 or higher is 12/13. 
It's obvious in this example that "extreme" value cards are very likely to be followed by more "average" value cards. This is regression to the mean at work. It's nothing complex, just a probability distribution at work.
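If you want to check the card probabilities for yourself, here's a minimal sketch in Python. It assumes draws are made with replacement and that every value from 1 to 13 is equally likely, as in the game above. The function names are my own and are just for illustration.

```python
from fractions import Fraction

CARD_VALUES = range(1, 14)  # Ace = 1, ..., King = 13


def prob_lower(first_card):
    """Probability that the next draw (with replacement) is lower than first_card."""
    lower = [v for v in CARD_VALUES if v < first_card]
    return Fraction(len(lower), len(CARD_VALUES))


def prob_higher(first_card):
    """Probability that the next draw (with replacement) is higher than first_card."""
    higher = [v for v in CARD_VALUES if v > first_card]
    return Fraction(len(higher), len(CARD_VALUES))


def prob_closer_to_mean(first_card):
    """Probability that the next draw is strictly closer to the mean than first_card."""
    mean = Fraction(sum(CARD_VALUES), len(CARD_VALUES))
    first_distance = abs(first_card - mean)
    closer = [v for v in CARD_VALUES if abs(v - mean) < first_distance]
    return Fraction(len(closer), len(CARD_VALUES))


print(prob_lower(12))   # after a Queen
print(prob_higher(1))   # after an Ace
```

The third function is the one that captures regression to the mean most directly: the further the first card is from the middle of the range, the more likely the next card is to land closer to it.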

The cards example seems simple and obvious. Playing cards are very familiar and we're comfortable with randomness (in fact, almost all card games rely on randomness). The problem occurs when we have real measurements: we tend to invent explanations for the data when randomness (and regression to the mean) is all that's there.

Let's say we're measuring the average speed of cars on a freeway. Here are 100 measurements of car speeds. What would you conclude about the freeway? What pattern can you see in the data and what does it tell you about driver behavior (e.g. lower speeds following higher speeds and vice versa)? What might cause it? 

['46.7', '63.3', '80.0', '71.7', '34.2', '55.0', '67.5', '34.2', '67.5', '67.5', '59.2', '63.3', '55.0', '34.2', '63.3', '63.3', '63.3', '59.2', '75.8', '71.7', '42.5', '42.5', '34.2', '34.2', '59.2', '67.5', '59.2', '71.7', '71.7', '67.5', '50.8', '63.3', '34.2', '63.3', '30.0', '38.3', '50.8', '34.2', '75.8', '75.8', '46.7', '80.0', '55.0', '46.7', '38.3', '38.3', '75.8', '59.2', '34.2', '42.5', '71.7', '71.7', '80.0', '80.0', '71.7', '34.2', '63.3', '71.7', '46.7', '42.5', '46.7', '46.7', '63.3', '80.0', '80.0', '38.3', '38.3', '46.7', '38.3', '34.2', '46.7', '75.8', '55.0', '30.0', '55.0', '75.8', '30.0', '42.5', '67.5', '30.0', '50.8', '67.5', '67.5', '71.7', '67.5', '67.5', '42.5', '75.8', '75.8', '34.2', '55.0', '50.8', '38.3', '71.7', '46.7', '71.7', '50.8', '71.7', '42.5', '42.5']

Let's imagine the authorities introduced a speed camera at the measurement I've indicated in red. What might you conclude about the effect of the speed camera?

You shouldn't conclude anything at all from this data. It's entirely random. In fact, it has the same probability distribution as the pack of cards example. I've used 13 different average speeds, each with the same probability of occurrence. What you're seeing is the result of me drawing cards from a pack and giving them floating point numbers like 71.7 instead of a number like 9. The speed camera had no effect in this case. The data set shows the regression to the mean and nothing more.
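Here's a rough sketch of how data like this can be generated. The mapping from card value to speed is my assumption, inferred from the 13 distinct values in the list (evenly spaced from 30.0 to 80.0); the function names and the definition of "extreme" (the two highest and two lowest cards) are also just illustrative choices.

```python
import random

CARD_VALUES = range(1, 14)
MEAN_CARD = sum(CARD_VALUES) / len(CARD_VALUES)


def card_to_speed(card):
    """Map a card value 1..13 onto evenly spaced speeds from 30.0 to 80.0 (assumed mapping)."""
    return round(30 + (card - 1) * 50 / 12, 1)


def simulate_cards(n, seed=0):
    """Draw n cards with replacement, each value equally likely."""
    rng = random.Random(seed)
    return [rng.randint(1, 13) for _ in range(n)]


def fraction_regressing(cards, threshold=5):
    """Among 'extreme' draws (at least `threshold` from the mean card),
    return the fraction followed by a draw closer to the mean."""
    pairs = [(a, b) for a, b in zip(cards, cards[1:])
             if abs(a - MEAN_CARD) >= threshold]
    if not pairs:
        return None
    closer = sum(abs(b - MEAN_CARD) < abs(a - MEAN_CARD) for a, b in pairs)
    return closer / len(pairs)


cards = simulate_cards(100)
speeds = [card_to_speed(c) for c in cards]
print(speeds[:10])
print(fraction_regressing(cards))
```

With only 100 draws the fraction bounces around quite a bit from run to run; with many more draws it settles down. No driver behavior, and no speed camera, is needed to produce the effect.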

The pack of cards and the vehicles example are exactly the same example. In the pack of cards case, we understand randomness and we can intuitively see what regression to the mean actually means. Once we have a real world problem, like the cars on the freeway, our tendency is to look for explanations that aren't there and we discount randomness. Looking for meaning in random data has had bad consequences, as we'll see.

Schools example

In the last few decades in the US, several states have introduced standardized testing to measure school performance. Students in the same year group take the same test and, based on the results, the state draws conclusions about the relative standing of schools; it may intervene in low performing schools. The question is, how do we measure the success of these interventions? Surely, we would expect to see an improvement in test scores taken the next year? In reality, it's not so simple.

The average test result for a group of students will obviously depend on things like teaching, prior attainment etc. But there are also random factors at work. Individual students might perform better or worse than expected due to sickness, or family issues, or a host of other random issues. Of course, different year groups in the same school might have a different mix of abilities. All of which means that regression to the mean should show up in consecutive tests. In other words, low performing schools might show an improvement and high performing schools might show a degradation entirely due to random factors.
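To make this concrete, here's a small simulation sketch. Every number in it is hypothetical: I'm assuming each school has a stable underlying score and that each year's result adds independent random noise of a similar size. No school receives any intervention.

```python
import random
import statistics


def simulate_school_scores(n_schools=1000, seed=0):
    """Return (year1, year2) average scores for schools with no intervention."""
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n_schools):
        underlying = rng.gauss(70, 5)               # stable school-level quality (made up)
        year1.append(underlying + rng.gauss(0, 5))  # plus year-specific luck
        year2.append(underlying + rng.gauss(0, 5))  # fresh luck the next year
    return year1, year2


def mean_change_by_group(year1, year2, fraction=0.1):
    """Average year-on-year change for the bottom and top `fraction` of schools,
    ranked by their year 1 score."""
    ranked = sorted(range(len(year1)), key=lambda i: year1[i])
    k = max(1, int(len(ranked) * fraction))
    bottom, top = ranked[:k], ranked[-k:]
    bottom_change = statistics.mean(year2[i] - year1[i] for i in bottom)
    top_change = statistics.mean(year2[i] - year1[i] for i in top)
    return bottom_change, top_change


year1, year2 = simulate_school_scores()
bottom_change, top_change = mean_change_by_group(year1, year2)
print(f"Bottom 10% of schools changed by {bottom_change:+.1f} points on average")
print(f"Top 10% of schools changed by {top_change:+.1f} points on average")
```

In this toy model, the lowest-scoring schools "improve" and the highest-scoring schools "decline" even though nothing about the schools has changed. Any real intervention has to be judged against that baseline.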

This isn't a theoretical example: regression to the mean has been clearly shown in school scores in Massachusetts, California and in other states (see Haney, Smith & Smith). Sadly, state politicians and civil servants have intervened based on scores and drawn conclusions where they shouldn't.

Children's education evokes a lot of emotion and political interest, which is not a good mix. It's important to understand concepts like regression to the mean so we can better understand what's really going on.

Heights example

"Regression to the mean" was originally called "regression to mediocrity", and was based on the study of human heights. If regression to mediocrity sounds very disturbing, it should do. It's closely tied to eugenics through Francis Galton. I'm not going to dwell on the links between statistics and eugenics here, but you should know the origins of statistics aren't sin free.

In 1880s England, Galton studied the heights of parents and their children. I've reproduced some of his results below. He found that parents who were above average height tended to have children closer to the average height, and that parents below average height tended to have children closer to the average height. This is the classic regression to the mean example.

Think for the moment about possible different outcomes of a study like this. If taller parents had taller children, and shorter parents had shorter children, then we might expect to see two population groups emerging (short people and tall people) and maybe the start of speciation. Conversely, if tall parents had short children, and short parents had tall children, this would be very noticeable and commented on. Regression to the mean turns out to be a good explanation of what we observe in nature.

Galton's height study was very influential for both the study of genetics and the creation of statistics as a discipline.

New sports players

Let's take a cohort of baseball players in their first season. Obviously, talent makes a difference, but there are random factors at play too. We might expect some players to do extremely well, others to do well, some to do OK, some to do poorly, and some to do very poorly. Regression to the mean tells us that some standout players may well perform worse the next year. Other, lower-ranked players will perform better for the same reason. The phenomenon of new outstanding players performing worse in their second year is often called the "sophomore slump" and a lot has been written about it, but in reality, it can mostly be explained by regression to the mean.

You can read more about regression to the mean in sports here:

Business books

Popular business books often fall into the regression to the mean trap. Here's what happens. A couple of authors do an analysis of top performing businesses, usually measured by stock price, and find some commonalities. They develop these commonalities into a framework and write a best-selling business book whose thesis is, if you follow the framework, you'll be successful. They follow this with another book that's not quite as good. Then they write a third book that only the true believers read.

Unfortunately, the companies they select as winners don't do as well over a decade or more, and the longer the timescale, the worse the performance. Over the long-run, the authors' promise that they've found the elixir of success is shown to be not true. Their books go from the best seller list to the remainder bucket.

A company's stock price is determined by many factors, for example, its competitors, the market state, and so on. Only some of them are under the control of the company. Conditions change over time in unpredictable ways. Regression to the mean suggests that today's great stock price performers might not be great in the future, and low performers may do better. Regression to the mean neatly explains why picking winners today does not mean the same companies will be winners in the years to come. In other words, basic statistics makes a mockery of many business books.

Reading more:

  • The Halo Effect: . . . and the Eight Other Business Delusions That Deceive Managers - Phil Rosenzweig 

My experience

I've seen regression to the mean pop up in all kinds of business data sets and I've seen people make the classic mistake of trying to derive meaning from randomness. Here are some examples.

Sales data has a lot of random fluctuations, and of course, the smaller the sample, the greater the fluctuations. I've seen sales people have a stand out year followed by a very average year and vice versa. I've seen the same pattern at a regional and country level too. Unfortunately, I've also seen analysts tie themselves in knots trying to explain these patterns. Even worse, they've made foolish predictions based on small sample sets and just a few years' worth of data.

I've seen very educated people get very excited by changes in company assessment data. They think they've spotted something significant because companies that performed well one year tended to perform a bit worse the next etc. Regression to the mean explained all the data.

How not to be fooled

Regression to the mean is hidden in lots of data sets and can lead you into making poor decisions. If you're analyzing a dataset, here are some questions to ask:

  • Is your data the result of some kind of sampling process? 
  • Does randomness play a part in your collection process or in the data?
  • Are there unknowns that might influence your data?

If the answer to any of these questions is yes, you should assume you'll find regression to the mean in your dataset. Be careful about your analysis and especially careful about explaining trends. Of course, the smaller your data set, the more vulnerable you are.

You can estimate the effect of regression to the mean on your data using a variety of methods. I'm not going to go into them too much here because I don't want to make this blog post too long. In the literature, you'll see references to running a randomized controlled trial (RCT), also known as an A/B test. That's great in theory, but the reality is that it's not appropriate for most business situations. In practice, you'll have to run simulations or do some straightforward estimation of the fractional regression to the mean.
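As a starting point, here's a sketch of one common rule of thumb: if r is the correlation between the first and second measurements of the same items, the expected fractional regression to the mean is roughly 1 - r. I'm presenting it as one possible estimate rather than the only method, and the prediction function additionally assumes the two measurements have a similar spread. The function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fractional_regression(before, after):
    """Rule-of-thumb fraction of the distance from the mean expected to vanish: 1 - r."""
    return 1 - pearson_r(before, after)


def expected_followup(value, mean_before, mean_after, r):
    """Expected second measurement from randomness alone,
    assuming both measurements have a similar spread."""
    return mean_after + r * (value - mean_before)


# Example: a score of 130 where the mean is 100 and r = 0.8.
print(expected_followup(130, 100, 100, 0.8))
```

If an observed follow-up value sits close to what this predicts, regression to the mean is a sufficient explanation and there's no need to look for a deeper cause.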

Friday, September 26, 2025

More money means more goals

Winner takes all?

Do clubs with the most expensive players score more goals in English league football? The answer is a strong yes.

In this blog post, I'll show an analysis of goals scored vs. club transfer value and you'll clearly see a strong correlation. Of course, it's not the only factor that affects goals scored, but it's a strong signal.

(Google Gemini. Note the Euro has three legs!)

The data

The data comes from TransferMarkt (https://www.transfermarkt.com/), who publish market values for clubs. The market value is the estimated transfer value of all the players in the club squad. Obviously, transfer values change over time when players are bought, sold, or are injured. TransferMarkt have club transfer values at the start of each season and they also provide biweekly values. For this analysis, I've used the season start values. The dataset starts properly in 2010 for the top four tiers.

The charts

The charts below show goals for, against, and net (for - against) vs. total club transfer value for each club for each season for each league. The slider lets you change the year and the buttons let you change the league tier. The points on the charts are individual clubs and the line is a linear regression fit. The r2 and p-value for the fit are in the chart title. The blue band is the 95% confidence interval on the fit.

In addition to the buttons and slider, the charts are interactive:

  • You can hover over points and see their values.
  • You can zoom-in or zoom-out using the tool menu on the left.
  • You can save the charts using the tools menu on the left.

Take a while to play with the charts.

What the charts show

All leagues show the following trends:

  • Higher club value = more goals for
  • Higher club value = fewer goals against
  • Higher club value = more net goals 

The strength of this correlation varies by league and by time, but it's there.

The r2 value varies in the range 0.4 to 0.91, suggesting a good correlation, but it's not the only factor; there are other factors we need to consider to fully model goals. The p-values are close to 0, indicating this correlation is very unlikely to have happened by chance.
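For readers who want to see what a fit like this involves, here's a minimal least-squares sketch in plain Python. It isn't the code behind the charts, and the numbers are made up for illustration (they are not TransferMarkt figures). It returns the slope, intercept, and r2 only; the p-value and confidence band need a statistics library and are left out.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.
    Returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: club value (millions of euros) and goals scored in a season.
club_values = [20, 35, 50, 80, 120, 200]
goals_for = [38, 45, 44, 55, 61, 78]

slope, intercept, r_squared = linear_fit(club_values, goals_for)
print(f"slope={slope:.3f}, intercept={intercept:.1f}, r2={r_squared:.2f}")
```

An r2 of 0.4 means club value accounts for about 40% of the variation in goals in that league and season, which is why other factors still matter.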

Take a look at league tier 3 for 2024 (this is currently called "League One"). There's a huge outlier and it's Birmingham City. These guys were in the Premier League not so long ago, but suffered a number of problems on and off the pitch which led to their relegation. They've recently had a big cash injection and are now owned (in part) by Tom Brady. Part of this big cash injection was new management and new players. As a result, they were promoted back to the EFL Championship (tier 2) in 2025. In other words, they're a big club temporarily fallen on hard times; they're an outlier.

If you take a look at tier 2, you'll see the top valued clubs are pretty much all clubs recently relegated from the Premier League. To play in the Premier League, you need top-quality talent, and that's expensive. On the flip side, you get more gate revenue and TV money. Relegated teams face a number of issues: star players may leave and revenues drop precipitously. To stand any chance of being promoted, clubs need to retain top talent at the same time as their revenue has fallen. These conflicting requirements can lead, and have led, to financial instability. To ease the relegation transition, the Premier League provides "parachute" payments to relegated clubs. The upshot is, newly relegated teams are in a better place than the other clubs in the league; they have parachute money and good players.

Children's fiction, Ted Lasso, and Wrexham 

Growing up in England, there was a lot of football fiction aimed at kids. A staple of the genre was a struggling team that somehow make it to the top, out-playing bigger and more expensive teams. Sadly, this just isn't the reality and probably never was; money is pretty much the only way up. Looking back, I'm not sure the financial underdog fantasy was helpful.

Both the fictional Ted Lasso and the real Wrexham are in the news. Notably, neither Ted Lasso nor Wrexham is a rags-to-riches tale.

In Ted Lasso, the fictional Richmond team owner brought in Ted Lasso to tank the team performance to spite her ex-husband. The team had plenty of money (lack of money was never a major story line). Perhaps the writers felt that having a cheap team rise to the top would be too unrealistic. 

Wrexham's upward path has been paid for by Hollywood money, and in fact Wrexham's club value is pretty typical of a League One team; they're very much not the financial underdog.

The rags-to-riches fantasy, or maybe, the financial underdog-wins-all fantasy, is just a fantasy.

The bottom line

The bottom line is the bottom line. Money talks, and if you want to score the goals, you've got to spend the cash.