Trends in English football

Welcome to this mini-app, where you can see some of the trends in English football over the last 137 years and explore the data for yourself. Almost all the charts are interactive: you can turn chart lines on or off, select different years, or zoom in and explore the data. Almost all the legends are interactive too (just click on them to see), and almost all plots have a tool menu on the right-hand side that you can use to explore. Some plots have a widget you can use to change the year.

(Ronnie Macdonald from Chelmsford, United Kingdom, CC BY 2.0, via Wikimedia Commons)

The league names have changed over the years, but the pyramid structure hasn't. Currently, the Premier League is the top tier (tier 1), but before 1993, the top league was called the First Division. To analyze what's going on over time, it's better to focus on league tiers rather than league names.

Click on a tab to get started!

What fraction of matches end in a draw? The chart below shows the fraction of all matches in a season that ended in a draw, by league and by season. The salmon-colored bands mark the First and Second World Wars, when football was either suspended or played differently.
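As a rough illustration of the calculation behind a chart like this, here's a minimal sketch of a per-season, per-tier draw fraction. The record layout `(season, tier, home_goals, away_goals)` and the sample matches are made up for the example; they are not the app's actual data or schema.

```python
from collections import defaultdict

# Hypothetical match records: (season, tier, home_goals, away_goals).
# This layout is an assumption for illustration only.
matches = [
    (1993, 1, 2, 2),
    (1993, 1, 1, 0),
    (1993, 1, 0, 3),
    (1993, 2, 1, 1),
    (1993, 2, 0, 0),
    (1994, 1, 1, 1),
    (1994, 1, 2, 1),
]

def draw_fraction(matches):
    """Return {(season, tier): fraction of matches that ended level}."""
    played = defaultdict(int)
    drawn = defaultdict(int)
    for season, tier, home_goals, away_goals in matches:
        key = (season, tier)
        played[key] += 1
        if home_goals == away_goals:
            drawn[key] += 1
    return {key: drawn[key] / played[key] for key in played}

fractions = draw_fraction(matches)
```

Grouping by tier rather than league name keeps the series comparable across the 1993 renaming.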

The league started with a low fraction of draws. That's what you might expect of a new league with a wide range of club abilities. As football professionalized and lower leagues were established, clubs became more equal up to the end of the Second World War. After about 1970, the draw fraction stabilized at about 0.27 for all leagues except the top league. Since the creation of the Premier League in 1993, the tier 1 draw fraction has dropped, meaning more matches are ending in a win/loss rather than a draw. The question is, why?

There are at least two explanations for the declining draw fraction in the Premier League:

The clubs in the league are becoming more unequal.
The clubs are still equal, but the style of play has changed, leading to more wins and losses that are still shared evenly across teams.

Fortunately, there is a way of digging into this. Each club wins some fraction of its games in a season. If the clubs are equal, the standard deviation (or variance) of those win fractions will be low; if they're unequal, it will be larger. A change in the win fraction standard deviation tells us what's going on. If it's increasing, then the teams are becoming less equal. Let's look at the data.
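To make the idea concrete, here's a minimal sketch of the win fraction standard deviation for a single league season. The club names, the record layout `(home_club, away_club, home_goals, away_goals)`, and the choice of population standard deviation are all assumptions for the example.

```python
from collections import defaultdict
from statistics import pstdev

def win_fraction_std(matches):
    """Population standard deviation of per-club win fractions.

    matches: iterable of (home_club, away_club, home_goals, away_goals),
    a hypothetical layout used only for this illustration.
    """
    played = defaultdict(int)
    won = defaultdict(int)
    for home, away, home_goals, away_goals in matches:
        played[home] += 1
        played[away] += 1
        if home_goals > away_goals:
            won[home] += 1
        elif away_goals > home_goals:
            won[away] += 1
    return pstdev(won[club] / played[club] for club in played)

# Every club wins one of its two games: a perfectly equal league.
equal_league = [("A", "B", 1, 0), ("B", "A", 1, 0),
                ("C", "D", 2, 1), ("D", "C", 3, 0)]

# A and C win everything, B and D win nothing: a very unequal league.
unequal_league = [("A", "B", 1, 0), ("B", "A", 0, 1),
                  ("C", "D", 2, 1), ("D", "C", 0, 3)]

equal_std = win_fraction_std(equal_league)
unequal_std = win_fraction_std(unequal_league)
```

In the equal league every win fraction is 0.5, so the spread is zero; in the unequal league the fractions are split between 0 and 1, so the spread is as large as it can be.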

The data clearly shows an increasing win fraction standard deviation for the Premier League, but not the other leagues. This implies the Premier League is becoming a league of winners and losers, but that's not the case for the other leagues.

Fans want to see goals scored. Is the number of goals going up or down over time? Let's see.

The chart below shows the mean total number of goals per match, by season and by league.

For all leagues except the Premier League, the number of goals per match has settled down to a mean of about 2.6. Since about 2009, the Premier League has seen a small but noticeable increase in the number of goals per game.

The total number of goals tells us part of the story, but what about the goal difference between clubs in matches? If that's increasing, the difference between winners and losers is increasing. Here's a chart of mean goal difference per league per season.

The chart shows a more-or-less constant goal difference for tiers 2-4 after about 1970, but the story for the Premier League (tier 1) is very different after about 2005. It shows an increasing goal difference. This adds to the narrative about the Premier League becoming more unequal, a league of winners and losers.
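The two per-match measures in this tab could be computed along these lines. This is a sketch on made-up scorelines, and it assumes "goal difference" means the absolute margin between the two sides in a match, whoever won.

```python
from statistics import mean

# Hypothetical scorelines (home_goals, away_goals) for one league in one season.
scores = [(2, 1), (0, 0), (3, 0), (1, 4)]

# Mean total goals per match: home goals plus away goals, averaged over matches.
mean_total_goals = mean(h + a for h, a in scores)

# Mean goal difference per match: the winning margin (0 for a draw),
# assuming the absolute difference is what is being averaged.
mean_goal_difference = mean(abs(h - a) for h, a in scores)
```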

The previous tab showed us that mean scores have fallen over time, but how is the score distribution changing? Are 1-1, 2-1, etc., scorelines becoming more common? The heatmaps below show the score distribution for each league. The x-axis is the home score and the y-axis is the away score. The brighter the color, the more matches ended with that scoreline. The slider lets you move back through time to examine changes over the years. No matches were played during WWI and WWII, and the leagues began on different dates.
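Behind each heatmap is simply a count of matches per scoreline. Here's a minimal sketch of that count on a made-up list of scores; the grid orientation (rows are away score, columns are home score) is chosen to match the axes described above.

```python
from collections import Counter

# Hypothetical scorelines (home_goals, away_goals) for one league in one season.
scores = [(1, 1), (2, 1), (1, 1), (0, 0), (2, 1), (1, 1)]

# Count how many matches ended with each scoreline.
counts = Counter(scores)

max_home = max(h for h, _ in scores)
max_away = max(a for _, a in scores)

# grid[away][home]: rows follow the y-axis (away score),
# columns follow the x-axis (home score).
grid = [
    [counts[(h, a)] for h in range(max_home + 1)]
    for a in range(max_away + 1)
]
```

A plotting library can then color each cell of `grid` by its count.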

As you can see, lower scores have become more common over time, but there's not been much change in the last few decades. Despite this, there are still matches that have higher scorelines.

English football went through some terrible times in the post-war period, with 1989 possibly being the worst year ever. There was hooliganism and racism coupled with some terrible disasters. Scores of people died at matches. Since then, the UK government, the football governing bodies, and owners have worked hard to rehabilitate the game.

All this history shows up in the attendance figures below. Unsurprisingly, they reach a nadir in 1989, followed by a slow recovery as all-seater stadiums were introduced and the game became safer.

Total attendance tells us something about the game, but a more insightful view is to look at the attendance distribution at a league and season level. I'm going to use violin plots to do that. The chart below shows the attendance distribution for each of the top four leagues by year. You can use the slider at the bottom to change the year.

The data clearly shows the emergence of a bimodal attendance distribution for the Premier League, something that's only become apparent in the last few years. Notably, the other leagues don't show this pattern.

The data set I have for disciplinary data (red cards, yellow cards, fouls) is much more limited in time than my other data sets. For brevity, my analysis here will focus only on yellow cards. The chart below shows the mean number of yellow cards per match over the last twenty years.

You can see a dip during COVID.

There's a school of thought that referees are biased against away teams. We can investigate this by looking at the fraction of yellow cards awarded to away teams. If there were no bias, we would expect this number to be 0.5. Here's the chart:

The chart shows three things:

The fraction is consistently above 0.5, suggesting there is either referee bias or that away teams play more aggressively.
The fraction is declining over time.
There's a dip during COVID and the away team bias is substantially reduced.

Possible explanations for this bias include the effect of home supporters on referees. The idea is, referees can be intimidated into penalizing away clubs more than home clubs.
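For reference, the away yellow card fraction discussed above is just away cards divided by all cards. Here's a minimal sketch on made-up per-match card counts; the `(home_yellows, away_yellows)` layout is an assumption for the example.

```python
def away_card_fraction(cards):
    """Fraction of all yellow cards that went to away teams.

    cards: iterable of (home_yellows, away_yellows) per match,
    a hypothetical layout used only for this illustration.
    Returns None if no cards were shown at all.
    """
    total_home = sum(home for home, _ in cards)
    total_away = sum(away for _, away in cards)
    total = total_home + total_away
    return total_away / total if total else None

# Made-up card counts for four matches: 6 home cards, 8 away cards.
cards = [(1, 2), (0, 3), (2, 2), (3, 1)]
away_fraction = away_card_fraction(cards)
```

A value above 0.5 means away teams collected more than their share of cards, though on its own it can't separate referee bias from more aggressive away play.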

Home advantage exists in many sports, but does it exist in English football? I worked out the fraction of all wins that were home wins for each league and for each season. The results are in the chart below. If there were no home advantage, the home win fraction would be 0.5.

You can clearly see two things from this chart:

There is a home bias.
It's declining over time.
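Here's a minimal sketch of the home win fraction as described: home wins as a share of all decided matches, with draws left out. The scorelines are made up for the example.

```python
def home_win_fraction(scores):
    """Fraction of all wins that were home wins (draws are ignored).

    scores: iterable of (home_goals, away_goals).
    Returns None if every match was drawn.
    """
    scores = list(scores)
    home_wins = sum(1 for h, a in scores if h > a)
    away_wins = sum(1 for h, a in scores if a > h)
    decided = home_wins + away_wins
    return home_wins / decided if decided else None

# Made-up season: three home wins, one away win, one draw.
scores = [(2, 0), (1, 1), (0, 1), (3, 2), (1, 0)]
fraction = home_win_fraction(scores)
```

Because draws are excluded from the denominator, 0.5 is the natural no-advantage baseline.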

Given that home advantage exists, how many goals is it worth? I've plotted the mean difference between home goals and away goals by season for the top five tiers.

You can see that the goal advantage is decreasing over time.
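The goal advantage here is a signed difference, home goals minus away goals, averaged over matches. A minimal sketch, with made-up seasons and scorelines:

```python
from statistics import mean

# Hypothetical scorelines (home_goals, away_goals), keyed by season.
scores_by_season = {
    2000: [(2, 0), (1, 1), (0, 1)],
    2020: [(1, 1), (0, 2), (3, 1)],
}

# Mean of (home goals - away goals) per match; positive values favor the home side.
home_goal_advantage = {
    season: mean(h - a for h, a in scores)
    for season, scores in scores_by_season.items()
}
```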

What might cause home advantage, and what might be causing its decline? Changes over COVID show that fans have an effect, but it's not a simple relationship. While it's true that there is a bias in yellow cards and that the bias is decreasing, attendance figures have followed an almost v-shaped curve over time. So if fans were the cause, we might expect to see a v-shaped home advantage, but we don't. Whatever explanation there is must account for the yellow card phenomenon and for what happened over COVID.

Engora Data Blog

Sunday, August 3, 2025

English league football mini app