
Thursday, July 31, 2025

Attendance at English football: a tale of tragedy and recovery

Sport and ongoing tragedy

Analyzing sports data is normally a harmless activity, but sometimes it takes you to much darker places, and analyzing attendance in English football gets you there very quickly.

This blog post is about trends in attendance in English football and what you can tell from the data. For this analysis, I'm going to flip the script: I'll start with the causes and then show you the data.

(Steenbergs from Ripon, United Kingdom, CC BY 2.0, via Wikimedia Commons. Newcastle United vs. Chelsea, 2010-11-28. Note that everyone is seated and almost every seat is filled.)

Hooliganism, racism, antisemitism, and tragedy

Football in the UK has suffered from hooliganism almost since the beginning of the professional game. There are reports of riots dating back to 1909 and of vandalism from 1934. After the Second World War, hooliganism arrived in earnest, and by the 1980s the English game was in deep trouble. Notoriously, violence followed English teams abroad, and innocent people caught up in the mayhem were killed.

The UK government cracked down on offenders, and the football authorities worked to make the game safer. Things improved slowly through the 1990s: matches became safer to go to, and the game's battered reputation gradually recovered.

By the 1970s, black players had started to appear in English football teams, and so had racism. Some spectators threw banana peels on the pitch and made monkey noises when a black player got the ball. England ‘fans’ abused black England players at international matches. Perhaps unsurprisingly, hooligans had links to far-right groups and were extremely racist.

The football governing bodies cracked down hard on racism and banned offenders from all stadiums for life, but of course, racism is still present.

As you might expect, antisemitism is also a problem. Tottenham Hotspur has a long and well-known connection with London's Jewish community, and as a result, opposing fans have regularly directed antisemitic abuse at the club and its supporters.

Sadly, to complete the picture, I need to point out some significant disasters involving English football, only one of which (Heysel Stadium) was related to hooliganism.

  • 1946 Burnden Park, Bolton Wanderers vs. Stoke City. 33 fans killed by crush injuries.
  • 1985 Valley Parade, Bradford. Bradford City vs. Lincoln City. 56 spectators killed by fire.
  • 1985 Heysel Stadium, Brussels. Juventus vs. Liverpool. 39 fans killed by a collapsing wall. 
  • 1989 Hillsborough, Sheffield. Liverpool vs. Nottingham Forest. 97 fans killed by crush injuries. 
The Hillsborough disaster in particular triggered a series of wide-ranging changes, most notably the introduction of all-seater stadiums.

Given all this, would you have taken your children to a football match in the late 1980s or early 1990s?


To sum it all up, the 1980s were the nadir of English football. Things have got a lot better since then, but there's still work to be done.

Now, let’s look at the data.

Attendance numbers

The chart below shows total attendance by league for each season since the league began. Total attendance is the sum of the attendances at every match played that season. The data covers English league matches only. The salmon-colored bands mark World War I and World War II.
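The aggregation behind these totals is simple to express in code. Here is a minimal Python sketch, assuming match-level records that carry a season label, a league tier, and an attendance figure. The `Match` record and `total_attendance` function are hypothetical names for illustration, not the code behind the chart, and the sample figures are made up.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Match(NamedTuple):
    """One league match (hypothetical record layout)."""
    season: str      # season label, e.g. "2024-2025"
    tier: int        # 1 = top tier, 2 = second tier, and so on
    attendance: int  # recorded attendance for this match


def total_attendance(matches: Iterable[Match]) -> dict[tuple[str, int], int]:
    """Total attendance per (season, tier): the sum over every match played."""
    totals: defaultdict[tuple[str, int], int] = defaultdict(int)
    for match in matches:
        totals[(match.season, match.tier)] += match.attendance
    return dict(totals)


if __name__ == "__main__":
    # Made-up figures, purely to show the shape of the output.
    sample = [
        Match("2024-2025", 1, 52_000),
        Match("2024-2025", 1, 31_000),
        Match("2024-2025", 2, 18_500),
    ]
    print(total_attendance(sample))
    # {('2024-2025', 1): 83000, ('2024-2025', 2): 18500}
```

Keying on the (season, tier) pair keeps each league's series separate, which is what lets the chart plot one line per league.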

The chart is interactive: you can click on the legend to turn leagues on and off, and you can use the toolbar on the right to zoom in and move around the data.

The chart shows attendance growing up to the immediate post-war period, followed by a decline that reached its nadir in 1989. I've explained above what was going on after the war, and given those issues, it's no surprise that attendance fell off.

The post-1989 recovery is probably due to a number of factors. The authorities have acted decisively to stamp out hooliganism, racism, and antisemitism. In the wake of the Hillsborough disaster, owners have invested in new stadiums that offer fans a much more pleasant experience. Fan culture has changed too, with more families attending matches and clubs actively trying to attract them. Notably, these changes have happened at all levels of the game.

COVID

COVID affected the 2019-2020 season: part of the season was played behind closed doors, and the lower leagues (tiers 3-5) cancelled their remaining games. The 2020-2021 season, by contrast, was played almost entirely without spectators. Both effects are very clear in the attendance figures on the chart.

Home advantage

The chart below, which I used in a previous blog post, shows the decline in home advantage (again, it's interactive). One of my favorite explanations for home advantage was the effect of spectators. The key insight was that during COVID, home advantage disappeared along with the spectators.

Sadly, there's a problem. Compare the shape of this chart with the attendance chart above. Home advantage has declined steadily since the Second World War, but attendance has not: it fell until 1989 and then recovered. If fans make a difference, we might expect more fans to mean more home advantage, but that doesn't seem to be the case. Whatever the relationship between attendance and home advantage, it's more subtle than raw numbers.
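One way to put a number on "compare the shape" is to compute the Pearson correlation between the two season-by-season series. The sketch below is a minimal Python version, assuming both series are already aligned season by season; `pearson_r` and `differences` are hypothetical helper names, and no real figures are included. Because two series that both trend over time can correlate without being related, correlating the season-to-season changes (first differences) is a common extra check.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def differences(series: Sequence[float]) -> list[float]:
    """Season-to-season changes, which reduce the influence of a shared trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


if __name__ == "__main__":
    # Hypothetical usage with per-season lists read off the two charts:
    #   pearson_r(attendance, home_advantage)
    #   pearson_r(differences(attendance), differences(home_advantage))
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfectly linear toy data)
```

A value near +1 or -1 would suggest the two series move together; a weak value on the differenced series would fit the idea that the relationship is more subtle than raw numbers.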

Attendance distributions

The total numbers tell a story, but not the complete story. In any given season and league, attendance has a distribution: some matches are well attended and others are not. How that distribution changes over time can tell us some very useful things.

Violin plots are very helpful for visualizing distributions. I've talked about them in some depth in previous blog posts, but for now, all you need to know is that they show the distribution of the underlying data: the wider the violin at a given attendance, the more matches had roughly that attendance.
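Under the hood, a violin is usually a kernel density estimate (KDE): a smooth curve built by placing a small bell-shaped bump on every match's attendance and adding the bumps up. Here is a minimal, dependency-free Python sketch of a Gaussian KDE, assuming a bandwidth from the common normal-reference rule of thumb (1.06 × standard deviation × n^(-1/5)). The function names are hypothetical, and this is not necessarily how the charts in this post were made. Counting the peaks of the resulting curve gives a rough check of whether a distribution looks unimodal or bimodal.

```python
import math
from typing import Sequence


def rule_of_thumb_bandwidth(data: Sequence[float]) -> float:
    """Normal-reference bandwidth 1.06 * sd * n^(-1/5) for a Gaussian kernel."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return 1.06 * sd * n ** (-1 / 5)


def linspace(lo: float, hi: float, num: int) -> list[float]:
    """num evenly spaced points from lo to hi inclusive (num >= 2)."""
    step = (hi - lo) / (num - 1)
    return [lo + i * step for i in range(num)]


def gaussian_kde(data: Sequence[float], grid: Sequence[float], bandwidth: float) -> list[float]:
    """Density at each grid point: the average of Gaussian bumps centred on the data."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in data)
        for g in grid
    ]


def count_peaks(density: Sequence[float]) -> int:
    """Number of strict local maxima in a sampled density curve."""
    return sum(
        1
        for i in range(1, len(density) - 1)
        if density[i - 1] < density[i] > density[i + 1]
    )


if __name__ == "__main__":
    # Made-up attendances: two clusters of clubs, mimicking a bimodal league.
    attendances = [24_000, 26_000, 27_500, 29_000, 31_000, 57_000, 60_000, 62_000, 74_000]
    grid = linspace(0, 90_000, 451)
    density = gaussian_kde(attendances, grid, rule_of_thumb_bandwidth(attendances))
    print(count_peaks(density))
```

The bandwidth matters a lot: this rule of thumb is known to oversmooth multimodal data, so a too-wide bandwidth can merge two real peaks into one, while a too-narrow one creates spurious peaks.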

The charts below show violin (distribution) plots for the top four leagues. You can move the slider to see different years. The x-axis is attendance, and you should note two things about its scale:

  • It's different for each league.
  • It changes year to year.

Move the slider back through time and watch the shapes of the distributions change. Compare the top tier (1) with the other tiers.

There were no league games during World War I and World War II, and the 2020-2021 season was largely played behind closed doors because of COVID. The 2019-2020 season was also affected by COVID, but the story there is more subtle. Partway through the season, tiers 1 and 2 switched to playing matches behind closed doors, while tiers 3 and 4 ended their seasons and played no more games. So tiers 1 and 2 recorded matches with no spectators, but tiers 3 and 4 did not. This shows up strongly in the data: you can see a significant number of zero-attendance matches for tiers 1 and 2 but not for tiers 3 and 4.
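Spotting this effect in match-level data amounts to counting zero-attendance matches per tier. A minimal Python sketch, assuming each match is a plain `(season, tier, attendance)` tuple; `zero_attendance_share` is a hypothetical helper and the sample matches are made up.

```python
from collections import Counter
from typing import Iterable


def zero_attendance_share(
    matches: Iterable[tuple[str, int, int]], season: str
) -> dict[int, float]:
    """Fraction of each tier's matches in `season` that had zero attendance."""
    played: Counter[int] = Counter()
    empty: Counter[int] = Counter()
    for match_season, tier, attendance in matches:
        if match_season != season:
            continue  # only look at the requested season
        played[tier] += 1
        if attendance == 0:
            empty[tier] += 1
    return {tier: empty[tier] / played[tier] for tier in sorted(played)}


if __name__ == "__main__":
    sample = [
        ("2019-2020", 1, 40_000),
        ("2019-2020", 1, 0),       # played behind closed doors
        ("2019-2020", 3, 8_000),   # tier 3 ended early, so no empty-stadium games
        ("2018-2019", 1, 35_000),  # different season, ignored
    ]
    print(zero_attendance_share(sample, "2019-2020"))
    # {1: 0.5, 3: 0.0}
```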

Strikingly, before about 1993, the distributions for all leagues are approximately unimodal with a fat tail. That's still mostly the case for the lower leagues, but not for the top tier, the Premier League, whose distribution is now bimodal. To explain this, we need to know about stadium capacity and how full stadiums are. I'll call the latter the sold-out fraction: attendance divided by capacity, so a sold-out fraction of 100% means the stadium is full to capacity and 0% means it's completely empty.
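For a single match, the sold-out fraction is just attendance divided by capacity. How the league-level figures are aggregated isn't stated, so the minimal Python sketch below shows two plausible choices, both assumptions: the mean of per-match fractions, and a pooled figure (total attendance divided by total capacity) that gives big stadiums more weight. The function names are hypothetical and the numbers are made up.

```python
from typing import Sequence


def sold_out_fraction(attendance: int, capacity: int) -> float:
    """Share of a stadium's capacity that was filled: 1.0 = full, 0.0 = empty."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return attendance / capacity


def league_sold_out_fractions(
    attendances: Sequence[int], capacities: Sequence[int]
) -> tuple[float, float]:
    """Return (mean of per-match fractions, pooled attendance / pooled capacity)."""
    if len(attendances) != len(capacities) or not attendances:
        raise ValueError("need matching, non-empty lists")
    per_match = [sold_out_fraction(a, c) for a, c in zip(attendances, capacities)]
    mean_fraction = sum(per_match) / len(per_match)
    pooled_fraction = sum(attendances) / sum(capacities)
    return mean_fraction, pooled_fraction


if __name__ == "__main__":
    # A full 60,000-seat stadium and a half-full 20,000-seat one (made-up numbers).
    print(league_sold_out_fractions([60_000, 10_000], [60_000, 20_000]))
    # (0.75, 0.875)
```

The two aggregates diverge when big stadiums are fuller than small ones, as in this toy example, so it's worth knowing which one a published figure uses.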

Let's look at capacity first. The charts below show stadium capacity for the top four tiers. Note that the Premier League has 'groupings' around 60,000 and 30,000. The Championship (tier 2) has a more evenly spread distribution, while tiers 3 (League One) and 4 (League Two) also show groupings. The stadium-size grouping is clearly visible in the Premier League attendance violin charts: it's the bimodal distribution. But the capacity groupings for League One and League Two don't show up in their attendance distributions. Why?

The answer lies in the sold-out fractions. The table below shows the sold-out fraction by league tier for 2024-2025.

League name      League tier   Sold-out fraction (2024-2025)
Premier League   1             98.9%
Championship     2             81.4%
League One       3             68.1%
League Two       4             56.5%

At 98.9% sold out, Premier League attendance is clearly limited by stadium size, so you would expect the stadium-size groupings to show up clearly in the data, which they do. In the lower leagues, the sold-out fraction is lower, so stadium size is less often the limiting factor and shows up less in the attendance data.

There are a couple of points to make about stadium size. In the Premier League, the stadium-size groupings support the idea of a league within a league. Building a 60,000+ capacity stadium is hugely expensive, but if you can fill it, you get more revenue: Man Utd's stadium has a capacity of 74,197 compared with Bournemouth's 11,307, roughly six and a half times as many seats and, of course, far more ticket sales. In the lower leagues, some stadiums have capacity far in excess of attendance, which must be a financial drag. Clubs are still expanding their stadiums across all leagues, which is a striking vote of confidence in the future.

Attendance figures tell us a lot about changes in support and the structure of the game.

What of the future?

I'm hopeful for the future. I like the initiatives clubs are taking to make themselves family-friendly, and I'm pleased to see hatred and violence being stamped out. I'd love to see attendance rise in the lower leagues, and I'm very happy to see the rise of the women's game. Of course, there are still problems, and I expect them to persist, with trouble breaking out sporadically. My expectation is that attendances will rise as the matchday experience gets better for everyone.