The data can't be right, it's so ugly
Despite what many technical people want to believe, well-presented data is more convincing than badly presented data. Unfortunately, the default way Pandas outputs dataframes as tables is ugly. I'm going to show you how to make Pandas dataframes (tables) very pretty and hopefully more convincing.
Ugly Betty
My dataset is the results of the 2019 UK general election: the number of MPs and voters per party. Here's my Pandas dataframe (I've called it parliament for some reason):
party MPs votes MPs frac votes frac
0 Conservative 365 13966565 0.561538 0.452447
1 Labour 202 10269076 0.310769 0.332667
2 Scottish Nationalist 48 1242380 0.073846 0.040247
3 Liberal Democrats 11 3696423 0.016923 0.119746
4 Democratic Unionists 8 244128 0.012308 0.007909
5 Sinn Fein 7 181853 0.010769 0.005891
6 Plaid Cymru 4 153265 0.006154 0.004965
7 Social Democratic and Labour 2 118737 0.003077 0.003846
8 Green 1 835579 0.001538 0.027069
9 Alliance 1 134115 0.001538 0.004345
10 Speaker 1 26831 0.001538 0.000869
If we output the dataframe to HTML using parliament.to_html(), here's what we get by default. It looks amateurish. Let's make it nicer.
party | MPs | votes | MPs frac | votes frac | |
---|---|---|---|---|---|
0 | Conservative | 365 | 13966565 | 0.561538 | 0.452447 |
1 | Labour | 202 | 10269076 | 0.310769 | 0.332667 |
2 | Scottish Nationalist | 48 | 1242380 | 0.073846 | 0.040247 |
3 | Liberal Democrats | 11 | 3696423 | 0.016923 | 0.119746 |
4 | Democratic Unionists | 8 | 244128 | 0.012308 | 0.007909 |
5 | Sinn Fein | 7 | 181853 | 0.010769 | 0.005891 |
6 | Plaid Cymru | 4 | 153265 | 0.006154 | 0.004965 |
7 | Social Democratic and Labour | 2 | 118737 | 0.003077 | 0.003846 |
8 | Green | 1 | 835579 | 0.001538 | 0.027069 |
9 | Alliance | 1 | 134115 | 0.001538 | 0.004345 |
10 | Speaker | 1 | 26831 | 0.001538 | 0.000869 |
Adding style
Pandas dataframes have a style property we can use to customize the appearance of the dataframe and its HTML rendering too. The style property returns a Styler object we can use to make changes to the way the data is rendered as a HTML table. I'm going to add style and show you what the rendered HTML looks like.
Precision, thousands, and hiding the index
The fraction of votes and MPs has six decimal places, which is the default for Python formatting. Let's change the fractional numbers to three decimal places, introduce thousand separators for the number of votes, and hide the index. Here's the code to do it:
parliament.style.format(
{"MPs frac":"{:.3f}",
"votes frac":"{:.3f}",
"votes": "{:,}"}
).hide_index().render()
In this case, the style.format code takes a dict argument. The dict keys are the dataframe column names and the dict values are the formatting instructions. Most Python formatters work with this method but some don't; for example, the alignment Python formatters don't work. Here's what the rest of the code means:
- {:.3f} truncates the floating point numbers to three decimal places
- {:,} introduces the thousand separator
- hide_index hides the index
- render renders the table using HTML - it produces a string output of HTML text
The arguments to format don't have to be a dict, but using a dict makes it easier if you're changing several columns at once.
Here's the HTML output from the code above. It's a big improvement, but not quite what we want.
party | MPs | votes | MPs frac | votes frac |
---|---|---|---|---|
Conservative | 365 | 13,966,565 | 0.562 | 0.452 |
Labour | 202 | 10,269,076 | 0.311 | 0.333 |
Scottish Nationalist | 48 | 1,242,380 | 0.074 | 0.040 |
Liberal Democrats | 11 | 3,696,423 | 0.017 | 0.120 |
Democratic Unionists | 8 | 244,128 | 0.012 | 0.008 |
Sinn Fein | 7 | 181,853 | 0.011 | 0.006 |
Plaid Cymru | 4 | 153,265 | 0.006 | 0.005 |
Social Democratic and Labour | 2 | 118,737 | 0.003 | 0.004 |
Green | 1 | 835,579 | 0.002 | 0.027 |
Alliance | 1 | 134,115 | 0.002 | 0.004 |
Speaker | 1 | 26,831 | 0.002 | 0.001 |
Column alignment and spacing
Let's right-align the columns and add a bit more spacing between columns.
{"MPs frac":"{:.3f}",
The set_properties method sets the CSS properties of the HTML object, in this case, the table.
Here's the output:
party | MPs | votes | MPs frac | votes frac |
---|---|---|---|---|
Conservative | 365 | 13,966,565 | 0.562 | 0.452 |
Labour | 202 | 10,269,076 | 0.311 | 0.333 |
Scottish Nationalist | 48 | 1,242,380 | 0.074 | 0.040 |
Liberal Democrats | 11 | 3,696,423 | 0.017 | 0.120 |
Democratic Unionists | 8 | 244,128 | 0.012 | 0.008 |
Sinn Fein | 7 | 181,853 | 0.011 | 0.006 |
Plaid Cymru | 4 | 153,265 | 0.006 | 0.005 |
Social Democratic and Labour | 2 | 118,737 | 0.003 | 0.004 |
Green | 1 | 835,579 | 0.002 | 0.027 |
Alliance | 1 | 134,115 | 0.002 | 0.004 |
Speaker | 1 | 26,831 | 0.002 | 0.001 |
Colors
The political parties have colors, so it would be nice to show their party colors as a background to their names, meaning we should change the background colors of the party column. It might also be nice to highlight the maximum results in a light gray. Maybe we can get really clever and add a bar chart representing the number of seats won. Here's the code to do all that:
Here's what this code does:
- bar takes the data in the column and draws a bar chart based on it. It uses the full width of the column and expands the column if necessary, hence my need to restrict the column width to get the table to fit on the Blogger page correctly.
- The party background colors are created with applymap method using the colors function applied to just the party column using the subset argument.
- The maximum value highlighting I do with the highlight_max built-in method and I highlight the cells a very light gray.
- The method set_table_styles restricts the width of the MPs column so the page renders on Blogger; it uses a CSS selector to do it, and of course you could use the same approach for fine grain formatting using CSS.
- The subset argument restricts formatting to just the specified columns.
Here's what the final results look like:
party | MPs | votes | MPs frac | votes frac |
---|---|---|---|---|
Conservative | 365 | 13,966,565 | 0.562 | 0.452 |
Labour | 202 | 10,269,076 | 0.311 | 0.333 |
Scottish Nationalist | 48 | 1,242,380 | 0.074 | 0.040 |
Liberal Democrats | 11 | 3,696,423 | 0.017 | 0.120 |
Democratic Unionists | 8 | 244,128 | 0.012 | 0.008 |
Sinn Fein | 7 | 181,853 | 0.011 | 0.006 |
Plaid Cymru | 4 | 153,265 | 0.006 | 0.005 |
Social Democratic and Labour | 2 | 118,737 | 0.003 | 0.004 |
Green | 1 | 835,579 | 0.002 | 0.027 |
Alliance | 1 | 134,115 | 0.002 | 0.004 |
Speaker | 1 | 26,831 | 0.002 | 0.001 |
Commentary
It's nice that Pandas has this functionality, and it's nice that it's as extensive as it is, but there's a problem. The way style is implemented is inconsistent and hard to understand, for example, some but not all of the string formatters work, and there are two methods that do very similar things (set_table_styles and set_properties). In practice, it takes more time and it's harder than it needs to be to get good results. The code looks ungainly too. It is what it is for now.
Next steps
You can do some other clever things with style, like apply heatmaps, or apply clever conditional table formatting. You can really make your data output standout, but be careful, you can go overboard! To find out more, read the Pandas dataframe style documentation.
No comments:
Post a Comment