Monday, October 19, 2020

Stylish Pandas in the frame

The data can't be right, it's so ugly

Despite what many technical people want to believe, well-presented data is more convincing than badly presented data. Unfortunately, the default way Pandas outputs dataframes as tables is ugly. I'm going to show you how to make Pandas dataframes (tables) very pretty and hopefully more convincing.

(A very attractive panda. Image source: Wikimedia Commons. Author: Christian Mehlführer. License: Creative Commons.)

Ugly Betty

My dataset is the results of the 2019 UK general election: the number of MPs and voters per party. Here's my Pandas dataframe (I've called it parliament for some reason):

                           party  MPs     votes  MPs frac  votes frac

0                   Conservative  365  13966565  0.561538    0.452447

1                         Labour  202  10269076  0.310769    0.332667

2           Scottish Nationalist   48   1242380  0.073846    0.040247

3              Liberal Democrats   11   3696423  0.016923    0.119746

4           Democratic Unionists    8    244128  0.012308    0.007909

5                      Sinn Fein    7    181853  0.010769    0.005891

6                    Plaid Cymru    4    153265  0.006154    0.004965

7   Social Democratic and Labour    2    118737  0.003077    0.003846

8                          Green    1    835579  0.001538    0.027069

9                       Alliance    1    134115  0.001538    0.004345

10                       Speaker    1     26831  0.001538    0.000869

If we output the dataframe to HTML using parliament.to_html(), here's what we get by default. It looks amateurish. Let's make it nicer.

partyMPsvotesMPs fracvotes frac
0Conservative365139665650.5615380.452447
1Labour202102690760.3107690.332667
2Scottish Nationalist4812423800.0738460.040247
3Liberal Democrats1136964230.0169230.119746
4Democratic Unionists82441280.0123080.007909
5Sinn Fein71818530.0107690.005891
6Plaid Cymru41532650.0061540.004965
7Social Democratic and Labour21187370.0030770.003846
8Green18355790.0015380.027069
9Alliance11341150.0015380.004345
10Speaker1268310.0015380.000869

Adding style

Pandas dataframes have a style property we can use to customize the appearance of the dataframe and its HTML rendering too. The style property returns a Styler object we can use to make changes to the way the data is rendered as a HTML table.  I'm going to add style and show you what the rendered HTML looks like.

Precision, thousands, and hiding the index

The fraction of votes and MPs has six decimal places, which is the default for Python formatting. Let's change the fractional numbers to three decimal places, introduce thousand separators for the number of votes, and hide the index. Here's the code to do it:

parliament.style.format(
    {"MPs frac":"{:.3f}",
     "votes frac":"{:.3f}",
     "votes": "{:,}"}
    ).hide_index().render()

In this case, the style.format code takes a dict argument. The dict keys are the dataframe column names and the dict values are the formatting instructions. Most Python formatters work with this method but some don't; for example, the alignment Python formatters don't work. Here's what the rest of the code means:

  • {:.3f} truncates the floating point numbers to three decimal places
  • {:,} introduces the thousand separator
  • hide_index hides the index
  • render renders the table using HTML - it produces a string output of HTML text

The arguments to format don't have to be a dict, but using a dict makes it easier if you're changing several columns at once.

Here's the HTML output from the code above. It's a big improvement, but not quite what we want.

party MPs votes MPs frac votes frac
Conservative 365 13,966,565 0.562 0.452
Labour 202 10,269,076 0.311 0.333
Scottish Nationalist 48 1,242,380 0.074 0.040
Liberal Democrats 11 3,696,423 0.017 0.120
Democratic Unionists 8 244,128 0.012 0.008
Sinn Fein 7 181,853 0.011 0.006
Plaid Cymru 4 153,265 0.006 0.005
Social Democratic and Labour 2 118,737 0.003 0.004
Green 1 835,579 0.002 0.027
Alliance 1 134,115 0.002 0.004
Speaker 1 26,831 0.002 0.001

Column alignment and spacing

Let's right-align the columns and add a bit more spacing between columns.

parliament.style.format(
    {"MPs frac":"{:.3f}",
     "votes frac":"{:.3f}",
     "votes": "{:,}"})
    .set_properties(**{'text-align': 'right',
                       'padding':'0 15px'})
    .hide_index().render()

The set_properties method sets the CSS properties of the HTML object, in this case, the table. 

Here's the output:


party MPs votes MPs frac votes frac
Conservative 365 13,966,565 0.562 0.452
Labour 202 10,269,076 0.311 0.333
Scottish Nationalist 48 1,242,380 0.074 0.040
Liberal Democrats 11 3,696,423 0.017 0.120
Democratic Unionists 8 244,128 0.012 0.008
Sinn Fein 7 181,853 0.011 0.006
Plaid Cymru 4 153,265 0.006 0.005
Social Democratic and Labour 2 118,737 0.003 0.004
Green 1 835,579 0.002 0.027
Alliance 1 134,115 0.002 0.004
Speaker 1 26,831 0.002 0.001

Colors

The political parties have colors, so it would be nice to show their party colors as a background to their names, meaning we should change the background colors of the party column. It might also be nice to highlight the maximum results in a light gray. Maybe we can get really clever and add a bar chart representing the number of seats won. Here's the code to do all that:

styles = [dict(selector='.col1', 
               props=[('width', '50px')])]

def colors(value):
    partymap = {'Conservative': 'lightblue',
                'Labour': 'salmon',
                'Scottish Nationalist' : 'yellow',
                'Liberal Democrats': 'orange',
                'Democratic Unionists': 'orange',
                'Sinn Fein': 'lightgreen' ,
                'Plaid Cymru': 'lightgreen',
                'Social Democratic and Labour': 'salmon',
                'Green' : 'lightgreen',
                'Alliance': 'orange',
                'Speaker': 'lightgray'}
    return """background-color: {0}""".format(
        partymap[value])

parliament.style
          .format(
            {"MPs frac":"{:.3f}",
             "votes frac":"{:.3f}",
             "votes": "{:,}"})
          .set_properties(**{'text-align': 'right',
                             'padding':'0 15px'})
          .bar(subset=['MPs'], color='lightgray')
          .set_table_styles(styles)
          .applymap(colors, 
                    subset=['party'])
          .highlight_max(color='#F8F9F9')
          .hide_index().render()

Here's what this code does:

  • bar takes the data in the column and draws a bar chart based on it. It uses the full width of the column and expands the column if necessary, hence my need to restrict the column width to get the table to fit on the Blogger page correctly.
  • The party background colors are created with applymap method using the colors function applied to just the party column using the subset argument.
  • The maximum value highlighting I do with the highlight_max built-in method and I highlight the cells a very light gray. 
  • The method set_table_styles restricts the width of the MPs column so the page renders on Blogger; it uses a CSS selector to do it, and of course you could use the same approach for fine grain formatting using CSS. 
  • The subset argument restricts formatting to just the specified columns.

Here's what the final results look like:

party MPs votes MPs frac votes frac
Conservative 365 13,966,565 0.562 0.452
Labour 202 10,269,076 0.311 0.333
Scottish Nationalist 48 1,242,380 0.074 0.040
Liberal Democrats 11 3,696,423 0.017 0.120
Democratic Unionists 8 244,128 0.012 0.008
Sinn Fein 7 181,853 0.011 0.006
Plaid Cymru 4 153,265 0.006 0.005
Social Democratic and Labour 2 118,737 0.003 0.004
Green 1 835,579 0.002 0.027
Alliance 1 134,115 0.002 0.004
Speaker 1 26,831 0.002 0.001

Commentary

It's nice that Pandas has this functionality, and it's nice that it's as extensive as it is, but there's a problem. The way style is implemented is inconsistent and hard to understand, for example, some but not all of the string formatters work, and there are two methods that do very similar things (set_table_styles and set_properties). In practice, it takes more time and it's harder than it needs to be to get good results. The code looks ungainly too.  It is what it is for now.

Next steps

You can do some other clever things with style, like apply heatmaps, or apply clever conditional table formatting. You can really make your data output standout, but be careful, you can go overboard! To find out more, read the Pandas dataframe style documentation

No comments:

Post a Comment