Tuesday, January 28, 2020

Future directions for Python visualization software

The Python charting ecosystem is highly fragmented and still lags behind R's. It also lacks some of the features of paid-for BI tools like Tableau or Qlik. However, things are slowly changing, and the situation may be much better in a few years' time.



On the theoretical side, the ‘grammar of graphics’ approach has been a substantial influence on visualization software. The concept was introduced in 1999 by Leland Wilkinson in a landmark book and gained widespread attention through Hadley Wickham’s development of ggplot2. The core idea is that a visualization can be represented as different layers within a framework, with rules governing the relationship between layers.

Bokeh was influenced by the 'grammar of graphics' concept, as were other Python charting libraries. The Vega project takes the idea further by defining a grammar that specifies visualizations independently of the visualization backend. Building on Vega, the Altair project is a visualization library that offers a different approach from Bokeh's to building charts. It’s clear that the grammar of graphics approach has become central to Python charting software.
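To make the idea of a backend-independent grammar concrete, here is a minimal sketch of a layered chart written as plain data in the style of Vega-Lite, the higher-level grammar from the Vega project. It uses only the standard library. The `sales` data, the field names, and the `make_layer` helper are invented for illustration, and the schema version in the URL is an assumption.

```python
import json

# Hypothetical sample data, invented for this sketch.
sales = [
    {"month": 1, "revenue": 10},
    {"month": 2, "revenue": 14},
    {"month": 3, "revenue": 9},
]


def make_layer(mark):
    """Build one layer: a mark type plus rules mapping data fields to axes."""
    return {
        "mark": mark,
        "encoding": {
            "x": {"field": "month", "type": "ordinal"},
            "y": {"field": "revenue", "type": "quantitative"},
        },
    }


# The whole chart is a data structure: shared data plus two layers
# (a line and the points on it). Nothing here draws anything.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",  # assumed version
    "data": {"values": sales},
    "layer": [make_layer("line"), make_layer("point")],
}

# Serialize to JSON, the form a renderer would consume.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Because the chart is only a description, any renderer that understands the grammar could draw it. Altair works along these lines: its Python calls produce a specification of this kind rather than drawing directly.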

If the legion of charting libraries is a negative, the fact that they are (mostly) built on the same ideas offers some hope for the future. There’s a movement toward convergence by providing an abstraction layer above individual libraries like Bokeh or Matplotlib. In the Python world, there’s precedent for this: the database API (DB-API) provides an abstraction layer above the various Python database libraries. Currently, the Panel project and HoloViews offer abstraction layers for visualization, though there are discussions of a more unified approach.
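The database precedent is easy to see in code. The sketch below uses the standard library's `sqlite3` driver; the `fetch_all` helper and the `points` table are hypothetical names made up for this example. The helper relies only on the `cursor()`, `execute()`, `fetchall()`, and `close()` calls that DB-API drivers share, so in principle it could be handed a connection from a different driver.

```python
import sqlite3


def fetch_all(connection, query):
    """Run a query using only calls common to DB-API connections and cursors."""
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        cursor.close()


# sqlite3 is used here because it ships with Python.
conn = sqlite3.connect(":memory:")
setup = conn.cursor()
setup.execute("CREATE TABLE points (x INTEGER, y INTEGER)")
# The "?" placeholder style is driver-specific; other drivers may use "%s".
setup.executemany("INSERT INTO points VALUES (?, ?)", [(1, 10), (2, 14), (3, 9)])
conn.commit()
setup.close()

rows = fetch_all(conn, "SELECT x, y FROM points ORDER BY x")
print(rows)
conn.close()
```

The abstraction is not perfect (placeholder styles and SQL dialects still differ between drivers), which hints at the kind of rough edges a charting abstraction layer would also have to manage.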

My take is that the Python world suffers from a confusing array of charting libraries, which splits the available open-source development effort across too many projects and, of course, confuses users. The effort to provide higher-level abstractions is a good idea and will probably result in fewer underlying charting libraries. However, stable and reliable abstraction libraries are probably a few years off. If you have to produce results today, you have to choose a library now.

The big gap between Python and BI tools like Tableau and Qlik is ease of deployment and speed of development. BI tools reduce the skill level needed to build apps, deploy them to servers, and manage tasks like access control. Projects like HoloViews may evolve to make chart building easier, but there are still no good, easy, and automated deployment solutions. However, some of the components for easier deployment already exist (Docker, for example), and it’s not hard to imagine the open-source community moving its attention to deployment and management once the various widget and charting issues of visualization have been solved.

Will the Python ecosystem evolve to be as good as R’s and good enough to take on BI tools? Probably, but not for a few years. In my view, this evolution will happen slowly and in public (for example, in talks at PyCon, SciPy, and similar conferences). The good news for developers is that there will be plenty of time to adapt to these changes.
