Tuesday, October 6, 2020

Faster Python BI app development through code generation

Back to the future: design like it's 1999

Back in 1999, you could build Visual Basic apps by dragging and dropping visual components (widgets) onto a canvas. The Visual Basic IDE handled all the code generation, leaving you with the task of wiring up your new GUI to your business data. It wasn't just Visual Basic though, you could do the same thing with Visual C++ and other Microsoft versions of languages. The generated code wasn't the prettiest, but it worked, and it meant you could get the job done quickly.

(Microsoft Visual Basic. Image credit: Microsoft.)

Roll forward twenty years. Python is now very popular and people are writing all kinds of software using it, including software that needs UIs. Of course, the UI front-end is now the browser, which is another change. Sadly, nothing like the UI-building capabilities of the Microsoft Visual Studio IDE exists for Python; you can't build Python applications by dragging and dropping widgets onto a canvas.

Obviously, BI tools like Tableau and Qlik fulfill some of the need to quickly build visualization tools; they've inherited the UI building crown from Microsoft. Unfortunately, they run out of steam when the analysis is complex; they have limited statistical capabilities and they're not good as general-purpose programming languages.

If your apps are 'simple', obviously, Tableau or Qlik are the way to go. But what happens if your apps involve more complex analysis, or if you have data scientists who know Python but not Tableau?

What would it take to make a Visual Basic or Tableau-like app builder for Python? Could we build something like it?

Start with the end in mind

The end goal is to have a drag-and-drop interface that looks something like this.

(draw.io. Image credit: draw.io.)

On the left-hand side of the screenshot, there's a library of widgets the user can drag and drop onto a canvas. 

Ideally, we'd like to be able to design a multi-tabbed application and move widgets onto each tab from a library. We'd do all the visualization layout on the GUI editor and maybe set up some of the properties for the widgets from the UI too. For example, we might set up the table column names, or give a chart a title and axis titles. When we're done designing, we could press a button and generate outline code that would create an application with the (dummy) UI we want.

A step further would be to import existing Python code into the UI editor and move widgets from tab to tab, add new widgets, or delete unwanted widgets.

Conceptually, all the technology to do this exists right now, just not in one place. Unfortunately, it would take considerable effort to produce something like it. 

If we can't go all the way, can we at least go part of the way?

A journey of a thousand miles begins with a single step

A first step is code generation from a specification. The idea is simple: you define your UI in a specification file that software uses to generate code. 

For this first simple step (and the end goal), there are two things to bear in mind:

  • Almost all UI-based applications can be constructed using a Model-View-Controller architecture (pattern) or something that looks like it.
  • Python widgets are similar and follow well-known rules. For example, the widgets in Bokeh follow an API; a button follows certain rules, a dropdown menu follows certain rules and so on.

Given that there are big patterns and small patterns here, we could use a specification file to generate code for almost all UI-based applications.

I've created software that does this, and I'm going to tell you about it.

JSON and the argonauts

Here's an overview of how my code generation software works.

  • The Model-View-Controller code exists as a series of templates, with key features added at code generation time.
  • The application is specified in a JSON file. The JSON file contains details of each tab in the application, along with details of the widgets on the tab. The JSON file must follow certain rules; for example, no duplicate names.
  • Most of the rules for code generation are in a JSON schema file that contains details for each Bokeh widget. For example, the JSON schema has rules for how to implement a button, including how to create a callback function for a button.

Here's how it works in practice.

  1. The user creates a specification file in JSON. The JSON file has details of:
    • The overall project (name, copyright, author etc.)
    • Overall data for each tab (e.g. name of each tab and a description of what it does).
    • For each tab, there's a specification for each widget, giving its name, its argument, and a comment on what it does.
  2. The system checks the user's JSON specification file for consistency (well-formed JSON etc.)
  3. Using a JSON schema file that contains the rules for constructing Bokeh widgets, the system generates code for each Bokeh widget in the specification.
    • For each widget that could have a callback, the system generates the callback code.
    • For complex widgets like DataTable and FileInput, the system generates skeleton example code that shows how to implement the widget. In the DataTable case, it sets up a dummy data source and table columns.
  4. The system then adds the generated code to the Model-View-Controller templates and generates code for the entire project.
    • The generated code is PEP8 compliant by design.
The generated code is runnable, so you can test out how the UI looks.

Here's an excerpt from the JSON schema defining the rules for building widgets:

            "allOf":[

                    {

                      "$comment":"███ Button ███",

                      "if":{

                        "properties":{

                          "type":{

                            "const":"Button"

                          }

                        }

                      },

                      "then":{

                        "properties":{

                          "name":{

                            "$ref":"#/definitions/string_template_short"

                          },

                          "description":{

                            "$ref":"#/definitions/string_template_long"

                          },

                          "type":{

                            "$ref":"#/definitions/string_template_short"

                          },

                          "arguments":{

                            "type":"object",

                            "additionalProperties":false,

                            "required":[

                              "label"

                            ],

                            "properties":{

                              "label":{

                                "type":"string"

                              },

                              "sizing_mode":{

                                "type":"string",

                                "default":"stretch_width"

                              },

                              "button_type":{

                                "type":"string",

                                "default":"success"

                              }

                            }

                          },

Here's an excerpt from the JSON file defining an application's UI:

{

      "name":"Manage data",

      "description":"Panel to manage data sources.",

      "widgets":[

        {

          "name":"ECV year allocations",

          "description":"Displays the Electoral College Vote allocations by year.",

          "type":"TextInput",

          "disabled":true,

          "arguments":{

            "title":"Electoral College Vote allocations by year in system",

            "value":"No allocations in system"

          }

        },

        {

          "name":"Election results",

          "description":"Displays the election result years in the system.",

          "type":"TextInput",

          "disabled":true,

          "arguments":{

            "title":"Presidential Election results in system",

            "value":"No allocations in system"

          }

What this means in practice

Using this software, I can very rapidly prototype BI-like applications. The main task left is wiring up the widgets to the business data in the Model part of the Model-View-Controller architecture. This approach reduces the tedious part of UI development but doesn't entirely eliminate it. It also helps with widgets like DataTable that require a chunk of code to get them working - this software generates most of that code for you.

How things could be better

The software works, but not as well as it could:

  • It doesn't do layout. Laying out Bokeh widgets is a major nuisance and a time suck. 
  • The stubs for Bokeh DataTable are too short - ideally, the generated code should contain more detail which would help reduce the need to write code.
  • The Model-View-Controller architecture needs some cleanup.

The roadmap

I have a long shopping list of improvements:

  • Better Model-View-Controller
  • Robust exception handling in the generated code
  • Better stubs for Bokeh widgets like DataTable
  • Automatic Sphinx documentation
  • Layout automation

Is it worth it?

Yes and no.

For straightforward apps, it will still be several times faster to write apps in Tableau or Qlik. But if the app requires more statistical firepower, or complex analysis, or linkage to other systems, then Python wins and this approach is worth taking. If you have access to Python developers, but not Tableau developers, then once again, this approach wins.

Over the longer term, regardless of my efforts, I can clearly see Python tools evolving to the state where they can compete with Qlik and Tableau for speed of application development.

Maybe in five years' time, we'll have all of the functionality we had 25 years ago. What's old is new again.

1 comment:

  1. I just saw your talk about USA elections, great talk !! I am only commenting, you can try https://panel.holoviz.org/, it's built on top of bokeh and it has the same power and speed development than tableau...

    ReplyDelete