Monday, September 15, 2025

A better approach to internal reporting

‘I need the data every Monday morning’

Companies need regular reporting on what’s going on in their business. This could be reporting for the executive team (e.g., weekly sales data, support tickets), board reporting (e.g., churn), or external reporting (e.g., financial metrics). My contention in this blog post is simple: many companies have chosen a slow and expensive way of delivering these reports, and cheaper and faster reports are possible, especially with AI.

(Canva)

Let’s start by defining the problems a bit better

‘Bring me solutions, not problems’

Let’s take a couple of real-world examples and look at some of the problems that apply regardless of the solution.

  • The Finance team wants to see sales figures.
  • The Support team and the CTO want to see support ticket numbers broken down by category.
  • The CEO wants churn numbers and customer usage data.

Typically, these reports are needed for weekly exec team meetings and they’re wanted as numbers, a chart, and % increases or decreases.

The first problem is definitional. Let’s take something that sounds simple: number of customers. Do we count:

  • Academic customers and student usage?
  • Customers who are in arrears who haven’t been terminated yet?
  • Freemium (free usage) customers?
  • Customers who have just signed up and haven’t been activated yet?

This gets even more complicated because Finance, Sales, Marketing, and Engineering may have differing definitions of customers for different purposes. In an ideal world, there would be one definition, but we don’t live in an ideal world.

The second problem is data location and access. Let’s take churn as an example. The customer data might be held in Salesforce, but the payment data might be in an accounting system, and licensing data might be in another system. Even after we’ve sorted out the definition of churn, we need to access the data from multiple systems.

The third problem is data preparation and presentation. The Finance team loves spreadsheets, so ideally they want their data delivered in spreadsheets. Others want to see charts and numbers. A typical exec team meeting might use both.

Let’s turn now to a common way of meeting these needs

‘I’ll have what she’s having’

Here’s what I’ve seen in some big companies and in some startups.

The organization hires data engineers to create data feeds from the organization’s data. These data feeds will combine data from various sources to provide the needed metrics and will provide data on a daily, weekly, or monthly basis. For example, the feed might contain churn data, customer counts, etc. The data engineer will figure out how to extract the data (usually via an API) from each source.

The organization will also hire a BI analyst to create reports from the data feed. The report is usually a BI app with a couple of charts (maybe with a trend line) and some numbers. Often, the app has good data interactivity, maybe allowing the user to drill down into the data. 

A group of people might work on the metric definitional problem, for example the BI analyst, the data engineer, a subject matter expert, and maybe an executive sponsor.

This approach is slow, expensive, unsatisfactory, and in many cases, overkill.

In most cases, the BI work can’t begin until the data feed has been finished, and of course the data feed work won’t start until the metric has been defined properly.

Despite all the people involved, automation is often missing. The weekly exec summary report might involve manually extracting numbers from different reports and pasting the numbers into a spreadsheet. Automating this last piece is often a big problem because none of the people in the process have ownership or the right skills. For example, how many BI analysts know how to paste a value into a spreadsheet via an API call?

This approach requires licensing a BI tool ($), running a BI server ($), and maintaining security and access for those servers and apps ($). Each BI analyst consumes a ‘creator’ license ($$) and each user typically consumes a user license ($). Often, the BI apps are not under source control and don’t follow engineering development processes.

Lastly, there’s something that I’m really sad to admit. Most BI app users never use the app to explore data. It’s hard for me to admit because I used to be a champion for apps with all the bells and whistles that would enable users to drill down into the data. But my disappointing experience is that most (but not all) users really want a static report and not much else. Sales departments sometimes do want more detailed breakdowns, but at the exec level, people want simple reports that tell a clear story. (Of course, if you’re a big bank or a large manufacturer, building BI apps with drill-down capability makes sense, but then you’re mostly building for analysts and not execs.)

‘Do better’

So how can we do better? Here’s my summary.

  • Combine the BI analyst and the data engineer into a single data analyst role. The analyst takes on the whole process.
  • Replace BI tools with static reports output to the company intranet.

In practice, this means the following (each step is sketched in code below):

  • Querying the disparate data sources to produce the needed metrics using Python.
  • Using Python to create static reports pasted onto the company intranet.
  • Using Python to paste numbers directly into spreadsheets, automating the last stage of the reporting process.
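
To make the first step concrete, here’s a minimal sketch of pulling data from two hypothetical APIs and computing a churn metric. The URLs, field names, and the specific churn definition are illustrative assumptions, not a real system.

```python
# A minimal sketch: pull data from two hypothetical systems and apply
# an agreed churn definition. All URLs and field names are placeholders.
import requests

customers = requests.get("https://crm.example.com/api/customers",
                         timeout=30).json()
payments = requests.get("https://billing.example.com/api/payments?month=2025-09",
                        timeout=30).json()

# One possible definition: a customer churned if they were active last
# month but made no payment this month. Your definition will differ.
paid_ids = {p["customer_id"] for p in payments}
active = [c for c in customers if c["active_last_month"]]
churned = [c for c in active if c["id"] not in paid_ids]
churn_rate = len(churned) / len(active)
print(f"Churn rate: {churn_rate:.1%}")
```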

Data analysts are typically familiar with APIs and can learn new ones fairly easily, so retrieving data from multiple systems is straightforward. Pasting values into a spreadsheet is easy via an API, provided that the spreadsheet is in the same accessible location all the time.
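
Here’s a minimal sketch of that last step, assuming the exec spreadsheet is a Google Sheet and using the gspread library; the sheet name, cell, and credentials file are hypothetical.

```python
# A minimal sketch: paste a computed value into a Google Sheet via the
# gspread library. Sheet name, cell, and credentials are hypothetical.
import gspread

churn_rate = 0.042  # hypothetical value computed by the reporting code

gc = gspread.service_account(filename="service-account.json")
worksheet = gc.open("Weekly exec report").sheet1
worksheet.update_acell("B2", churn_rate)  # paste into a fixed, known cell
```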

Static reports are easy to produce with Python; they just require visualization libraries (e.g., Matplotlib, Bokeh, Vega). Most companies have some form of intranet (e.g., Confluence), and these are usually accessible via an API. So the analyst can output their results as a static report on the company intranet.
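
As an example of the intranet step, here’s a sketch that creates a Confluence Cloud page via its REST API; the site URL, space key, and credentials are placeholders, and a real report would attach its charts as images via the attachments endpoint.

```python
# A sketch of publishing a static report to Confluence Cloud via its
# REST API. The site URL, space key, and credentials are placeholders.
import requests

html = "<h2>Weekly sales</h2><p>Sales: 1,234 (+5% week on week)</p>"
page = {
    "type": "page",
    "title": "Weekly exec report",
    "space": {"key": "EXEC"},  # hypothetical space key
    "body": {"storage": {"value": html, "representation": "storage"}},
}
resp = requests.post(
    "https://example.atlassian.net/wiki/rest/api/content",  # placeholder site
    json=page,
    auth=("reporter@example.com", "API_TOKEN"),  # Cloud uses email + API token
    timeout=30,
)
resp.raise_for_status()
# Chart images can be added via the content attachments endpoint.
```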


Because we’ve combined the BI analyst and the data engineer into a single role, we can move faster. A data analyst can build a report based on an incomplete definition and share it for comments. A definitional change means changing one piece of code instead of a data feed and a BI app.

Because everything is in Python in this model, we can use standard engineering development practices to raise quality and improve repeatability.

The data analyst here must be a true data analyst with a good understanding of stats and a great grasp of Python and SQL. Could a data engineer do this role? Yes, with a small amount of training. Could a BI analyst do this role? Mostly no; the skills gap is too great.

Of course, this approach doesn’t solve all problems, but it does let you move faster and more cheaply.

‘AI’

I’ve had conversations with numerous people who think that AI is going to bring a nuclear winter to BI employment. Once you have the data, AI can automate most of the BI visualization process, reducing the BI workload and hence the need for so many BI analysts. However, even with AI, you still have the cost of the BI server and licensing.

In my data analyst model, there’s a huge role for AI. In my experience, AI code gen knows about the Confluence API (and others like it), and it can help with Bokeh and other visualization libraries. It’s a productivity boost without the need to spend money on servers and licenses.

‘Open issues’

The natural home for these kinds of data analysts is in the engineering organization because of the skill set match. But that might be politically hard.

Access to web pages can sometimes cause teething problems. Execs might not have permission to see certain pages, and of course it may be necessary to restrict access to certain reports.

The biggest problem is probably cultural. Removing BI tools represents a very radical step for many organizations and may be a step too far.

‘The bottom line’

If you want faster and cheaper internal reporting, you need to rethink your whole reporting process along engineering lines. I’m advocating having a true data analyst use Python to produce static reports and paste data into spreadsheets. This process gives you the benefits of AI code gen and the benefits of the engineering process.

Wednesday, September 10, 2025

Building server-less web apps

Engaging content without tears

Interactive charts and reports are often more engaging and informative than a static chart or report. An interactive report might let you zoom in and out of a chart, drill down into the data to get insight, or investigate different behaviors or settings. A great example of an interactive app is "Probability Playground" (https://www.acsu.buffalo.edu/~adamcunn/probability/lognormal.html).

The problem with building something interactive is needing a backend server. This could be a Tableau server or a database or an API server. These backends take time and money to set up and represent an ongoing cost.

Is it possible to build a web app without the use of a server? How far can you go down the path of building an interactive app without needing a server? Let's find out.

(The finished app)

Project goals

I wanted to build a self-contained interactive web app that showed relatively complex data using multiple plots and widgets. The whole app was to be served with a "vanilla" web server and downloaded in one go; no backend server providing data. The data set was to be several MB, but under 100 MB to keep load times down (for reference, my browser reports the BBC News home page is 147 MB today).

This means the whole app must fit into a single page of HTML, including all CSS and JavaScript. It can be a long HTML page, several MB long, but nonetheless, it should be a single page.

The other main requirement was development speed. I wanted something that could be built within a few days.

Finally, I wanted something I could build mostly in Python. I don't know enough JavaScript to build an entire app, but I can create a few JavaScript functions.

Here's one I prepared earlier

Here's how the project came out: https://blog.engora.com/2025/09/where-does-government-money-go-in-us.html. This app has two main charts, but the second one ("Entity drill down") has a drill down through four levels of data; click on the bars to see. The charts have some novel controls, as I'll explain below.

The data set

Triggered by a Reddit discussion, I used data from the USA Spending website (https://www.usaspending.gov/). This is a US federal government website that provides details on how the US government spent money over the years. The data is broken down by year, agency, state, etc.

The first task was downloading the data. Not easy. I had to reverse engineer the API and deal with a number of corner cases. Once downloaded, the data turned out to be hierarchical. For agencies, it was structured as agency - federal account - program activity - object class - recipient. I didn't use the recipient data as it made the project too large.

To keep this proof-of-concept simple, I used just the agency (and its associated hierarchy) data.

Designing the dashboard

I didn't want to wing it, so I planned out what the app was going to do using a pen and paper. I had a clear idea of what I wanted to build and how it was to look. Importantly, this included the controls I wanted. 

I knew I wanted to look at agency spend over time, but there were over 100 agencies, far too many for one line chart. This means I needed a widget to select which agencies to show.

I wanted some way of showing spending by hierarchy. The hierarchy is agency - federal account - program activity - object class, so I needed some way of drilling down the hierarchy and coming back up. The obvious drill down choice is a bar chart with interactive bars - tap the bar to go down the hierarchy. You can't go back up this way, so I needed something like a row of buttons to go back up different levels of hierarchy - all the way back from object class to agency.

Lastly, I wanted tooltips (to show data when you hover over a chart point or bar) and the ability to zoom in and move around the data.

Implementation choices

Let's go back to my requirements:

  • Single HTML page with no server.
    • That means the app must run in JavaScript + CSS + HTML.
  • Interactive charts.
  • Development mostly in Python.

This takes us down to a handful of choices, including Altair-Viz and Bokeh. Both offer tooltips, chart zooming, and so on. Both offer a limited set of control widgets.

Altair-Viz is based on Vega/Vega-Lite, which is implemented in JavaScript and is very much web-native. It has all of the controls I want to implement and more. Unfortunately, I've had some difficulties getting Altair-Viz visualizations to work and I've come across a number of bugs. I wasn't confident I could complete the entire project with it.

I know Bokeh works 100% in a browser and development is mostly in Python. It does lack some of the controls I want to implement, specifically, a mini-bar chart range selector, but my feeling was I could build something that works.

I went with the pragmatic choice: Bokeh.
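
As a concrete illustration of the "single page" idea, here's a minimal sketch of how Bokeh can emit one self-contained HTML file; the chart is a placeholder, not the actual app.

```python
# A minimal sketch: produce a single self-contained HTML file with
# Bokeh. The chart is a placeholder, not the actual app.
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import INLINE

p = figure(title="Placeholder chart", height=300, width=600)
p.line([1, 2, 3], [4, 6, 5])

# INLINE embeds BokehJS itself, so the page needs no CDN or server.
html = file_html(p, INLINE, "Server-less app")
with open("app.html", "w") as f:
    f.write(html)
```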

The development process

The first part was straightforward.

  • I reformatted the downloaded US federal spending data to add the fields I needed.
  • I built the "baseline" charts and the "baseline" controls.

It got much more difficult from there. 

The mini-bar chart selector isn't a standard Bokeh widget. To build it, I needed to write some difficult JavaScript and do some hacking. The problem is getting a range control to work with a factor (categorical) axis. The hack is to create an intermediate chart that isn't shown to the user. The solution isn't elegant, but it works, and the hack isn't visible to users.
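
To give a flavor of the hack, here's a rough sketch of one way to drive a factor axis from Bokeh's RangeTool via a numeric-axis chart. The data is made up, and this isn't the exact code from the app, but it shows the shape of the workaround.

```python
# A rough sketch of driving a factor (categorical) axis from a
# RangeTool, which only works on numeric ranges. Data is hypothetical.
from bokeh.layouts import column
from bokeh.models import CustomJS, FactorRange, Range1d, RangeTool
from bokeh.plotting import figure, show

factors = [f"Agency {i}" for i in range(100)]  # hypothetical categories
values = [(i * 7) % 50 for i in range(100)]    # hypothetical spend figures

# Main chart: categorical x-axis, initially showing the first 20 factors.
# Bars outside the visible factor range simply aren't drawn.
main = figure(x_range=FactorRange(*factors[:20]), height=300, width=800)
main.vbar(x=factors, top=values, width=0.8)

# Intermediate chart with a *numeric* axis, where RangeTool does work.
selected = Range1d(start=0, end=20)
mini = figure(height=100, width=800, x_range=Range1d(0, len(factors)),
              toolbar_location=None)
mini.vbar(x=list(range(len(factors))), top=values, width=0.8)
mini.add_tools(RangeTool(x_range=selected))

# Map the numeric selection back onto the factor axis of the main chart.
callback = CustomJS(args=dict(main_range=main.x_range), code=f"""
    const factors = {factors};
    const start = Math.max(0, Math.floor(cb_obj.start));
    const end = Math.min(factors.length, Math.ceil(cb_obj.end));
    main_range.factors = factors.slice(start, end);
""")
selected.js_on_change('start', callback)
selected.js_on_change('end', callback)

show(column(main, mini))
```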

The bar chart on "Entity drill down" was much more of a challenge. I wanted to create a drill down by tapping on a bar chart bar. For example, if the user taps on an agency, that should bring up the underlying federal accounts. If they tap on a federal account, that should bring up program activities, and so on. To go back up the hierarchy, I created a row of "go back" buttons, offering the ability to go back to different levels of the hierarchy. In other words, whenever the user clicks on a bar or on one of the "go back" buttons, a whole new data set has to be loaded.

The year selector on "Entity drill down" posed a similar challenge. Entities do not spend the same amount of money year to year and some entities only exist for a short period. The bottom line is, changing the year means a new data set has to be loaded.

To solve the new-data-set problem, I created a "central" JavaScript function that held the underlying JSON. The function selected the appropriate subset of data and updated the charts. The Bokeh widget callbacks then called this central function. How to create such a central function isn't documented, but I figured it out from Bokeh Discourse discussions.
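
Here's a sketch of the pattern, using a CustomJS callback that owns the embedded JSON. The two-level hierarchy is made up (the real app has four levels), and this isn't the app's exact code.

```python
# A sketch of a "central" callback that owns the embedded JSON and
# redraws the chart. The two-level hierarchy here is hypothetical.
import json

from bokeh.models import ColumnDataSource, CustomJS, TapTool
from bokeh.plotting import figure, show

hierarchy = {"Agency A": {"Account 1": 10.0, "Account 2": 20.0},
             "Agency B": {"Account 3": 30.0}}  # hypothetical data

names = list(hierarchy.keys())
source = ColumnDataSource(data=dict(
    x=names, top=[sum(v.values()) for v in hierarchy.values()]))

p = figure(x_range=names, height=300, width=600)
p.vbar(x='x', top='top', width=0.8, source=source)
p.add_tools(TapTool())

# The central function: embedded JSON in, chart update out. Widget
# callbacks (taps, "go back" buttons, year selectors) all route here.
central = CustomJS(args=dict(source=source, x_range=p.x_range), code=f"""
    const all = {json.dumps(hierarchy)};
    const inds = source.selected.indices;
    if (inds.length == 0) {{ return; }}
    const children = all[source.data['x'][inds[0]]];
    if (children === undefined) {{ return; }}  // already at the bottom level
    const names = Object.keys(children);
    source.data = {{x: names, top: names.map(k => children[k])}};
    x_range.factors = names;
    source.selected.indices = [];
""")
source.selected.js_on_change('indices', central)

show(p)
```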

Cybersecurity

Before I tell you where it all ended up, I thought I'd throw in something about cybersecurity. Every cloud service a company uses adds to its attack surface and is another service to secure. The nice thing about this work is that it's just a vanilla web page and can be served from any web server, including existing ones. Most companies have a web server that serves marketing content or a public blog, which would work just fine for this approach. In other words, you gain interactive web apps without increasing your attack surface.

Where it ended up

I got my proof of concept working, as you can see here: https://blog.engora.com/2025/09/where-does-government-money-go-in-us.html. This is a more extensive and ambitious example than my previous experiment, which you can see here: https://blog.engora.com/2025/08/english-league-football-mini-app.html. The US government app fits in a single file that takes up 10 MB of disk space on my machine.

What works

  • I built the app according to my spec and I managed to get all the features and controls in.
  • It fits in a single file and operates entirely in the user's browser. No backend server needed.
  • Most of the development work was in Python, but a manageable proportion was in JavaScript.
  • The app consumes under 10 MB of space.
  • I managed to build the app in a few days. It would have been faster with fewer hacks.
  • Using Cursor code gen was a productivity boost.

What doesn't work

  • The drill down functionality isn't as smooth as I'd like.
  • The page doesn't load as smoothly and as quickly as I'd like it to.
  • The hack to make the mini-bar chart selector is just that: a hack. It cost more development time than I would have liked.
  • Debugging JavaScript was time-consuming and awkward.
  • The range slider always jumps to the center if you change the data source (e.g., by clicking a bar on the bar chart). I know why this happens, and I could fix it.

Overall

The US government data set is complex, with four levels of hierarchy. In terms of hierarchical data, this is about as complex as I'd care to get with this approach. By contrast, there were over 100 agencies, and this caused no problems at all. The completed HTML file came in at 10 MB, which is about as large as I'd go for performance reasons. However, with some thoughtful data management and loading, I could probably handle a larger data set. My approach works with this data set, and I'm confident it would work with simpler data sets too. There's no reason why I couldn't extend this to different chart types and have many charts in an app.

It is possible to build a relatively sophisticated app that works entirely in the user's browser, no server required. There are problems and it's not smooth sailing, but it can be done in a reasonably short space of time.

Future enhancements

Better data formatting. The data format wasn't ideal. I should have restructured the data to allow for faster selection, and using more compact field names would have helped bring the data size down.

Bar chart taps. This still isn't quite right. It probably needs a re-think and some experimentation with a smaller data set.

Faster and smoother loading. I have a number of thoughts for how to do this, mostly centered around preventing the page from blocking while waiting for libraries, scripts, and data to load. These ideas include:

  • Allowing the non-Bokeh elements of the page to load before everything else.
  • Loading the data separately from the Bokeh charts.

These might break my "one page" rule, but the result would still be a "server-less" solution.

A timing analysis using a service like WebPageTest might be a good starting place.

Would you like something like this for your business?

Would you like to create an engaging server-less web app to show your company's data and to generate leads? If you would, drop me a line.

Where does federal government money go in the US?

This app shows you some of the details of US federal government spending since 2017. Click on the tabs to see different presentations of the data.

(Canva)