‘I need the data every Monday morning’
Companies need regular reporting on what’s going on in their business. This could be reporting for the executive team (e.g., weekly sales data, support tickets), board reporting (e.g., churn), or external reporting (e.g., financial metrics). My contention in this blog post is simple: many companies have chosen a slow and expensive way of delivering these reports, and cheaper and faster reports are possible, especially with AI.
Let’s start by defining the problems a bit better.
‘Bring me solutions, not problems’
Let’s take a couple of real-world examples and look at some of the problems that apply regardless of the solution.
- The Finance team wants to see sales figures.
- The Support team and the CTO want to see support ticket numbers broken down by category.
- The CEO wants churn numbers and customer usage data.
Typically, these reports are needed for weekly exec team meetings, and people want them as numbers, a chart, and percentage increases or decreases.
The first problem is definitional. Let’s take something that sounds simple: number of customers. Do we count:
- Academic customers and student usage?
- Customers who are in arrears but haven’t been terminated yet?
- Freemium (free usage) customers?
- Customers who have just signed up and haven’t been activated yet?
This gets even more complicated because Finance, Sales, Marketing, and Engineering may have differing definitions of customers for different purposes. In an ideal world, there would be one definition, but we don’t live in an ideal world.
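One way to make these choices visible is to write each definition down as code, so every report states exactly which customers it counts. The sketch below assumes a hypothetical customer record with one field per question above; the field names, the sample customers, and the two departmental definitions are illustrative only, not taken from any real system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Customer:
    # Hypothetical customer record; field names are illustrative only.
    customer_id: str
    segment: str                      # "commercial", "academic" or "freemium"
    activated: bool                   # False for sign-ups not yet activated
    in_arrears: bool                  # overdue on payment, not yet terminated
    terminated_on: Optional[date] = None


@dataclass(frozen=True)
class CustomerDefinition:
    # One explicit flag per definitional question.
    name: str
    include_academic: bool
    include_in_arrears: bool
    include_freemium: bool
    include_unactivated: bool


def count_customers(customers, definition, as_of):
    """Count the customers that meet `definition` on the date `as_of`."""
    total = 0
    for c in customers:
        if c.terminated_on is not None and c.terminated_on <= as_of:
            continue  # terminated customers never count
        if c.segment == "academic" and not definition.include_academic:
            continue
        if c.segment == "freemium" and not definition.include_freemium:
            continue
        if c.in_arrears and not definition.include_in_arrears:
            continue
        if not c.activated and not definition.include_unactivated:
            continue
        total += 1
    return total


# Two hypothetical departmental definitions of "customer".
FINANCE = CustomerDefinition("finance", include_academic=True, include_in_arrears=True,
                             include_freemium=False, include_unactivated=False)
MARKETING = CustomerDefinition("marketing", include_academic=True, include_in_arrears=False,
                               include_freemium=True, include_unactivated=True)

# Made-up sample data.
customers = [
    Customer("c1", "commercial", activated=True, in_arrears=False),
    Customer("c2", "academic", activated=True, in_arrears=False),
    Customer("c3", "freemium", activated=True, in_arrears=False),
    Customer("c4", "commercial", activated=False, in_arrears=False),
    Customer("c5", "commercial", activated=True, in_arrears=True),
    Customer("c6", "commercial", activated=True, in_arrears=False,
             terminated_on=date(2024, 1, 1)),
]
as_of = date(2024, 3, 4)
print(count_customers(customers, FINANCE, as_of))    # 3
print(count_customers(customers, MARKETING, as_of))  # 4
```

The same six customers give two different counts, which is exactly the problem; the difference is that each number now comes with a definition anyone can read and review.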
The second problem is data location and access. Let’s take churn as an example. The customer data might be held in Salesforce, but the payment data might be in an accounting system, and licensing data might be in another system. Even after we’ve sorted out the definition of churn, we need to access the data from multiple systems.
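Once the definition is agreed, the cross-system join itself can be small. The sketch below assumes the three systems share a customer ID and uses made-up, in-memory extracts in place of real API calls; the churn rule in `churned_customers` is an illustrative assumption, not a recommended definition.

```python
from datetime import date

# Hypothetical extracts from three systems, keyed by a shared customer ID.
# In practice, each would come from that system's API or database.
crm_accounts = {"c1": "Acme Ltd", "c2": "Globex", "c3": "Initech"}  # CRM (e.g. Salesforce)
last_payment = {"c1": date(2024, 2, 28), "c2": date(2023, 12, 31),
                "c3": date(2024, 2, 25)}                             # accounting system
license_end = {"c1": date(2024, 12, 31), "c2": date(2024, 1, 31),
               "c3": date(2024, 2, 20)}                              # licensing system


def churned_customers(start, end):
    """Return the CRM names of customers that churned between start and end.

    Assumed definition, for illustration only: the license expired inside the
    period and no payment has been received on or after the expiry date.
    """
    churned = []
    for customer_id, name in sorted(crm_accounts.items()):
        expiry = license_end.get(customer_id)
        paid = last_payment.get(customer_id)
        if expiry is None or not start <= expiry <= end:
            continue  # no license, or it didn't expire in this period
        if paid is not None and paid >= expiry:
            continue  # paid after expiry, so treat as a renewal
        churned.append(name)
    return churned


print(churned_customers(date(2024, 1, 1), date(2024, 2, 29)))  # ['Globex']
```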
The third problem is data preparation and presentation. The finance team love spreadsheets, so ideally, they want their data in spreadsheets. Others want to see charts and numbers. A typical exec team meeting might use data in spreadsheets and charts.
Let’s turn now to a common way of meeting these needs.
‘I’ll have what she’s having’
Here’s what I’ve seen in some big companies and in some startups.
The organization hires data engineers to create data feeds from the organization’s data. These data feeds will combine data from various sources to provide the needed metrics and will provide data on a daily, weekly, or monthly basis. For example, the feed might contain churn data, customer counts, etc. The data engineer will figure out how to extract the data (usually via an API) from each source.
The organization will also hire a BI analyst to create reports from the data feed. The report is usually a BI app with a couple of charts (maybe with a trend line) and some numbers. Often, the app has good data interactivity, maybe allowing the user to drill down into the data.
A group of people might work on the metric definitional problem, for example the BI analyst, the data engineer, a subject matter expert, and maybe an executive sponsor.
This approach is slow, expensive, unsatisfactory, and in many cases, overkill.
In most cases, the BI work can’t begin until the data feed has been finished, and of course the data feed work won’t start until the metric has been defined properly.
Despite all these people involved, there’s often missing automation. The weekly exec summary report might involve manually extracting numbers from different reports and pasting the numbers into a spreadsheet. Automating the last piece is often a big problem because none of the people in the process have ownership or the right skills. For example, how many BI analysts know how to paste a value into a spreadsheet via an API call?
This approach requires licensing a BI tool ($), running a BI server ($), and maintaining security and access to those servers and apps ($). Each BI analyst consumes a ‘creator’ license ($$), and each user typically consumes a user license ($). Often, the BI apps are not under source control and don’t follow engineering development processes.
Lastly, there’s something that I’m really sad to admit. Most BI app users never use the app to explore data. It’s hard for me to admit because I used to be a champion for apps with all the bells and whistles that would enable users to drill down into the data. But in my experience, disappointingly, most (but not all) users really wanted a static report and not much else. Sales departments sometimes do want more detailed breakdowns, but at the exec level, they want simple reports that tell a clear story. (Of course, if you’re a big bank or a large manufacturer, building BI apps with drill-down capability makes sense, but then you’re mostly building for analysts and not execs.)
‘Do better’
So how can we do better? Here’s my summary.
- Combine the BI analyst and the data engineer into a single data analyst role. The analyst takes on the whole process.
- Replace BI tools with static reports output to the company intranet.
In practice, this means:
- Querying the disparate data sources to produce the needed metrics using Python.
- Using Python to create static reports pasted onto the company intranet.
- Using Python to paste numbers directly into spreadsheets, automating the last stage of the reporting process.
Data analysts are typically familiar with APIs and can learn new ones fairly easily. Pasting values into a spreadsheet is easy via an API, provided that the spreadsheet is at the same accessible location all the time.
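As an example of how small this last-mile step can be, here is a sketch that builds a request to write one value into a Google Sheet through the Sheets API v4 `spreadsheets.values.update` method. It assumes Google Sheets is the spreadsheet tool and that you already have an OAuth access token with write access; the spreadsheet ID, sheet name, and the helper names `build_cell_update` and `send` are placeholders. In practice, a client library such as gspread would usually handle authentication for you.

```python
import json
import urllib.parse
import urllib.request

SHEETS_API = "https://sheets.googleapis.com/v4/spreadsheets"


def build_cell_update(spreadsheet_id, cell_range, value, access_token):
    """Build (but do not send) a Sheets API v4 values.update request."""
    url = (
        f"{SHEETS_API}/{spreadsheet_id}/values/"
        f"{urllib.parse.quote(cell_range, safe='')}"  # e.g. Weekly!B2 -> Weekly%21B2
        "?valueInputOption=USER_ENTERED"
    )
    body = {"range": cell_range, "majorDimension": "ROWS", "values": [[value]]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


def send(request):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Placeholder IDs; the spreadsheet must stay at the same location every week.
request = build_cell_update("SPREADSHEET_ID", "Weekly!B2", 1234, "ACCESS_TOKEN")
print(request.get_method(), request.full_url)
# send(request)  # only with a real spreadsheet ID and token
```

`USER_ENTERED` asks Sheets to parse the value as if someone had typed it, so numbers and dates keep their types; `RAW` would store the value as-is.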
Static reports are easy to produce with Python; they just require a visualization library (e.g., Matplotlib, Bokeh, Vega). Most companies have some form of intranet (e.g., Confluence), and in every case I’ve seen, it is available via an API. So the analyst can output their results to a static report on the company intranet.
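A static report can be little more than generated HTML. The sketch below renders a table of numbers with week-over-week percentage changes and builds (but does not send) a request that creates a page through Confluence’s v1 REST content endpoint, whose storage format accepts XHTML. The base URL, space key, metric names, and helper names are placeholders, and authentication differs between Confluence Cloud and Server/Data Center, so the `Authorization` header value is passed in as-is.

```python
import html
import json
import urllib.request


def pct_change(current, previous):
    """Percentage change from previous to current; None if previous is zero."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100


def render_report(title, metrics):
    """Render (name, current, previous) tuples as a static XHTML table."""
    rows = []
    for name, current, previous in metrics:
        change = pct_change(current, previous)
        change_text = "n/a" if change is None else f"{change:+.1f}%"
        rows.append(
            f"<tr><td>{html.escape(name)}</td><td>{current:,}</td>"
            f"<td>{change_text}</td></tr>"
        )
    return (
        f"<h2>{html.escape(title)}</h2>"
        "<table><tr><th>Metric</th><th>This week</th><th>Change</th></tr>"
        + "".join(rows)
        + "</table>"
    )


def build_confluence_page(base_url, space_key, title, body_html, auth_header):
    """Build (but do not send) a request that creates a Confluence page."""
    payload = {
        "type": "page",
        "title": title,
        "space": {"key": space_key},
        "body": {"storage": {"value": body_html, "representation": "storage"}},
    }
    return urllib.request.Request(
        f"{base_url}/rest/api/content",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Authorization": auth_header, "Content-Type": "application/json"},
    )


# Made-up numbers: (metric, this week, last week).
report = render_report("Weekly exec summary",
                       [("Sales", 1150, 1000), ("Support tickets", 84, 0)])
request = build_confluence_page("https://example.atlassian.net/wiki", "EXEC",
                                "Weekly exec summary", report, "Bearer ACCESS_TOKEN")
print(request.get_method(), request.full_url)
```

Re-running the job each week would normally update an existing page (a `PUT` to `/rest/api/content/{id}` with the version number incremented) rather than create a new one, and a chart image, such as a PNG saved from Matplotlib, could be attached to the page. It’s worth checking which Confluence API version your instance supports before relying on this.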
Data analysts know how to access data via APIs, so retrieving data is straightforward.
Because we’ve combined the BI analyst and the data engineer into a single role, we can move faster. A data analyst can build a report based on an incomplete definition and share it for comments. A definitional change means changing one piece of code instead of a data feed and a BI app.
Because everything is in Python in this model, we can use standard engineering development practices to raise quality and improve repeatability.
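For example, a metric calculation can sit next to a unit test that pins down its edge cases, so a change that breaks an expectation is caught in review rather than in the exec meeting. This is a minimal sketch using Python’s built-in `unittest`; `weekly_change` is a hypothetical helper, and treating a zero baseline as having no percentage change is an assumption.

```python
import unittest


def weekly_change(current, previous):
    """Week-over-week percentage change; None when last week was zero."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100


class WeeklyChangeTest(unittest.TestCase):
    def test_increase(self):
        self.assertAlmostEqual(weekly_change(110, 100), 10.0)

    def test_decrease(self):
        self.assertAlmostEqual(weekly_change(90, 120), -25.0)

    def test_zero_baseline_has_no_percentage(self):
        self.assertIsNone(weekly_change(5, 0))
```

Saved as, say, `test_metrics.py`, this runs with `python -m unittest test_metrics`, locally or in a CI pipeline.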
The data analyst here must be a true data analyst with a good understanding of stats and a great grasp of Python and SQL. Could a data engineer do this role? Yes, with a small amount of training. Could a BI analyst do this role? Mostly no; the skills gap is too great.
Of course, this approach doesn’t solve all problems, but it does let you move faster and more cheaply.
‘AI’
I’ve had conversations with numerous people who think that AI is going to bring a nuclear winter to BI employment. Once you have the data, AI can automate most of the BI visualization process, reducing the BI workload and hence the need for so many BI analysts. However, even with AI, you still have the cost of the BI server and licensing.
In my data analyst model, there’s a huge role for AI. In my experience, AI code generation tools know about the Confluence API (and similar APIs), and they can help with Bokeh and other visualization libraries. It’s a productivity boost without the need to spend money on servers and licenses.
‘Open issues’
The natural home for these kinds of data analysts is in the engineering organization because of the skill set match. But that might be politically hard.
Access to web pages can sometimes cause teething problems. Execs might not be permissioned to see certain pages, and of course it may be necessary to restrict access to certain reports.
The biggest problem is probably cultural. Removing BI tools represents a very radical step for many organizations and may be a step too far.
‘The bottom line’
If you want faster and cheaper internal reporting, you need to rethink your whole reporting process along engineering lines. I’m advocating having a true data analyst use Python to produce static reports and paste data into spreadsheets. My process gives you the benefits of both AI code generation and engineering development practices.