Monday, November 24, 2025

Caching and token reduction

This is a short blog post to share some thoughts on how to reduce AI token consumption and improve user response times.

I was at the AI Tinkerers event in Boston and saw a presentation on using AI report generation for quant education. The presenter was using a generic LLM to create multiple-choice questions on different themes. Similarly, I've been building an LLM system that produces reports based on data pulled from the internet. In both cases, there is a finite number of topics to generate reports on. My case was much larger, but even so, it was still finite.

The obvious thought is: if you're only generating a few reports or questions and answers, why not generate them in batch? There's no need to keep the user waiting, and of course you can schedule your LLM API calls in the middle of the night when there's less competition for resources.

(Canva)

In my case, there are potentially thousands of reports, but some reports will be pulled more often than others. A better strategy in my case is something like this:

  1. Take a guess at the most popular reports (or use existing popularity data) and generate those reports overnight (or at a time when competition for resources is low). Cache them.
  2. If the user wants a report that's been cached, return the cached copy.
  3. If the user wants an uncached report:
    • Tell the user there will be a short wait for the LLM
    • Call the LLM API and generate the report
    • Display the report
    • Cache the report
  4. For each cached report, record the LLM used and its creation timestamp.

You can start to do some clever things here, like refreshing the reports every 30 days or when the LLM is upgraded.
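Here's a minimal sketch of this strategy in Python. It's illustrative only: the in-memory cache, the topic list, and the generate_report stand-in for the LLM API call are all placeholders.

    from datetime import datetime, timezone

    # Hypothetical in-memory cache; in practice this would be a database table
    # or something like Redis. Keys are report topics; values hold the report,
    # the model used, and the creation timestamp.
    cache = {}

    def generate_report(topic):
        """Placeholder for the real LLM API call."""
        return f"Report on {topic}"

    def cache_report(topic, model="some-llm-model"):
        """Generate a report and store it with the model name and timestamp."""
        cache[topic] = {
            "report": generate_report(topic),
            "model": model,
            "created": datetime.now(timezone.utc),
        }

    def get_report(topic):
        """Return a cached report; otherwise warn the user, generate, and cache."""
        if topic not in cache:
            print("Generating your report, this may take a short while...")
            cache_report(topic)
        return cache[topic]["report"]

    # Overnight batch job: pre-generate the reports we expect to be popular.
    for topic in ["momentum strategies", "mean reversion"]:  # guessed popularity
        cache_report(topic)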

I know this isn't rocket science, but I've been surprised how few LLM demos I've seen use any form of batch processing and caching.

Monday, November 17, 2025

Data scientists need to learn JavaScript

Moving quickly

Over the last few months, I've become very interested in rapid prototype development for data science projects. Here's the key question I asked myself: how can a data scientist build their own app as quickly as possible? Nowadays, speed means code gen, but that's only part of the solution.

The options

The obvious quick development path is using Streamlit; that doesn't require any new skills because it's all in Python. Streamlit is great, and I've used it extensively, but it only takes you so far and it doesn't really scale. Streamlit is really for internal demos, and it's very good at that.

The more sustainable solution is using Django. It's a bigger and more complex beast, but it's scalable. Django requires Python skills, which is fine for most data scientists. Of course, Django apps are deployed on the web and users access them as web pages.

The UI is one place code gen breaks down under pressure

Where things get tricky is adding widgets to Django apps. You might want your app to take some action when the user clicks a button, or have widgets controlling charts etc. Code gen will nicely provide you with the basics, but once you start to do more complicated UI tasks, like updating chart data, you need to write JavaScript or be able to correct code gen'd JavaScript.

(As an aside, for my money, the reason why a number of code gen projects stall is because code gen only takes you so far. To do anything really useful, you need to intervene, providing detailed guidance, and writing code where necessary. This means JavaScript code.)

JavaScript != Python

JavaScript is very much not Python. Even a cursory glance will tell you that JavaScript syntax is unlike Python's. More subtly, and more importantly, some of the underlying ideas and approaches are quite different. The bottom line is, a Python programmer is not going to write good enough JavaScript without training.

To build even a medium complexity data science app, you need to know how JavaScript callbacks work, how arrays work, how to debug in the browser, and so on. Because code gen is doing most of the heavy lifting for you, you don't need to be a craftsman, but you do need to be a journeyman.

What data scientists need to do

The elevator pitch is simple:

  • If you want to build a scalable data science app, you need to use Django (or something like it).
  • To make the UI work properly, code gen needs adult supervision and intervention.
  • This means knowing JavaScript.

(Data Scientist becoming JavaScript programmer. Gemini.)

In my view, all that's needed here is a short course, a good book, and some practice. A week should be enough time for an experienced Python programmer to get to where they need to be.

What skillset should data scientists have?

AI is shaking everything up, including data science. In my view, data scientists will have to do more than their "traditional" role. Data scientists who can turn their analysis into apps will have an advantage. 

For me, the skillset a data scientist will need looks a lot like the skillset of a full-stack developer. This means data scientists knowing a bit of JavaScript, code gen, deployment technologies, and so on. They won't need to be experts, but they will need "good enough" skills.

Wednesday, November 12, 2025

How to rapidly build and deploy data science apps using code gen

Introduction

If you want to rapidly build and deploy apps with a data science team, this blog post is written for you.

(Canva)

I’ve seen how small teams of MIT and Harvard students at the sundai.club in Boston are able to produce functioning web apps in twelve hours. I want to understand how they’re doing it, adapt what they’re doing for business, and create data science heavy apps very quickly. This blog post is about what I’ve learned.

Almost all of the sundai.club projects use an LLM as part of their project (e.g., using agentic systems to analyze health insurance denials), but that’s not how they’re able to build so quickly. They get development speed through code generation, the appropriate use of tools, and the use of deployment technologies like Vercel or Render. 

(Building prototypes in 12 hours: the inspiration for this blog post.)

Inspired by what I’ve seen, I developed a pathfinder project to learn how to do rapid development and deployment using AI code gen and deployment tools. My goal was to find out:

  • The skills needed and the depth to which they’re needed.
  • Major stumbling blocks and coping strategies.
  • The process to rapidly build apps.

I'm going to share what I've learned in this blog post. 

Summary of findings

Process is key

Rapid development relies on having three key elements in place:

  • Using the right tools.
  • Having the right skill set.
  • Using AI code gen correctly.

Tools

Fast development must use these tools:

  • AI-enabled IDE.
  • Deployment platform like Render or Vercel.
  • Git.

Data scientists tend to use notebooks and that’s a major problem for rapid development; notebook-based development isn’t going to work. Speed requires the consistent use of AI-enabled IDEs like Cursor or Lovable. These IDEs use AI code generation at the project and code-block level, and can generate code in different languages (Python, SQL, JavaScript, etc.). They can generate test code, comment code, and make code PEP8 compliant. It’s not just one-off code gen, it’s applying AI to the whole code development process.

(Screen shot of Cursor used in this project.)

Using a deployment platform like Render or Vercel means deployment can be extremely fast. Data scientists typically don’t have deployment skills, but these products are straightforward enough that some written guidance should suffice.

Deployment platforms retrieve code from Git-based systems (e.g., GitHub, GitLab etc.), so data scientists need some familiarity with them. Desktop tools (like GitHub Desktop) make it easier, but they have to be used, which is a process and management issue.

Skillsets and training

The skillset needed is that of a full-stack engineer with a few tweaks, which is a challenge for data scientists, who mostly lack some of the key skills. Here are the skillsets, the level needed, and the training required for data scientists.

  • Hands-on experience with AI code generation and AI-enabled IDE.
    • What’s needed:
      • Ability to appropriately use code gen at the project and code-block levels. This could be with Cursor, Claude Code, or something similar.
      • Understanding code gen strengths and weaknesses and when not to use it.
      • Experience developing code using an IDE.
    • Training: 
      • To get going, an internal training session plus a series of exercises would be a good choice.
      • At the time of writing, there are no good off-the-shelf courses.
  • Python
    • What’s needed:
      • Decent Python coding skills, including the ability to write functions appropriately (data scientists sometimes struggle here).
      • Django uses inheritance and function decorators, so understanding these properties of Python is important. 
      • Use of virtual environments.
    • Training:
      • Most data scientists have “good enough” Python.
      • The additional knowledge should come from a good advanced Python book. 
      • Consider using experienced software engineers to train data scientists in missing skills, like decomposing tasks into functions, PEP8 and so on.
  • SQL and building a database
    • What’s needed:
      • Create databases, create tables, insert data into tables, write queries.
    • Training:
      • Most data scientists have “good enough” SQL.
      • Additional training could come from books or online tutorials.
  • Django
    • What’s needed:
      • An understanding of Django’s architecture and how it works.
      • The ability to build an app in Django.
    • Training:
      • On the whole, data scientists don’t know Django.
      • The training provided by a short course or a decent text book should be enough.
      • Writing a couple of simple Django apps by hand should be part of the training.
      • This may take 40 hours.
  • JavaScript
    • What’s needed:
      • Ability to work with functions (including callbacks), variables, and arrays.
      • Ability to debug JavaScript in the browser.
      • These skills are needed to add and debug UI widgets. Code generation isn't enough.
    • Training:
      • A short course (or a reasonable text book) plus a few tutorial examples will be enough.
  • HTML and CSS
    • What’s needed:
      • A low level of familiarity is enough.
    • Training:
      • Tutorials on the web or a few YouTube videos should be enough.
  • Git
    • What’s needed:
      • The ability to use Git-based source control systems. 
      • It's needed because deployment platforms rely on code being on Git.
    • Training:
      • Most data scientists have a weak understanding of Git. 
      • A hands-on training course would be the most useful approach.

Code gen is not one-size-fits-all

AI code gen is a tremendous productivity boost and enabler in many areas, but not all. For key tasks, like database design and app deployment, AI code gen doesn’t help at all. In other areas, for example, complex database/dataframe manipulations and some advanced UI issues, AI helps somewhat but needs substantial guidance. The productivity benefit of AI coding ranges from negative to strongly positive depending on the task.

The trick is to use AI code gen appropriately and provide adult supervision. This means reviewing what AI produces and intervening. It means knowing when to stop prompting and when to start coding.

Recommendations before attempting rapid application development

  • Make sure your team has the skills I’ve outlined above, either individually or collectively.
  • Use the right tools in the right way.
  • Don’t set unreasonable expectations; understand that your first attempts will be slow as you learn.
  • Run a pilot project or two with loose deadlines. From the pilot project, codify the lessons and ways of working. Focus especially on AI code gen and deployment.

How I learned rapid development: my pathfinder app

For this project, I chose to build an app that analyzes the results of English League Football (soccer) games from the league’s founding in 1888 to the most recently completed season (2024-2025).

The data set is quite large, which means a database back end. The database will need multiple tables.

It’s a very chart-heavy app. Some of the charts are violin plots that need kernel density estimation, and I’ve added curve fitting and confidence intervals on some line plots. That’s not the most sophisticated data analysis, but it’s enough to prove a point about the use of data science methods in apps. Notably, charts are not covered in most Django texts.

(Just one of the plots from my app. Note the year slider at the bottom.)

In several cases, the charts need widgets: sliders to select the year and radio buttons to select different leagues. This means either using ‘native’ JavaScript or libraries specific to the charting tool (Bokeh). I chose to use native JavaScript for greater flexibility.

To get started, I roughly drew out what I wanted the app to look like. This included different themed analysis (trends over time, goal analysis, etc.) and the charts I wanted. I added widgets to my design where appropriate.

The stack

Here’s the stack I used for this project.

Django was the web framework, which means it handles incoming and outgoing data, manages users, and manages data. Django is very mature, and is very well supported by AI code generation (in particular, Cursor). Django is written in Python.

Postgres. “Out of the box”, Django supports SQLite, but Render (my deployment solution) requires Postgres. 
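For illustration, here's one common way of switching a Django project between local SQLite and Render's Postgres. It's a sketch, not my actual settings: it assumes the dj-database-url package and a DATABASE_URL environment variable set by the deployment platform.

    # settings.py fragment -- hedged sketch, assuming the dj-database-url package.
    import dj_database_url

    # Use the platform's DATABASE_URL when it's set; otherwise fall back to local SQLite.
    DATABASES = {
        "default": dj_database_url.config(
            default="sqlite:///db.sqlite3",
            conn_max_age=600,
        )
    }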

Bokeh for charts. Bokeh is a Python plotting package that renders its charts in a browser (using HTML and JavaScript). This makes it a good choice for this project. An alternative is Altair, but my experience is that Bokeh is more mature and more amenable to being embedded in web pages.

JavaScript for widgets. I need to add drop down boxes, radio buttons, sliders, and tabs etc. I’ll use whatever libraries are appropriate, but I want code gen to do most of the heavy lifting.

Render.com for deployment. I wanted to deploy my project quickly, which means I don’t want to build out my own deployment solution on AWS etc., I want something more packaged.

I used Cursor for the entire project.

The build process and issues

Building the database

My initial database format gave highly complicated Django models that broke Django’s ORM. I rebuilt the database using a much simpler schema. The lesson here is to keep the database reasonably close to the format in which it will be displayed. 

My app design called for violin plots of attendance by season and by league tier. This is several hundred plots. Originally, I was going to calculate the kernel density estimates for the violin plots at run time, but I decided it would slow the application down too much, so I calculated them beforehand and saved them to a database table. This is a typical trade-off.
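To illustrate the trade-off, here's a rough sketch of pre-computing the kernel density estimates with SciPy and writing them to a table. The file, table, and column names are invented for the example; it's not my production code.

    import numpy as np
    import pandas as pd
    from scipy.stats import gaussian_kde
    from sqlalchemy import create_engine

    # Hypothetical input: one row per match with season, tier, and attendance.
    matches = pd.read_csv("attendance.csv")

    rows = []
    for (season, tier), group in matches.groupby(["season", "tier"]):
        attendance = group["attendance"].dropna().to_numpy()
        if len(attendance) < 2:
            continue  # a KDE needs at least two points
        kde = gaussian_kde(attendance)
        xs = np.linspace(attendance.min(), attendance.max(), 100)
        for x, density in zip(xs, kde(xs)):
            rows.append({"season": season, "tier": tier,
                         "attendance": x, "density": density})

    # Save the pre-computed densities to a table the app reads at run time.
    engine = create_engine("postgresql://user:password@localhost/football")  # placeholder
    pd.DataFrame(rows).to_sql("attendance_kde", engine, if_exists="replace", index=False)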

For this part of the process, I didn’t find code generation useful.

The next stage was uploading my data to the database. Here, I found code generation very useful. It enabled me to quickly create a Python program to upload data and check the database for consistency.
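The upload script was along these lines (the app, model, and CSV column names here are invented placeholders, not my actual code):

    # upload_matches.py -- illustrative sketch; model and field names are made up.
    import csv
    import os

    import django

    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "football.settings")
    django.setup()

    from results.models import Match  # hypothetical app and model

    def upload(path):
        """Bulk-load match results from a CSV file into the Match table."""
        with open(path, newline="") as f:
            matches = [
                Match(
                    season=row["season"],
                    home_team=row["home_team"],
                    away_team=row["away_team"],
                    home_goals=int(row["home_goals"]),
                    away_goals=int(row["away_goals"]),
                )
                for row in csv.DictReader(f)
            ]
        Match.objects.bulk_create(matches, batch_size=1000)
        # Simple consistency check: the table should hold at least what we loaded.
        assert Match.objects.count() >= len(matches)

    if __name__ == "__main__":
        upload("results.csv")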

Building Django

Code gen was a huge boost here. I gave Cursor a markdown file specifying what I wanted and it generated the project very quickly. The UI wasn’t quite what I wanted, but by prompting Cursor, I was able to get it there. It let me create and manipulate dropdown boxes, tabs, and widgets very easily – far, far faster than hand coding. I did try and create a more detailed initial spec, but I found that after a few pages of spec, code generation gets worse; I got better results by an incremental approach.

(One part of the app, a dropdown box and menu. Note the widget and the entire app layout was AI code generated.)

The simplest part of the project is a view of club performance over time. Using a detailed prompt, I was able to get all of the functionality working using only code gen. This functionality included a dropdown selection box, a club history display, league position over time, and matches played by season. It needed some tweaks, but I did the tweaks using code gen. Getting this simple functionality running took an hour or two.

Towards the end of the project, I added an admin panel for admin users to create, edit, and delete "ordinary" users. With code gen, this took less than half an hour, including bug fixes and UI tweaks.

For one UI element, I needed to create an API interface to supply JSON rather than HTML. Code gen let me create it in seconds.
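A JSON-returning Django view is only a few lines, which is why code gen handles it so easily. Here's an illustrative sketch with made-up model and field names:

    # views.py fragment -- illustrative only; the model and fields are placeholders.
    from django.http import JsonResponse

    from results.models import Match  # hypothetical model

    def club_results_json(request, club_name):
        """Return a club's season-by-season results as JSON for chart callbacks."""
        rows = (
            Match.objects.filter(home_team=club_name)
            .values("season", "home_goals", "away_goals")
            .order_by("season")
        )
        return JsonResponse({"club": club_name, "results": list(rows)})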

However, there were problems.

Code gen didn’t do well with generating Bokeh code for my plots and I had to intervene to re-write the code.

It did even worse with retrieving data from Django models. Although I aligned my data as closely as I could to the app, it was still necessary to aggregate data. I found code generation did a really poor job and the code needed to be re-written. Code gen was helpful for figuring out Django’s model API, though.

In one complex case, I needed to break Django’s ORM and make a SQL call directly to the database. Here, code gen worked correctly on the first pass, creating good-quality SQL immediately.
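For reference, dropping down to raw SQL in Django looks something like this; the query is a placeholder, not the one from my app:

    # Hedged sketch of bypassing the ORM; table and column names are placeholders.
    from django.db import connection

    def goals_per_season():
        """Run a hand-written aggregate query directly against the database."""
        sql = """
            SELECT season, AVG(home_goals + away_goals) AS mean_goals
            FROM results_match
            GROUP BY season
            ORDER BY season;
        """
        with connection.cursor() as cursor:
            cursor.execute(sql)
            return cursor.fetchall()  # list of (season, mean_goals) tuples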

My use of code gen was not one-and-done, it was an interactive process. I used code generation to create code at the block and function level.

Bokeh

My app is very chart heavy, with more than 10 charts, and I couldn't find many examples of this type of app. This means that AI code gen doesn't have much to learn from.

(One of the Bokeh charts. Note the interactive controls on the right of the plot and the fact the plot is part of a tabbed display.)

Code gen didn’t do well with generating Bokeh code for my plots and I had to intervene to re-write code.

I needed to access the Bokeh chart data from the widget callbacks and update the charts with new data (in JavaScript). This involved building a JSON API, which code gen created very easily. Sadly, code gen had a much harder time with the JavaScript callback. Its first pass was gibberish and refining the prompt didn’t help. I had to intervene and ask for code gen on a block-by-block basis. Even then, I had to re-write some lines of code. Unless the situation changes, my view is that code generation for this kind of problem is probably limited to function definitions and block-by-block code generation, with hand coding to correct and improve issues.

(Some of the hand-written code. Code gen couldn't create this.)
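To give a flavor of the pattern, here's a stripped-down sketch: a Bokeh slider whose CustomJS callback fetches JSON from a Django API and swaps the chart's data source. The endpoint URL and field names are invented; my real callbacks were considerably messier.

    # Hedged sketch of the widget-updates-chart pattern; the API URL and
    # field names are invented for illustration.
    from bokeh.io import show
    from bokeh.layouts import column
    from bokeh.models import ColumnDataSource, CustomJS, Slider
    from bokeh.plotting import figure

    source = ColumnDataSource(data={"x": [], "y": []})
    plot = figure(title="Goals per season")
    plot.line("x", "y", source=source)

    slider = Slider(start=1888, end=2024, value=2024, step=1, title="Season")

    # The JavaScript runs in the browser: fetch new data from the Django JSON
    # API and replace the chart's data source, which redraws the chart.
    slider.js_on_change("value", CustomJS(args={"source": source}, code="""
        const season = cb_obj.value;
        fetch(`/api/goals/?season=${season}`)       // hypothetical endpoint
            .then(response => response.json())
            .then(data => {
                source.data = {x: data.x, y: data.y};
            });
    """))

    show(column(slider, plot))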

Render

By this stage, I had an app that worked correctly on my local machine. The final step was deployment so it would be accessible on the public internet. The sundai.club and others use Render.com and similar services to rapidly deploy their apps, so I decided to use the free tier of Render.com.

Render’s free tier is good enough for demo purposes, but it isn’t powerful enough for a commercial deployment (which is fair); that's why I’m not linking to my app in this blog post: too much traffic will consume my free allowance.

Unlike some of its competitors, Render uses Postgres rather than SQLite as its database, hence my choice of Postgres. This means deployment is in two stages:

  • Deploy the database.
  • Link the Django app to the database and deploy it.

The process was more complicated than I expected and I ran into trouble. The documentation wasn’t as clear as it needed to be, which didn’t help. The consistent advice in the Render documentation was to turn off debug. This made diagnosing problems almost impossible. I turned debug on and fixed my problems quickly. 

To be clear: code gen was of no help whatsoever.

(Part of Render's deployment screen.)

However, it’s my view that this process could be better documented, in which case subsequent deployments could go very smoothly.

General comments about AI code generation

  • Typically, many organizations require code to pass checks (linting, PEP8, test cases etc.) before the developer can check it into source control. Code generation makes it easier and faster to pass these checks. Commenting and code documentation is also much, much faster. 
  • Code generation works really well for “commodity” tasks and is really well-suited to Django. It mostly works well with UI code generation, provided there’s not much complexity.
  • It doesn’t do well with complex data manipulations, although its SQL can be surprisingly good.
  • It doesn’t do well with Bokeh code.
  • It doesn’t do well with complex UI callbacks where data has to be manipulated in particular ways.

Where my app ended up

End-to-end, it took about two weeks, including numerous blind alleys, restarts, and time spent digging up answers. Knowing what I know now, I could probably create an app of this complexity in less than 5 days, fewer still with more people.

My app has multiple pages, with multiple charts on each page (well over 10 charts in total). The chart types include violin plots, line charts, and heatmaps. Because they're Bokeh charts, my app has built-in chart interactivity. I have widgets (e.g., sliders, radio buttons) controlling some of the charts, which communicate back to the database to update the plots. Of course, I also have Django's user management features.

Discussion

There were quite a few surprises along the way in this project: I had expected code generation to do better with Bokeh and callback code, I’d expected Render to be easier to use, and I thought the database would be easier to build. Notably, the Render and database issues are learning issues; it’s possible to avoid these costs on future projects. 

I’ve heard some criticism of code generated apps from people who have produced 70% or even 80% of what they want, but are unable to go further. I can see why this happens. Code gen will only take you so far, and will produce junk under some circumstances that are likely to occur with moderately complex apps. When things get tough, it requires a human with the right skills to step in. If you don’t have the right skills, your project stalls. 

My goal with this project was to figure out the skills needed for rapid application development and deployment. I wanted to figure out the costs of enabling a data science team to build their own apps. What I found is that the skill set needed is the skill set of a full-stack engineer. In other words, rapid development and deployment is firmly in the realm of software engineers, not data scientists. If data scientists want to build apps, there's a learning curve and a learning cost. Frankly, I'm coming round to the opinion that data scientists need a broader software skill set.

For a future version of this project, I would be tempted to split off the UI entirely. The Django code would be entirely a JSON server, accessed through the API. The front end would be in Next.js. This would mean having charting software entirely in JavaScript. Obviously, there's a learning curve cost here, but I think it would give more consistency and ultimately an easier to maintain solution. Once again, it points to the need for a full-stack skill set.

To make this project go faster next time, here's what I would do:

  • Make the database structure reasonably close to how data is to be displayed. Don't get too clever and don't try to optimize it before you begin.
  • Figure out a way to commoditize creating charts and updating them through a JavaScript callback. The goal is of course to make the process more amenable to code generation. 
  • Related to charts, figure out a better way of using the ORM to avoid using SQL for more complex queries. Figure out a way to get better ORM code generation results.
  • Document the Render deployment process and have a simple checklist or template code.

Bottom line: it’s possible to do rapid application development and deployment with the right approach, the right tools, and using code gen correctly. Training is key.

Using the app

I want to tinker with my app, so I don't want to exhaust my Render free tier. If you'd like to see my app, drop me a line (https://www.linkedin.com/in/mikewoodward/) and I'll grant you access.

If you want to see my app code, that's easier. You can see it here: https://github.com/MikeWoodward/English-Football-Forecasting/tree/main/5%20Django%20app 

Thursday, November 6, 2025

How to get data analysis very wrong: sample size effects

We're not reading the data right

In the real world, we’re under pressure to get results from data analysis. Sometimes, the pressure to deliver certainty means we forget some of the basics of analysis. In this blog post, I’m going to talk about one pitfall that can cause you to give wildly wrong answers. I’ll start with an example.

School size - smaller schools are better?

You’ve probably heard the statement that “small schools produce better results than large schools”. Small-school advocates point out that small schools disproportionately appear in the top-performing groups in an area. It sounds like small schools are the way to go, or are they? It’s also true that small schools disproportionately appear among the worst schools in an area. So which is it, are small schools better or worse?

The answer is: both. Small schools have a higher variation in results because they have fewer students. The results are largely due to “statistical noise” [1].

We can easily see the effects of sample size “statistical noise”, more properly called variance, in a very simple example. Imagine tossing a coin and scoring heads as 1 and tails as 0. You would expect the mean over many tosses to be close to 0.5, but how many tosses do you have to do? I wrote a simple program to simulate tossing a coin and I summed up the results as I went along. The charts below show four simulations. The x axis of each chart is the number of tosses, the y axis is the running mean, the blue line is the simulation, and the red dotted line is 0.5.

The charts clearly show higher variance at low numbers of tosses. It takes a surprisingly large number of tosses for the mean to get close to 0.5. If we want more certainty, and less variance, we need bigger sample sizes.
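A minimal version of this simulation looks something like the sketch below (not the exact program I used).

    import matplotlib.pyplot as plt
    import numpy as np

    rng = np.random.default_rng()
    tosses = 10_000

    # Four independent coin-toss simulations: heads = 1, tails = 0.
    fig, axes = plt.subplots(2, 2, figsize=(10, 6))
    for ax in axes.flat:
        results = rng.integers(0, 2, size=tosses)
        running_mean = np.cumsum(results) / np.arange(1, tosses + 1)
        ax.plot(running_mean)                         # blue line: running mean
        ax.axhline(0.5, color="red", linestyle=":")   # red dotted line: 0.5
        ax.set_xlabel("Number of tosses")
        ax.set_ylabel("Running mean")
    plt.tight_layout()
    plt.show()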

We can repeat the experiment, but this time with a six-sided die, and record the running mean. We’d see the same result: more variance for shorter simulations. Let’s try a more interesting example (you’ll see why in a minute). Let’s imagine a 100-sided die and run the experiment multiple times, recording the mean results after each simulation (I’ve shown a few runs here).

Let’s change the terminology a bit here. The 100-sided die is a percentage test result. Each student rolls the die. If there are 100 students in a school, there are 100 die rolls; if there are 1,500 students in the school, we roll the die 1,500 times. We now have a simulation of school test results and the effect of school size.

I simulated 500 schools with 500 to 1,500 students. Here are the results.

As you can see, there’s more variance for smaller schools than larger schools. This neatly explains why smaller schools are both the best in an area and the worst.
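A sketch of this school simulation is below (again, not my exact code); re-running it produces the kind of scatter described above.

    import matplotlib.pyplot as plt
    import numpy as np

    rng = np.random.default_rng()

    n_schools = 500
    school_sizes = rng.integers(500, 1501, size=n_schools)

    # Each student's score is a roll of a 100-sided die (1-100); a school's
    # result is the mean score over all its students.
    school_means = [rng.integers(1, 101, size=size).mean() for size in school_sizes]

    plt.scatter(school_sizes, school_means, s=10)
    plt.axhline(50.5, color="red", linestyle=":")  # expected mean of a 100-sided die
    plt.xlabel("School size (number of students)")
    plt.ylabel("Mean test score")
    plt.show()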

You might object to the simplicity of my analysis; surely real school results don't look like this. What does real-world data show? Wainer [1] did the work and got the real results (read his paper for more details). Here's a screenshot from his paper showing real-world school results. It looks a lot like my simple-minded simulation.

Sample size variation is not the full explanation for school results, but it is a factor. Any analysis has to take it into account. Problems occur because of simple (wrong) analysis and overly-simple conclusions.

The law of large numbers

The effect that variance goes down with increasing sample size is known as the law of large numbers. It’s widely taught and there’s a lot written about it online. Unfortunately, most of the discussions get lost in the weeds very quickly. These two references do a very good job of explaining what’s going on: [1] [2].

The law of large numbers has a substantial body of mathematical theory behind it. It has an informal counterpart, which is a bit easier to understand, called the law of small numbers: there’s more variance in small samples than in large ones. Problems occur because people assume that small samples behave in the same way as larger samples (for example, that small school results have the same variance as large school results).

So far, this sounds simple and obvious, but in reality, most data analysts aren’t fully aware of the effect of sample size. It doesn’t help that the language used in the real world doesn’t conform to the language used in the classroom.

Small sales territories are the best?

Let’s imagine you were given some sales data on rep performance for an American company and you were asked to find factors that led to better performance.

Most territories have about 15-20 reps, with a handful having 5 or fewer reps. The top 10 leader board for the end of the year shows you that the reps from the smaller territories are doing disproportionately well. The sales VP is considering changing her sales organization to create smaller territories and she wants you to confirm what she’s seen in the data. Should she re-organize to smaller territories to get better results?

Obviously, I’ve prepped you with the answer, but if I hadn’t, would you have concluded smaller territories are the way to go?

Rural lives are healthier

Now imagine you’re an analyst at a health insurance company in the US. You’ve come across data on the prevalence of kidney cancer by US county. You’ve found that the lowest prevalence is in rural counties. Should you set company policy based on this data? It seems obvious that the rural lifestyle is healthier. Should health insurance premiums include a rural/urban cost difference?

I’ve taken this example from the paper by Wainer [1]. As you might have guessed, rural counties have both the lowest and the highest rates of kidney cancer because their populations are small, so the law of small numbers kicks in. I’ve reproduced Wainer’s chart here: the x axis is county population and the y-axis is cancer rate, see his paper for more about the chart. It’s a really great example of the effect of sample size on variance.

A/B test hell

Let’s take a more subtle example. You’re running an A/B test that’s inconclusive. The results are really important to the company. The CMO is telling everyone that all the company needs to do is run the test for a bit longer. You are the analyst and you’ve been asked if running more tests is the solution. What do you say?

The only time it's worth running the test a bit longer is if the test is on the verge of significance. Other than that, it's probably not worth it. Van Belle's book [3] has a nice chapter on sample size calculations you can access for free online [4]. The bottom line is, the smaller the effect, the larger the sample size you need for significance. The relationship isn't linear. I've seen A/B tests that would have to run for over a year to reach significance.
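As an example, here's the sort of sample size estimate an analyst should be able to produce. It uses statsmodels; the baseline conversion rate, lift, significance level, and power are assumptions chosen for illustration.

    # Sketch of an A/B test sample size estimate; the rates and thresholds
    # are assumptions, not recommendations.
    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    baseline = 0.050   # current conversion rate
    target = 0.055     # rate we hope to detect (a 10% relative lift)

    effect = proportion_effectsize(baseline, target)
    n_per_group = NormalIndPower().solve_power(
        effect_size=effect, alpha=0.05, power=0.8, alternative="two-sided"
    )
    print(f"Sample size per group: {n_per_group:,.0f}")

    # At 1,000 visitors a day per variant, that's roughly how many days the
    # test has to run to detect an effect this small.
    print(f"Days at 1,000 visitors/day per group: {n_per_group / 1000:.0f}")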

Surprisingly, I've seen analysts who don't know how to do a sample size/duration estimate for an A/B test. That really isn't a good place to be when the business is relying on you for answers.

The missing math

Because I’m aiming for a more general audience, I’ve been careful here not to include equations. If you’re an analyst, you need to know:

  • What variance is and how to calculate it.
  • How sample size can affect results - you need to look for it everywhere.
  • How to estimate how much of what you're seeing is due to sample size effects and how much due to something "real".

Unfortunately, references for the law of large numbers get too technical too quickly. A good place to start is references that cover variance and standard deviation calculations. I like reference [5], but be aware it is technical.

The bottom line

The law of large numbers can be hidden in data; the language used and the data presentation can all confuse what’s going on. You need to be acutely aware of sample size effects: you need to know how to calculate them and how they can manifest themselves in data in surprising ways.

References

[1] Howard Wainer, “The Most Dangerous Equation”, https://www.americanscientist.org/article/the-most-dangerous-equation

[2] Jeremy Orloff, Jonathan Bloom, “Central Limit Theorem and the Law of Large Numbers”, https://math.mit.edu/~dav/05.dir/class6-prep.pdf 

[3] Gerald van Belle, "Statistical rules of thumb", http://www.vanbelle.org/struts.htm 

[4] Gerald van Belle, "Statistical rules of thumb chapter 2 - sample size", http://www.vanbelle.org/chapters/webchapter2.pdf

[5] Steven Miller, "The probability lifesaver"