Showing posts with label AI. Show all posts

Wednesday, February 11, 2026

The perceptron

Why study the perceptron?

Perceptrons were among the first learning systems and an important early stepping-stone to most recent AI innovations. That alone would be motivation enough to study them; beyond that, the reaction of the press, and the consequences of the hype, are a cautionary tale for us in 2026.

I'm going to share with you the why and the how of the perceptron, along with some of the consequences of the hype.

Why do we care about systems that learn?

Go back to the 1950s: why would you care about a system that can learn? There's the obvious coolness of it, but there are also important real-world applications.

Photo analysts study reconnaissance photos looking for hidden bunkers or other items of military significance. The work is tiring and boring at times, but it’s hard to automate because it relies on human interpretation rather than a hard and fast set of rules. The “enemy” constantly changes how they disguise their installations, so whoever or whatever is analyzing photos must continually learn.

A similar problem occurs in post offices. If a post office wants to automate letter sorting, it has to automate reading handwritten addresses. Each person’s handwriting is different, which means creating definitive rules about letter or number formation is hard.

A learning system can adapt itself to new information and so stay productive when things change. In practice, this means it can be taught to recognize a new way a country is disguising a bunker or a new way someone is writing the number 5. It doesn’t require its creators to continually tweak settings. Of course, these automated systems can process letters or images etc. much faster (and cheaper) than human beings, which makes them very attractive.

Given the demand existed, how can you create a system that learns?

How do biological systems learn?

The obvious learning systems are biological. By the 1950s, we'd made some progress in understanding how brains work. In particular, we had a basic understanding of neurons, the lowest-level processing units in the brain.

Neurons take sensory input signals from dendrites into the soma, where the input is "processed". If the input signal crosses some threshold, the soma fires an output signal (an action potential) through its axon. Neurons learn by changing the way they "weight" different dendrite signals, which changes the conditions under which they fire.

The output of one neuron could be the input to another neuron and real brains have layers of processing. 

The picture below shows the arrangement for a single neuron.

(Gemini)

My explanation of how neurons work is very simplistic and in reality, it’s much more complicated. In real brains, neurons learn together and there are other biological processes going on involving dendrites. If you want to read more about biological neurons, here are some good references:

The perceptron 

In 1957, at the Cornell Aeronautical Laboratory in Buffalo, New York, the psychologist Frank Rosenblatt was studying human learning (specifically, the neuron) and trying to replicate it in software and hardware. His team built a prototype system, called the perceptron, that could “learn” in a very limited sense. The learning task was simple image classification.

(Rosenblatt and the perceptron. National Museum of the U.S. Navy, Public domain, via Wikimedia Commons)

The Mark I Perceptron's input was a 20x20 photocell array; a photocell is a very limited form of digital camera. These 400 inputs were fed to "association units" that weighted them. The weights were set by potentiometers adjusted by electric motors; importantly, the initial weights were random to avoid bias. The system summed the weighted signals and used a simple threshold algorithm ("response units") to decide the image classification: if the sum of the weighted signals was above the threshold, the algorithm output a signal (a true output); if the sum was below the threshold, it did not output a signal (a false output). Technically, this threshold function is a Heaviside step function. If the perceptron made an error, the relevant weights were adjusted. The perceptron required 50 training iterations to reliably distinguish between squares and triangles.

(From the perceptron user manual.)

In 2026, this sounds really basic, but in 1957 it was a breakthrough. Rosenblatt and his team had demonstrated that a machine could learn and change how it “sees” the world.

References:

The perceptron theory

Here's a simple representation of the perceptron. The inputs from the photocell are fed in and assigned weights. There's a bias term to account for bias in the photocells; for example, the photocells might give a very small signal instead of zero when there's no image. The weighted inputs (and the bias) are summed; if the weighted sum exceeds some threshold, the perceptron fires, and if not, it doesn't.

(The perceptron is a linear classifier, meaning it can only separate points with a hyperplane. In two dimensions, this means it can only separate points using a straight line.)

Mathematically, this is how it works.

\[u = \sum w_i x_i + b \]

\[y = f(u(x)) = \begin{cases} 1, & \text{if } u(x) > \theta \\ 0, & \text{otherwise} \end{cases}\]

In the vector notation used in machine learning, the equations are usually written:

\[y = h( \textbf{ w} \cdot \textbf{x } + b ) \]

where h is the Heaviside step function.

So far, this is pretty simple, but how does it learn? Rosenblatt insisted on starting training from a random state, so that gives us a starting point. Then we expose the perceptron to some training data where we know what the output should be (the data is labeled). Here’s how we update the weights:

\[ w_i  \leftarrow w_i + \Delta w_i \]

\[ \Delta w_i = \eta(t - o)x_i \]

where:

  • \(t\) is the target or correct output
  • \(o\) is the measured output
  • \(\eta\) is the training rate and \( 0 \lt \eta \leq 1\)

We update the weights and try again in an iterative loop. This continues until we can successfully predict the training data set within a certain error, or we’ve reached a set number of iterations, or we’re seeing no improvement. This is similar to how machine learning systems work today.
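The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not Rosenblatt's original implementation; it trains on the logical AND function, which is linearly separable, so the perceptron can learn it.

```python
import random

def heaviside(u, theta=0.0):
    """Threshold (step) activation: fire only if the weighted sum exceeds theta."""
    return 1 if u > theta else 0

def train_perceptron(data, eta=0.1, epochs=100, seed=42):
    """Train weights with the rule w_i <- w_i + eta * (t - o) * x_i."""
    rng = random.Random(seed)
    n = len(data[0][0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(n)]  # random initial weights
    b = rng.uniform(-0.5, 0.5)                      # bias term, also learned
    for _ in range(epochs):
        errors = 0
        for x, t in data:
            o = heaviside(sum(wi * xi for wi, xi in zip(w, x)) + b)
            if o != t:  # only misclassified examples change the weights
                errors += 1
                for i in range(n):
                    w[i] += eta * (t - o) * x[i]
                b += eta * (t - o)
        if errors == 0:  # converged on the training set
            break
    return w, b

# AND is linearly separable, so training converges.
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_data)
predictions = [heaviside(sum(wi * xi for wi, xi in zip(w, x)) + b)
               for x, _ in and_data]
print(predictions)  # [0, 0, 0, 1]
```

Note that the same loop run on XOR data would never converge, which is exactly the limitation discussed below.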

References:

Perceptron problems

There were lots of issues with the perceptron in its original form. Let’s start with the worst: the hype.

Rosenblatt gave interviews to the press about his system and they ran with it, but not in a good way. A 1958 New York Times article was typical; the headline read "NEW NAVY DEVICE LEARNS BY DOING; Psychologist Shows Embryo of Computer Designed to Read and Grow Wiser", with the lede: "The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence." Other press stories were similarly sensational and hyped the technology. The press very much set the expectation that walking, talking AIs were just around the corner. Of course, the technology couldn't deliver what the press forecast, which helped lead to a loss of confidence.

The technical problems varied from the straightforward to the severe.

The original perceptron used a simple threshold to decide whether to fire or not, but this caused problems for training weights. Most important training algorithms use derivatives (for example, gradient descent). A simple threshold isn't differentiable, which means it can't be used in these kinds of training algorithms. Fortunately, this is relatively easy to fix by replacing the simple threshold with a differentiable function. There are a number of possible differentiable functions; a popular choice is the sigmoid function. (The function that decides whether to fire or not is now called the activation function.)
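The sigmoid is smooth everywhere and has a convenient closed-form derivative, \(\sigma'(u) = \sigma(u)(1 - \sigma(u))\), which is what makes it usable in gradient-based training. A quick sketch:

```python
import math

def sigmoid(u):
    """Smooth, differentiable replacement for the hard threshold."""
    return 1.0 / (1.0 + math.exp(-u))

def sigmoid_derivative(u):
    """The derivative has the convenient closed form s * (1 - s)."""
    s = sigmoid(u)
    return s * (1.0 - s)

# The sigmoid approximates the step function at the extremes,
# but passes smoothly through 0.5 at u = 0.
print(sigmoid(0))               # 0.5
print(round(sigmoid(10), 4))    # 1.0 (to 4 places)
print(sigmoid_derivative(0))    # 0.25, the maximum slope
```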

A more serious problem is the logical limitations of the simple perceptron. As Minsky and Papert showed in 1969, there are some logical structures (most notably, XOR) you can't build using the simple single-layer perceptron architecture. Although multi-layer networks solve these problems, Minsky and Papert's book and papers significantly damaged research in this area, as we'll see.

This is only a summary of the difficulties the perceptron faced. For a fuller description, check out: https://yuxi.ml/essays/posts/perceptron-controversy/

What happened next

By the early 1970s, the hype bubble had burst. Minsky and Papert's book had an impact, and governments found disappointing results from funding perceptron-based projects: projects promised big results, but in reality, very little was produced. Governmental patience wore thin, and funders eventually concluded this form of AI research wasn't worth supporting. The research money went elsewhere, leading to the first "AI Winter", which lasted for a decade or so.

Sadly, AI experienced another hype bubble and collapse in the late 1980s, a second "AI Winter". As a whole, AI research began to get a bad reputation.

The "AI Winters" bled talent and money away from neural network development, but research still continued. Although multi-layer networks had been developed by the 1960s, it wasn't known how to train them until Rumelhart, Hinton, and Williams' 1986 paper "Learning representations by back-propagating errors" [https://www.nature.com/articles/323533a0] popularized the back-propagation method. Convolutional Neural Networks (CNNs), using back propagation and a convolutional structure, were demonstrated in 1989. With these technologies as the backbone, LLMs were developed starting in the mid-to-late 2010s. It's only the enormous success of LLMs that has brought a flood of money into AI research and a resurgence of interest in its origins.

Rosenblatt had a wide variety of research interests, including astronomy and photometry (measuring light). By any measure he was a genius. Unfortunately, in 1971 he died at the age of 43 in a boating accident. His death came just a few years into the first "AI Winter", so he saw the hype and the subsequent bubble bursting, but sadly, he never got to see how the field eventually developed.

Thoughts on the story

The original perceptron was very much based on what had gone before, but it was a breakthrough and ahead of its time, which was part of the problem: the necessary technology wasn't there to advance quickly. Unfortunately, the hype in the press, fed by Rosenblatt and others, set unrealistic expectations. While great for short-term research funding, the hype was terrible in the long term when the bubble burst.

AI as a whole has been prone to hype cycles through its entire existence. It's no wonder there's a lot of discussion online about the latest AI bubble bursting. My feeling is, it is different this time, but we're still in a bubble and people are going to get hurt when it eventually pops.

Monday, February 9, 2026

Learning by hand is better than learning by AI

Accelerating learning with AI?

Recently, I've been learning a new LLM API from a vendor. There's a ton of documentation to wade through to get to what I need to know and the vendor's examples are overly detailed. In other words, it's costly to figure out how to use their API.

(Gemini)

I decided to use code gen to get me up and running quickly. In the process, I found out how to speed up learning, but equally important, I found out what not to do.

Code gen everywhere!

My first thought was to code gen the entire problem and figure out what was going on from the code. This didn't work so well.

The code worked and gave me the answer I expected, but there were two problems. Firstly, the code was bloated and secondly, it wasn't clear why it was doing what it was doing. The bloated code made it hard to wade through and zero in on what I wanted. It wasn't clear to me why it had split something into two operations, despite code gen commenting the code. Because I didn't know the vendor's API, I couldn't be sure the code was correct; it didn't look right, but was it?

Hand coding wins - mostly

I recoded the whole thing by hand the old-fashioned way, but using the generated code as inspiration (what function to call and what arguments to use). I tried the LLM calls in the way I thought they should work, but the code didn't work the way I thought it would. On the upside, the error message I got was very helpful and I tracked down why it didn't work. Now I knew why code gen had made two LLM calls instead of one, and I knew what outputs and inputs I should use.

The next step was properly formatting the final output. Foolishly, I tried code gen again. It gave me code, but once again, I couldn't follow why it was doing what it was doing. I went back looking at the data structure in detail and moved forward by hand.

But code gen was still helpful. I used it to help me fill in API argument calls and to build a Pydantic data structure. I also used it to format my code. Yes, this isn't as helpful as I'd hoped, but it's still something and it still made things easier for me.

Why code gen didn't work fully

Code gen created functioning code, not tutorial code, so the comments it generated weren't appropriate to learn what was going on and why.

Because I didn't know the API, I couldn't tell if code gen was correct. As it turned out, code gen produced code that was overly complex, but it was correct.

Lessons

This experience crystallized some other experiences I've had with AI code gen.

If I didn't care about understanding what's going on underneath, code gen would be OK. It would work perfectly well for a demo. Where things start to go wrong is if you're building a production system where performance matters or a system that will be long-lived - in these cases the why of coding matters.

Code generation is an accelerator if you know what you're doing. If you don't know the libraries (or language) you're using, you're on thin ice. Eventually, something bad is going to happen and you won't know how to fix it.

Wednesday, January 14, 2026

Replit vs. Cursor - who wins?

Building Business Apps - Cursor vs. Replit

For a while now, I've been very interested in using AI to build BI-type apps. I know you can do it with Cursor, but it requires a strong technical background. I've heard people have had great success with Replit, so I thought I would give it a go. I decided to build the same app in both Cursor and Replit. It's a kind of battle of the tools.

(Gemini.)

For my comparison contest, I chose to build a simple app that shows the weather and news for a given location.

Round 1: getting started/ease of use

I gave both contenders the same prompt and asked them to build me an app. Both tools gave me an app in about the same time. However, I found Replit much, much easier to use; by contrast, Cursor can be tough to get started with.

Round 1 is a decisive victory for Replit.

Round 2: building the app

Both apps had problems and I needed to tweak them to get them working. I found I had to give Replit multiple prompts to fix problems; problems that just didn't occur in Cursor. Replit got stuck on some simple things and I had to get creative with prompting to get round them, all the while my AI token consumption went up. Cursor didn't need this level of imaginative prompting.

I'm giving this round to Cursor on points.

Round 3: editing the visual layout

Replit let me edit the visual layout of the app directly, while Cursor did not. I know Cursor has a visual editor, but I just couldn't get it to work. This is of course an ease of use thing, and overall, Replit is easier. For this app, I didn't need to tweak the layout but it's an important consideration. 

Round 3 is a decisive victory for Replit.

Round 4: what is the app doing?

I wanted to know what the apps were doing "under the hood" so I wanted to see the code. Cursor is unashamedly a code editor, so it was simple. By contrast, Replit hides the code away and it requires a bit of digging. On a related theme, Cursor is much better at debugging, so it's easier to track down errors.

Round 4 is a victory for Cursor.

Round 5: changing the app under the hood

I wanted to change the app "under the hood", which meant changing some of the code. Cursor generates code that's very well commented, so it's easy to see what's going on. By contrast, Replit's code is sparsely commented and I found it difficult to understand what each file did. Bear in mind though, Replit is trying to be an app creation tool not a code editor.

Round 5 is a victory for Cursor.

Round 6: running the app locally

Both Replit and Cursor did well here. This round is a draw.

Round 7: deploying the app to the web

Replit makes this really easy: there's a simple process to go through and your app is deployed. Cursor doesn't do deployment, and deployment services like Render have a learning curve.

Round 7 is a victory for Replit.

A disturbing thought

I was looking at how both apps turned out and something struck me when I was looking at the code for the Cursor app: what services did these apps use? I didn't specify what APIs I wanted to use, the AIs chose for me.

Both of these apps converted an address to a latitude/longitude, showed a map, got local news, got a climate chart for the year, and so on. But what APIs (services) did they use underneath? What were the terms and conditions of the services? What are the limitations of the services? The answer is: you have to find out for yourself. Which means either asking the AI or digging into the code.

If I sign up for an API key, I have to go to a website, read what the service offers, and accept the terms and conditions. For example, some APIs forbid commercial use, some are very rate limited, and others require an acknowledgment in the app or web page. If you build an app using an AI, how do you know what you've agreed to? Will your app get rate limited? Will you get banned for using the API service inappropriately? What are the risks? It seems like a feeble defense to say "my AI made me do it".

It looks like the onus is on you to figure this out, which is definitely a problem.

Who won?

Looking at the results of the contest, my answer is: it depends on your end goal.

If you want a tool to let you build a "simplish" app and you don't have much, if any, coding experience, then Replit is the clear winner. On the downside, it will be very difficult to add more complex features later.

If you want to build a more complex app and you have coding experience, then Cursor wins. Cursor also wins if you think that you'll need to edit the app code in the future. 

What would I choose for internal reporting or BI-type development? On balance, Cursor, but it's not a clear victory. Here's my logic.

  • I love the idea of democratizing analysis. I like giving users the power to answer their own questions. This would appear to favor Replit, but...
  • I worry about maintainability and extendability. I've seen too many cases where a one-off app has become business critical and no-one knows how to maintain it. This favors Cursor because in my view, it produces more maintainable code.

Future directions

The ultimate goal is a tool that lets a non-coder quickly and simply build an app, even a complex one, that's maintainable in the future. This could be building an app for internal use (within an organization) or external use. The app development process will be a combination of natural language prompting and visual editing. Right now, we're really, really close to that goal and it's probably arriving later in 2026.

I'm sure some readers will feel I'm being harsh when I say Replit isn't quite there yet; for me, it needs less prompting and better code layout and documentation. Cursor has a way to go and I'm not convinced they're going in this direction (they may well stay focused on code development). 

In my view, the bigger problem is not app development but data availability. To build internal apps, the internal data has to be available, which means it has to be well-described and in a place where the app development program (and the app itself) can access it. In many organizations, data isn't as well organized as it should be (to put it politely). It's like having a car but not being able to find gas (or only finding the wrong gas); it makes the car useless. To make internal app development really fly, internal data has to be organized "good enough". We may well see more focus on data organization within companies as a result.

Both Cursor and Replit have the advantage that they ultimately use common languages and packages. This means that the skills to maintain apps created with them are common in any company with programmers or analysts on staff. Contrast that with BI tools, where the skills and knowledge of how to use the tools live only in the BI group. I can see tools like Cursor and Replit encroaching more and more into BI territory, especially as app development becomes democratized.

Tuesday, December 30, 2025

Why are weather forecasting sites so bad?

Just show me what's relevant!

Weather forecasting in the US has got really bad for no real reason. I'm not talking about the accuracy, I'm talking about the way the data is presented. Oddly, it's the professional weather sites that are the worst.

Here's what I want. I want a daily view of the weather for the next week. I want temperature highs and lows, chances of rain/snow (when and how much), and some details on the wind if it's going to be unusual. A line or two of text would be great for each day. I don't mind ads, but I don't want so many that I can't read the data. It's not much to ask, but it seems like it's hard to get.

(Gemini)

What the commercial sites give me

The commercial sites give me visual clutter everywhere. There are ads all over their pages. Of course, ads scream for attention, so multiple ads are distracting and make the page hard to use. If I try to change anything on the page, I get an ad I have to click away from. Because they have to allow space for ads and links to other content, the screen real estate they can use for actual weather data is very limited. Throw in some over-sized icons and you leave even less room for meaningful text and data.

The hourly views they provide are very detailed, but oddly, poorly presented. If I want the hourly forecast for three days' time, I have to scroll through lots of stuff - which I guess is the point. The summary views are too truncated because of their cluttered presentations.

The radar charts are nice, as is the animation, but again they're distracting. The choice of colors makes me feel like I'm reading a 1980s superhero comic.

Of course, these websites have to be paid for and the money comes from ads. It seems like it's ads or subscriptions and I'm already paying too much in subscription fees. It feels like things aren't going to get better.

Google and others

Google provides a very good weather summary, as do a number of other sites. Unfortunately, they don't provide all the data I want, but they get pretty close. Their data presentation is great too. 

TV is the worst

Let me be blunt. I don't trust TV forecasts. I've read that they tend to exaggerate bad weather to get viewers, this includes exaggerating rainfall and exaggerating weather severity. I've read of TV forecasters who were asked by their station manager to make forecasts worse to drive ratings. There's a saying in journalism, "if it bleeds, it leads" and it seems like sometimes weather forecasts fit into this category. It may well be that some or all of my local stations are not like this, but I have no way of knowing. If they want to gain my trust, they should publish data on their accuracy, but none of them do.

For reasons I'll get to in a minute, AI has made me lose faith in TV forecasters completely. 

NWS

By now, many of you will be screaming about the National Weather Service. They provide free forecasts and plenty of data via their API. They have exactly the data I want, but it's poorly presented. Their website feels very late 1990s, and there may be reasons for that.

There's been an on-and-off campaign against the NWS for some time now. The argument against it is that it's unfair competition for the commercial weather forecast providers. Bear in mind that the commercial providers all use NWS data underneath and that we, the taxpayers, have paid for weather data collection. The push is to have the NWS stop providing data and forecasts to the public but still provide the data to commercial providers in bulk. In effect, this means the public would pay for data collection and then pay again to see the data they paid to collect. I can't help feeling that part of the awkward NWS data presentation is to deflect the unfair competition argument.

The NWS' parent agency is NOAA, and recently, NOAA has suffered substantial cuts. At this time, it's not clear what the effects of these cuts will be, but they can't be good for forecasting.

What I did about it

I built my own app using AI code gen and using an LLM to give me the text I wanted.

I wrote a long prompt telling Cursor to build an app. I told it to get a US zip code, find the biggest town or city in the zip code, and convert it to latitude and longitude. Next, I told it to get the NWS seven-day forecast, pass the data to Google Gemini, and produce a summary forecast from the data. Finally, I added in a weather chatbot, just because. I put the whole thing into Streamlit.
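The NWS part of that pipeline can be sketched in a few lines. This is a simplified, illustrative version (the zip-code lookup, Gemini summary, and Streamlit UI are elided, and the Boston coordinates are a stand-in): the api.weather.gov /points endpoint maps a lat/long to a gridpoint, whose response contains the forecast URL to fetch.

```python
import json
import urllib.request

BASE = "https://api.weather.gov"

def points_url(lat, lon):
    """The /points endpoint maps a lat/long to an NWS gridpoint."""
    return f"{BASE}/points/{lat:.4f},{lon:.4f}"

def get_forecast(lat, lon, user_agent="weather-demo (contact@example.com)"):
    """Fetch the seven-day forecast: /points returns a forecast URL, which we then fetch."""
    headers = {"User-Agent": user_agent}  # NWS asks callers to identify themselves
    req = urllib.request.Request(points_url(lat, lon), headers=headers)
    with urllib.request.urlopen(req) as resp:
        forecast_url = json.load(resp)["properties"]["forecast"]
    req = urllib.request.Request(forecast_url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["properties"]["periods"]

# Usage (live network call; each period has a name, temperature, and forecast text):
#     for period in get_forecast(42.3601, -71.0589):
#         print(period["name"], period["temperature"], period["shortForecast"])
```

The period dictionaries returned here are what you'd hand to the LLM to turn into a readable summary.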

My app isn't perfect, but it's pretty close to what I want. It all fits on one page so it's easy to see the daily forecast and the overall summary is very readable. If I have questions, I can just ask the chatbot. I'm now using my app when I want a forecast because it has what I want and it's faster and easier to use than the alternatives. It's way better than watching the TV weather forecast and I'm convinced my app isn't biased to emphasize drama.

(My app, simple but effective.)

(Future enhancements I'm thinking of adding include:

  • Changing to a tabbed display.
  • Summary and seven day view on the main tab.
  • Hourly views on another tab - including Google-like charts.
  • Adding a radar view tab using the NWS radar data.
  • Adding text-to-speech via an AI service.
This is all about adding more functionality in an easy-to-use way that lets me get what I want quickly.)

My app took 10 minutes to write.

Let me say this again. I built an app that's better for me than the existing commercial weather forecasting services and I did it in 10 minutes. 

There are implications here.

Let's say I'm a radio station and my existing meteorologist retires or leaves. Why not replace them with an app? I can generate a soothing calming voice using AI so I can automate the whole forecast and save myself some money. I can do the same thing if I'm a TV station too; I can hire someone cheap to read the forecast or generate a movie of the forecast. I could also amp up the urgency of any bad news without any fear of someone pushing back. In other words, AI is a game changer.

So long as the NWS exists and is providing free data, the potential exists to disrupt the weather forecasting market using AI. 

What other markets like this could AI disrupt?

Tuesday, December 23, 2025

Using Cursor for data science: a talk

Code generation is good enough for data science use

I gave a talk at PyData Boston on using Cursor for data science. Here's the talk.



Friday, December 19, 2025

Small adventures with small language models

Small is the new large

I've been talking to people about small language models (SLMs) for a little while now. They've told me they've got great results and they're saving money compared to using LLMs; these are people running businesses so they know what they're talking about. At an AI event, someone recommended I read the recent and short NVIDIA SLM paper, so I did. The paper was compelling; it gave the simple message that SLMs are useful now and you can save time and money if you use them instead of LLMs. 

(If you want to use SLMs, you'll be using Ollama and HuggingFace. They work together really well.)

As a result of what I've heard and read, I've looked into SLMs and I'm going to share with you what I've found. The bottom line is: they're worth using, but with strong caveats.

What is a SLM?

The boundary between an SLM and an LLM is a bit blurry, but to put it simply, an SLM is any model small enough to run on a single computer (even a laptop). In reality, SLMs require quite a powerful machine (developer spec) as we'll see, but nothing special, and certainly nothing beyond the budget of almost all businesses. Many (but not all) SLMs are open-source.

(If your laptop is "business spec", e.g., a MacBook Air, you probably don't have enough computing power to test out SLMs.) 

How to get started

To really dive into SLMs, you need to be able to use Python, but you can get started without coding. Let's start with the non-coder's path because this is the easiest way for everyone to get going.

The first port of call is visiting ollama.com and downloading their software for your machine. Install the software and run it. You should see a UI like this.

Out-of-the-box, Ollama doesn't install any SLMs, so I'm going to show you how to install a model. From the drop down menu on the bottom right, select llama3.2. This will install the model on your machine which will take a minute or so. Remember, these models are resource hogs and using them will slow down your machine.

Once you've installed a model, ask it a question. For example, "Who is the Prime Minister of Canada?". The answer doesn't really matter, this is just a simple proof that your installation was successful. 

(By the way, the Ollama logo is very cute and they make great use of it. It shows you the power of good visual design.)

So many models!

The UI drop down list shows a number of models, but these are a fraction of what's available. Go to this page to see a few more: https://ollama.com/library. This is a nice list, but you actually have access to thousands more. HuggingFace has a repository of models that follow the GGUF format, you can see the list here: https://huggingface.co/models?library=gguf

Some models are newer than others and some are better than others at certain tasks. HuggingFace have a leaderboard that's useful here: https://huggingface.co/spaces/ArtificialAnalysis/LLM-Performance-Leaderboard. It does say LLM, but it includes SLMs too and you can select just a SLM view of the models. There are also model cards you can explore that give you insight into the performance of each model for different types of tasks. 

To select the right models for your project, you'll need to define your problem and look for a model metric that most closely aligns with what you're trying to do. That's a lot of work, but to get started, you can install the popular models like mistral, llama3.2, and phi3 and get testing.

Who was the King of England in 1650?

You can't just generically evaluate an SLM; you have to evaluate it for the task you want to do. For example, if you want a chatbot to talk about the stock you have in your retail company, it's no use testing the model on questions like "who was King of England in 1650?". It's nice if the model knows Kings & Queens, but not really very useful to you. So your first task is defining your evaluation criteria.

(England didn't have a King in 1650, it was a republic. Parliament had executed the previous King in 1649. This is an interesting piece of history, but why do you care if your SLM knows it?)

Text analysis: data breaches

For my evaluation, I chose a project analyzing press reports on data breaches. I selected nine questions I wanted answers to from a press report. Here are my questions:

  • "Does the article discuss a data breach - answer only Yes or No"
  • "Which entity was breached?"
  • "How many records were breached?"
  • "What date did the breach occur - answer using dd-MMM-YYYY format, if the date is not mentioned, answer Unknown, if the date is approximate, answer with a range of dates"
  • "When was the breach discovered, be as accurate as you can"
  • "Is the cause of the breach known - answer Yes or No only"
  • "If the cause of the breach is known state it"
  • "Were there any third parties involved - answer only Yes or No"
  • "If there were third parties involved, list their names"

The idea is simple: give the SLM a number of press reports, get it to answer the questions on each article, then check the accuracy of the results for each SLM.

As it turns out, my questions need some work, but they're good enough to get started.
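The evaluation loop is easy to sketch in Python. Everything below is hypothetical scaffolding: `ask_model` is a stub standing in for a real Ollama call, and scoring is a simple exact-match comparison, which is cruder than what a real evaluation would use.

```python
# Hypothetical evaluation harness. `ask_model` stands in for a real SLM
# call; here it's stubbed with a trivial rule so the scoring logic runs.
def ask_model(model, article, question):
    # In a real run, this would call the SLM via the Ollama API.
    return "Yes" if "breach" in article.lower() else "No"

def score_model(model, articles, question, expected):
    """Return the fraction of articles the model answers correctly."""
    answers = [ask_model(model, a, question) for a in articles]
    correct = sum(a == e for a, e in zip(answers, expected))
    return correct / len(articles)

articles = [
    "A hospital reported a data breach affecting 10,000 records.",
    "The tennis final went to five sets.",
]
expected = ["Yes", "No"]
accuracy = score_model("llama3.2", articles,
                       "Does the article discuss a data breach?", expected)
print(accuracy)  # 1.0 for this toy stub
```

In practice, you'd run this once per model and per question, then compare the accuracy scores across models.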

Where to run your SLM?

The first choice you face is which computer to run your SLM on. Your options boil down to the cloud or your local machine. On the cloud, you can choose any machine you like, provided it's powerful enough and fits your budget. On your local machine, you're limited to the hardware you have, but it's easier and cheaper to get started.

To get going quickly, I chose my local machine, but as it turned out, it wasn't quite powerful enough.

The code

This is where we part ways with the Ollama app and turn to coding. 

The first step is installing the Ollama Python module (https://github.com/ollama/ollama-python). Unfortunately, the documentation isn't great, so I'm going to help you through it.

We need to install the SLMs on our machine. This is easy to do: you can do it either via the command line or via the API. I'll just show you the command line way to install the model llama3.2:

ollama pull llama3.2

Because we have the same nine questions we want to ask of each article, I'm going to create a 'custom' SLM. This means selecting a model (e.g. Llama3.2) and customizing it with my questions. Here's my code.

ollama.create(
    model='breach_analyzer',
    from_='llama3.2',
    system=system_prompt,
    stream=True,
)

The system_prompt is the nine questions I showed you earlier plus a general prompt; model is the name I'm giving my custom model, in this case breach_analyzer.

Now I've customized my model, here's how I call it:

response = ollama.generate(
    model='breach_analyzer',
    prompt=prompt,
    format=BreachAnalysisResponse.model_json_schema(),
)

The prompt is the text of the article I want to analyze, and format is the JSON schema I want the results to follow, defined by BreachAnalysisResponse.model_json_schema(). The response is the model's answer in that JSON format.
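The BreachAnalysisResponse class isn't shown above; it's a Pydantic model whose fields mirror the nine questions (`.model_json_schema()` is Pydantic's schema method). A minimal sketch might look like this; the field names are my invention, not necessarily those in the repo:

```python
from pydantic import BaseModel

class BreachAnalysisResponse(BaseModel):
    # Field names are illustrative; the real model in the repo may differ.
    is_breach: str          # "Yes" or "No"
    entity: str
    records_breached: str
    breach_date: str        # dd-MMM-YYYY, a range of dates, or "Unknown"
    discovery_date: str
    cause_known: str        # "Yes" or "No"
    cause: str
    third_parties: str      # "Yes" or "No"
    third_party_names: str

# Ollama's `format` argument takes the JSON schema of the model:
schema = BreachAnalysisResponse.model_json_schema()
print(sorted(schema["properties"]))
```

Constraining the output to a schema like this is what makes the answers machine-checkable rather than free text.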

Note I'm using generate here and not chat. My queries are "one-off" and there's no sense of a continuing dialog. If I'd wanted a continuing dialog, I'd have used the chat function.

Here's how my code works overall:

  1. Read in the text from six online articles.
  2. Load the model the user has selected (either mistral, llama3.2, or phi3).
  3. Customize the model.
  4. Run all six online articles through the customized model.
  5. Collect the results and analyze them.

I created two versions of my code, a command line version for testing and a Streamlit version for proper use. You can see both versions here: https://github.com/MikeWoodward/SLM-experiments/tree/main/Ollama

The results

The first thing I discovered is that these models are resource hogs! They hammered my machine and took 10-20 minutes to run each evaluation of six articles. My laptop is a 2020 developer-spec MacBook Pro, but it isn't really powerful enough to evaluate SLMs. The first lesson is: you need a powerful, recent machine with built-in GPUs the SLM can access. I've heard from other people that running SLMs on high-spec machines gives fast (usable) response times.

The second lesson is accuracy. Not all of the three models I evaluated answered my questions correctly. One of the articles was about tennis, not data breaches, but one of the models incorrectly said it was about data breaches. Another model told me it was unclear whether there were third parties involved in a breach and then told me the name of the third party!

On reflection, I needed to tweak my nine questions to get clearer answers, but that was difficult because each analysis run took so long. This is a general problem: the models took so long to run that any tweaking of code or settings was painfully slow.

The overall winner in terms of accuracy was Phi-3, but this was also the slowest to run on my machine, taking nearly 20 minutes to analyze six articles. From commentary I've seen elsewhere, this model runs acceptably fast on a more powerful machine.

Here's the key question: could I replace paid-for LLMs with SLMs? My answer is: almost certainly yes, if you deploy your SLMs on a high-spec computer. There's certainly enough accuracy here to warrant a serious investigation.

How could I have improved the results?

The most obvious improvement is a faster machine: a brand-new top-of-the-range MacBook Pro with lots of memory and built-in GPUs. Santa, if you're listening, this is what I'd like. Alternatively, I could have gone to the cloud and used a GPU machine.

My prompts could be better. They need some tweaking.

I fetch the text of these articles using requests, which gives me all of the text on the page, including a lot of irrelevant content. A good next step would be to strip out the extraneous and distracting text. There are lots of ways to do that, and it's a job any competent programmer could do.

If I could solve the speed problem, it would be good to investigate using multiple models. This could take several forms:

  • asking the same questions using multiple models and voting on the results
  • using different models for different questions.

What's notable about these ways of improving the results is how simple they are.
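The voting approach, for instance, is only a few lines of Python. This is a sketch: the per-model answers are hard-coded here rather than generated by real models.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer across models (ties broken arbitrarily)."""
    return Counter(answers).most_common(1)[0][0]

# Hypothetical answers from three models to "Is this a data breach?"
answers = {"mistral": "Yes", "llama3.2": "Yes", "phi3": "No"}
print(majority_vote(answers.values()))  # "Yes"
```

The same function works for any of the nine questions that have short, comparable answers; free-text answers would need a fuzzier comparison.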

Some musings

  • Evaluating SLMs is firmly in the technical domain. I've heard of non-technical people trying to play with these models, but they end up going nowhere because it takes technical skills to make them do anything useful.
  • There are thousands of models and selecting the right one for your use case can be a challenge. I suggest going with the most recent and/or ones that score most highly on the HuggingFace leaderboard.
  • It takes a powerful machine to run these models. A new high-end machine with GPUs would probably run these models "fast enough". If you have a very recent and powerful local machine, it's worth playing around with SLMs locally to get started, but for serious evaluation, you need to get on the cloud and spend money.
  • Some US businesses are allergic to models developed in certain countries, some European businesses want models developed in Europe. If the geographic origin of your model is important, you need to check before you start evaluating.
  • You can get cost savings compared to LLMs, but there's hard work to be done implementing SLMs.

I have a lot more to say about evaluations and SLMs that I'm not saying here. If you want to hear more, reach out to me.

Next steps

Ian Stokes-Rees gave an excellent tutorial at PyData Boston on this topic and that's my number one choice for where to go next.

After that, I suggest you read the Ollama docs and join their Discord server. After that, the Hugging Face Community is a good place to go. Lastly, look at the YouTube tutorials out there.

Monday, December 1, 2025

Some musings on code generation: kintsugi

Hype and reality

I've been using AI code generation (Claude, Gemini, Cursor...) for months and I'm familiar with its strengths and weaknesses. It feels like I've gone through the whole hype cycle (see https://en.wikipedia.org/wiki/Gartner_hype_cycle) and now I'm firmly on the Plateau of Productivity. Here are some musings covering benefits, disappointments, and a way forward.

(The Japanese art of Kintsugi. Image by Gemini.)

Benefits

Elsewhere, people have waxed lyrical about the benefits of code generation, so I'm just going to add in a few novel points.

It's great when you're unfamiliar with an area of a language; it acts as a prompt or tutorial. In the past, you'd have to wade through pages of documentation and write code to experiment. Alternatively, you could search to see if anyone had tackled your problem and had a solution. If you were really stuck, you could try asking a question on Stack Overflow and deal with the toxicity. Now, you can generate something that gets you going quickly.

Modern code development requires properly commenting code, making sure code is "linted" and PEP8 compliant, and creating test cases etc. While these things are important, they can consume a lot of time. Code generation steps on the accelerator pedal and makes them go much faster. In fact, code gen makes it quite reasonable to raise the bar on code quality.

Disappointments

Pandas dataframes

I've found code gen really doesn't do well manipulating Pandas dataframes. Several times, I've wanted to transform dataframes or do something non-trivial, for example, aggregating data, merging dataframes, or transforming a column in some complex way. I've found the generated code to be either wrong or really inefficient. In a few cases, the code was wrong in a way that was hard to spot; subtle bugs are costly to fix.

Bloated code

This is something other people have commented to me too: sometimes generated code is really bloated. I've had cases where what should have been a single line of code gets turned into 20 or more lines. Some of it is "well-intentioned", meaning lots of error trapping. But sometimes it's just a poor implementation. Bloated code is harder to maintain and slower to run.

Django

It took me a while to find the problems with Django code gen. On the whole, code gen for Django works astonishingly well; it's one of the huge benefits. But I've found the generated code to be inefficient in several ways:

  • The model manipulations have sometimes been odd or poor implementations. A more thoughtful approach to aggregation can make the code more readable and faster.
  • If the network connection is slow or backend computations take some time, a page can take a long time to even start to render. A better approach involves building the page so the user sees something quickly and then adding other elements as they become available. Code gen doesn't do this "out of the box".
  • UI layout can sometimes take a lot of prompting to get right. Mostly, it works really well, but occasionally, code gen finds something it really, really struggles with. Oddly, I've found it relatively easy to fix these issues by hand.

JavaScript oddities

Most of my work is in Python, but occasionally, I've wandered into JavaScript to build apps. I don't know a lot of JavaScript, and that's been the problem: I've been slow to spot when code gen gets things wrong.

My projects have widgets and charts, and I found the generated JavaScript callbacks and code overcomplicated and bloated. I re-wrote the code to be 50% shorter and much clearer, though it cost me some effort to come up to speed with JavaScript to spot and fix things.

Oddly, I found hallucination more of a problem for JavaScript than Python. My code gen system hallucinated the need to include an external CSS file that didn't exist and wasn't needed. Code gen also hallucinated "standard" functions that weren't available (that was a nice one to debug!).

Similar to my Python experience, I found code gen to be really bad at manipulating data objects. In a few cases, it would give me code that was flat out wrong.

'Unpopular' code

If you're using libraries that have been extensively used by others (e.g. requests, Django, etc.), code gen is mostly good. But when you're using libraries that are a little "off the beaten path", I've found code generation really drops down in quality. In a few cases, it's pretty much unusable.

A way forward through the trough of disappointment

It's possible that more thorough prompting might solve some of these problems, but I'm not entirely convinced. I've found that code generation often doesn't do well with very, very detailed and long prompting. Here's what I think is needed.

Accepting that code generation is flawed and needs adult supervision. It's a tool, not a magic wand. The development process must include checks that the code is correct.

Proper training. You need to spot when it's gone wrong and you need to intervene. This means knowing the languages you're code generating. I didn't know JavaScript well enough and I paid the price.

Libraries to learn from and use. Code gen learns from your codebase, but this isn't enough, especially if you're doing something new, and it can also mean code gen is learning the wrong things. Having a library means code gen isn't re-inventing the wheel each time.

In a corporate setting, all this means having thoughtful policies and practices for code gen and code development. Code gen is changing rapidly, which means policies and practices will need to be updated every six months, or when you learn something new.

Kintsugi

Kintsugi is the Japanese art of taking something broken (e.g., a pot or a vase) and mending it in a way that both acknowledges its brokenness and makes it more beautiful. Code generation isn't broken, but it can be made a lot more useful with some careful thought and acknowledging its weaknesses.

Monday, November 24, 2025

Caching and token reduction

This is a short blog post to share some thoughts on how to reduce AI token consumption and improve user response times.

I was at the AI Tinkerers event in Boston and saw a presentation on using AI to generate reports for quant education. The presenter was using a generic LLM to create multiple choice questions on different themes. Similarly, I've been building an LLM system that produces a report based on data pulled from the internet. In both cases, there is a finite number of topics to generate reports on. My case was much larger, but even so, it was still finite.

The obvious thought is, if you're only generating a few reports or questions & answers, why not generate them in batch? There's no need to keep the user waiting and of course, you can schedule your LLM API calls in the middle of the night when there's less competition for resources. 

(Canva)

In my case, there are potentially thousands of reports, but some reports will be pulled more often than others. A better strategy in my case is something like this:

  1. Take a guess at the most popular reports (or use existing popularity data) and generate those reports overnight (or at a time when competition for resources is low). Cache them.
  2. If the user wants a report that's been cached, return the cached copy.
  3. If the user wants an uncached report:
    • Tell the user there will be a short wait for the LLM
    • Call the LLM API and generate the report
    • Display the report
    • Cache the report
  4. For each cached report, record the LLM and its creation timestamp.

You can start to do some clever things here, like refreshing the reports every 30 days or when the LLM is upgraded.
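The steps above can be sketched in a few lines. This is illustrative scaffolding, not the code from either project: `generate_report` stands in for the real LLM API call, and `MAX_AGE_DAYS` is my own example refresh policy.

```python
import time

MAX_AGE_DAYS = 30  # illustrative refresh policy, not from the original systems

# topic -> {"report": str, "model": str, "created": timestamp}
cache = {}

def generate_report(topic):
    # Stand-in for a real (slow, costly) LLM API call.
    return f"Report on {topic}"

def get_report(topic, model="some-llm"):
    entry = cache.get(topic)
    age_limit = MAX_AGE_DAYS * 24 * 3600
    # Regenerate if the report is missing, stale, or from a different model.
    if (entry is None
            or time.time() - entry["created"] > age_limit
            or entry["model"] != model):
        cache[topic] = {"report": generate_report(topic),
                        "model": model,
                        "created": time.time()}
    return cache[topic]["report"]

print(get_report("quant education"))  # generated on the first call
print(get_report("quant education"))  # served from the cache
```

Pre-warming the cache overnight is then just a loop over the guessed-popular topics calling `get_report`.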

I know this isn't rocket science, but I've been surprised how few LLM demos I've seen use any form of batch processing and caching.

Monday, November 17, 2025

Data scientists need to learn JavaScript

Moving quickly

Over the last few months, I've become very interested in rapid prototype development for data science projects. Here's the key question I asked myself: how can a data scientist build their own app as quickly as possible? Nowadays, speed means code gen, but that's only part of the solution.

The options

The obvious quick development path is using Streamlit; that doesn't require any new skills because it's all in Python. Streamlit is great, and I've used it extensively, but it only takes you so far and it doesn't really scale. Streamlit is really for internal demos, and it's very good at that.

The more sustainable solution is using Django. It's a bigger and more complex beast, but it's scalable. Django requires Python skills, which is fine for most data scientists. Of course, Django apps are deployed on the web and users access them as web pages.

The UI is one place code gen breaks down under pressure

Where things get tricky is adding widgets to Django apps. You might want your app to take some action when the user clicks a button, or have widgets controlling charts etc. Code gen will nicely provide you with the basics, but once you start to do more complicated UI tasks, like updating chart data, you need to write JavaScript or be able to correct code gen'd JavaScript.

(As an aside, for my money, the reason why a number of code gen projects stall is because code gen only takes you so far. To do anything really useful, you need to intervene, providing detailed guidance, and writing code where necessary. This means JavaScript code.)

JavaScript != Python

JavaScript is very much not Python. Even a cursory glance will tell you that JavaScript's syntax is unlike Python's. More subtly, and more importantly, some of the underlying ideas and approaches are quite different. The bottom line is, a Python programmer is not going to write good enough JavaScript without training.

To build even a medium complexity data science app, you need to know how JavaScript callbacks work, how arrays work, how to debug in the browser, and so on. Because code gen is doing most of the heavy lifting for you, you don't need to be a craftsman, but you do need to be a journeyman.

What data scientists need to do

The elevator pitch is simple:

  • If you want to build a scalable data science app, you need to use Django (or something like it).
  • To make the UI work properly, code gen needs adult supervision and intervention.
  • This means knowing JavaScript.

(Data Scientist becoming JavaScript programmer. Gemini.)

In my view, all that's needed here is a short course, a good book, and some practice. A week should be enough time for an experienced Python programmer to get to where they need to be.

What skillset should data scientists have?

AI is shaking everything up, including data science. In my view, data scientists will have to do more than their "traditional" role. Data scientists who can turn their analysis into apps will have an advantage. 

For me, the skillset a data scientist will need looks a lot like the skillset of a full-stack developer. This means data scientists knowing a bit of JavaScript, code gen, deployment technologies, and so on. They won't need to be experts, but they will need "good enough" skills.

Wednesday, November 12, 2025

How to rapidly build and deploy data science apps using code gen

Introduction

If you want to rapidly build and deploy apps with a data science team, this blog post is written for you.

(Canva)

I’ve seen how small teams of MIT and Harvard students at the sundai.club in Boston are able to produce functioning web apps in twelve hours. I want to understand how they’re doing it, adapt what they’re doing for business, and create data science heavy apps very quickly. This blog post is about what I’ve learned.

Almost all of the sundai.club projects use an LLM as part of their project (e.g., using agentic systems to analyze health insurance denials), but that’s not how they’re able to build so quickly. They get development speed through code generation, the appropriate use of tools, and the use of deployment technologies like Vercel or Render. 

(Building prototypes in 12 hours: the inspiration for this blog post.)

Inspired by what I’ve seen, I developed a pathfinder project to learn how to do rapid development and deployment using AI code gen and deployment tools. My goal was to find out:

  • The skills needed and the depth to which they’re needed.
  • Major stumbling blocks and coping strategies.
  • The process to rapidly build apps.

I'm going to share what I've learned in this blog post. 

Summary of findings

Process is key

Rapid development relies on having three key elements in place:

  • Using the right tools.
  • Having the right skill set.
  • Using AI code gen correctly.

Tools

Fast development must use these tools:

  • AI-enabled IDE.
  • Deployment platform like Render or Vercel.
  • Git.

Data scientists tend to use notebooks and that’s a major problem for rapid development; notebook-based development isn’t going to work. Speed requires the consistent use of AI-enabled IDEs like Cursor or Lovable. These IDEs use AI code generation at the project and code-block level, and can generate code in different languages (Python, SQL, JavaScript, etc.). They have the ability to generate test code, comment code, and make code PEP8 compliant. It’s not just one-off code gen, it’s applying AI to the whole code development process.

(Screen shot of Cursor used in this project.)

Using a deployment platform like Render or Vercel means deployment can be extremely fast. Data scientists generally don’t have deployment skills, but these products are straightforward enough that some written guidance should suffice.

Deployment platforms retrieve code from Git-based systems (e.g., GitHub, GitLab etc.), so data scientists need some familiarity with them. Desktop tools (like GitHub Desktop) make it easier, but they have to be used, which is a process and management issue.

Skillsets and training

The skillset needed is the same as a full-stack engineer with a few tweaks, which is a challenge for data scientists who mostly lack some of the key skills. Here are the skillsets, level needed, and training required for data scientists.

  • Hands-on experience with AI code generation and AI-enabled IDE.
    • What’s needed:
      • Ability to appropriately use code gen at the project and code-block levels. This could be with Cursor, Claude Code, or something similar.
      • Understanding code gen strengths and weaknesses and when not to use it.
      • Experience developing code using an IDE.
    • Training: 
      • To get going, an internal training session plus a series of exercises would be a good choice.
      • At the time of writing, there are no good off-the-shelf courses.
  • Python
    • What’s needed:
      • Decent Python coding skills, including the ability to write functions appropriately (data scientists sometimes struggle here).
      • Django uses inheritance and function decorators, so understanding these properties of Python is important. 
      • Use of virtual environments.
    • Training:
      • Most data scientists have “good enough” Python.
      • The additional knowledge should come from a good advanced Python book. 
      • Consider using experienced software engineers to train data scientists in missing skills, like decomposing tasks into functions, PEP8 and so on.
  • SQL and building a database
    • What’s needed:
      • Create databases, create tables, insert data into tables, write queries.
    • Training:
      • Most data scientists have “good enough” SQL.
      • Additional training could come from books or online tutorials.
  • Django
    • What’s needed:
      • An understanding of Django’s architecture and how it works.
      • The ability to build an app in Django.
    • Training:
      • On the whole, data scientists don’t know Django.
      • The training provided by a short course or a decent text book should be enough.
      • Writing a couple of simple Django apps by hand should be part of the training.
      • This may take 40 hours.
  • JavaScript
    • What’s needed:
      • Ability to work with functions (including callbacks), variables, and arrays.
      • Ability to debug JavaScript in the browser.
      • These skills are needed to add and debug UI widgets. Code generation isn't enough.
    • Training:
      • A short course (or a reasonable text book) plus a few tutorial examples will be enough.
  • HTML and CSS
    • What’s needed:
      • A low level of familiarity is enough.
    • Training:
      • Tutorials on the web or a few YouTube videos should be enough.
  • Git
    • What’s needed:
      • The ability to use Git-based source control systems. 
      • It's needed because deployment platforms rely on code being on Git.
    • Training:
      • Most data scientists have a weak understanding of Git. 
      • A hands-on training course would be the most useful approach.

Code gen is not one-size-fits-all

AI code gen is a tremendous productivity boost and enabler in many areas, but not all. For key tasks, like database design and app deployment, AI code gen doesn’t help at all. In other areas, for example, complex database/dataframe manipulations and some advanced UI issues, AI helps somewhat but needs substantial guidance. The productivity benefit of AI coding ranges from negative to greatly positive depending on the task.

The trick is to use AI code gen appropriately and provide adult supervision. This means reviewing what AI produces and intervening. It means knowing when to stop prompting and when to start coding.

Recommendations before attempting rapid application development

  • Make sure your team have the skills I’ve outlined above, either individually or collectively.
  • Use the right tools in the right way.
  • Don’t set unreasonable expectations; understand that your first attempts will be slow as you learn.
  • Run a pilot project or two with loose deadlines. From the pilot project, codify the lessons and ways of working. Focus especially on AI code gen and deployment.

How I learned rapid development: my pathfinder app

For this project, I chose to build an app that analyzes the results of English League Football (soccer) games, from the league’s beginning in 1888 to the most recently completed season (2024-2025).

The data set is quite large, which means a database back end. The database will need multiple tables.

It’s a very chart-heavy app. Some of the charts are violin plots that need kernel density estimation, and I’ve added curve fitting and confidence intervals on some line plots. That’s not the most sophisticated data analysis, but it’s enough to prove a point about the use of data science methods in apps. Notably, charts are not covered in most Django texts.

(Just one of the plots from my app. Note the year slider at the bottom.)

In several cases, the charts need widgets: sliders to select the year and radio buttons to select different leagues. This means either using ‘native’ JavaScript or libraries specific to the charting tool (Bokeh). I chose to use native JavaScript for greater flexibility.

To get started, I roughly drew out what I wanted the app to look like. This included different themed analysis (trends over time, goal analysis, etc.) and the charts I wanted. I added widgets to my design where appropriate.

The stack

Here’s the stack I used for this project.

Django was the web framework, which means it handles incoming and outgoing data, manages users, and manages data. Django is very mature, and is very well supported by AI code generation (in particular, Cursor). Django is written in Python.

Postgres. “Out of the box”, Django supports SQLite, but Render (my deployment solution) requires Postgres. 

Bokeh for charts. Bokeh is a Python plotting package that renders its charts in a browser (using HTML and JavaScript). This makes it a good choice for this project. An alternative is Altair, but my experience is that Bokeh is more mature and more amenable to being embedded in web pages.

JavaScript for widgets. I need to add drop down boxes, radio buttons, sliders, and tabs etc. I’ll use whatever libraries are appropriate, but I want code gen to do most of the heavy lifting.

Render.com for deployment. I wanted to deploy my project quickly, which means I don’t want to build out my own deployment solution on AWS etc., I want something more packaged.

I used Cursor for the entire project.

The build process and issues

Building the database

My initial database format gave highly complicated Django models that broke Django’s ORM. I rebuilt the database using a much simpler schema. The lesson here is to keep the database reasonably close to the format in which it will be displayed. 

My app design called for violin plots of attendance by season and by league tier. This is several hundred plots. Originally, I was going to calculate the kernel density estimates for the violin plots at run time, but I decided it would slow the application down too much, so I calculated them beforehand and saved them to a database table. This is a typical trade-off.
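The precompute-versus-runtime trade-off looks like this in miniature. This is a pure-Python stand-in: the real app computes kernel density estimates and stores them in a Postgres table, while here the "expensive" calculation is a placeholder and the "table" is a dict.

```python
# Stand-in for an expensive per-(season, tier) computation like a KDE.
def expensive_density_estimate(attendances):
    return sum(attendances) / len(attendances)  # placeholder calculation

# "Build time": compute once for every (season, tier) and store the results
# (in the real app, a database table; here, a dict). Attendance numbers
# below are made up for illustration.
raw_data = {
    ("1888-89", 1): [4500, 6000, 5200],
    ("1889-90", 1): [5100, 7000, 6400],
}
precomputed = {key: expensive_density_estimate(vals)
               for key, vals in raw_data.items()}

# "Run time": a page request is now a cheap lookup, not a recomputation.
print(precomputed[("1888-89", 1)])
```

The cost is storage and a stale-data risk; the benefit is that page rendering never waits on hundreds of density estimates.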

For this part of the process, I didn’t find code generation useful.

The next stage was uploading my data to the database. Here, I found code generation very useful. It enabled me to quickly create a Python program to upload data and check the database for consistency.

Building Django

Code gen was a huge boost here. I gave Cursor a markdown file specifying what I wanted and it generated the project very quickly. The UI wasn’t quite what I wanted, but by prompting Cursor, I was able to get it there. It let me create and manipulate dropdown boxes, tabs, and widgets very easily (far, far faster than hand coding). I did try to create a more detailed initial spec, but I found that after a few pages of spec, code generation gets worse; I got better results with an incremental approach.

(One part of the app, a dropdown box and menu. Note the widget and the entire app layout was AI code generated.)

The simplest part of the project is a view of club performance over time. Using a detailed prompt, I was able to get all of the functionality working using only code gen. This functionality included a dropdown selection box, a club history display, league position over time, and matches played by season. It needed some tweaks, but I did the tweaks using code gen. Getting this simple functionality running took an hour or two.

Towards the end of the project, I added an admin panel for admin users to create, edit, and delete "ordinary" users. With code gen, this took less than half an hour, including bug fixes and UI tweaks.

For one UI element, I needed to create an API interface to supply JSON rather than HTML. Code gen let me create it in seconds.

However, there were problems.

Code gen didn’t do well with generating Bokeh code for my plots and I had to intervene to re-write the code.

It did even worse with retrieving data from Django models. Although I aligned my data as closely as I could to the app, it was still necessary to aggregate data. I found code generation did a really poor job and the code needed to be re-written. Code gen was helpful to figure out Django’s model API though.

In one complex case, I needed to break Django’s ORM and make a SQL call directly to the database. Here, code gen worked correctly on the first pass, creating good-quality SQL immediately.

My use of code gen was not one-and-done, it was an interactive process. I used code generation to create code at the block and function level.

Bokeh

My app is very chart-heavy, with more than 10 charts, and there aren't many examples of this type of app that I could find. This means AI code gen doesn't have much to learn from.

(One of the Bokeh charts. Note the interactive controls on the right of the plot and the fact the plot is part of a tabbed display.)

As with the plotting code mentioned earlier, code gen didn’t do well generating Bokeh code for my charts, and I had to intervene and re-write it.

I needed to access the Bokeh chart data from the widget callbacks and update the charts with new data (in JavaScript). This involved building a JSON API, which code gen created very easily. Sadly, code gen had a much harder time with the JavaScript callback. Its first pass was gibberish, and refining the prompt didn’t help. I had to intervene and ask for code gen on a block-by-block basis. Even then, I had to re-write some lines of code. Unless the situation changes, my view is that code generation for this kind of problem is limited to function definition and block-by-block generation, with hand coding to correct and improve the result.

(Some of the hand-written code. Code gen couldn't create this.)

Render

By this stage, I had an app that worked correctly on my local machine. The final step was deployment, so it would be accessible on the public internet. The sundai.club and others use Render.com and similar services to rapidly deploy their apps, so I decided to use Render.com's free tier.

Render's free tier is good enough for demo purposes, but it isn't powerful enough for a commercial deployment (which is fair); that's why I'm not linking to my app in this blog post: too much traffic would consume my free allowance.

Unlike some of its competitors, Render uses Postgres rather than SQLite as its database, hence my choice of Postgres. This means deployment is in two stages:

  • Deploy the database.
  • Link the Django app to the database and deploy the app.
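Render supplies the Postgres connection string to the app through a `DATABASE_URL` environment variable (many projects use the `dj-database-url` package for this). A stdlib-only sketch of the parsing, with an invented example URL:

```python
import os
from urllib.parse import urlparse

def database_settings(url):
    """Turn a DATABASE_URL connection string into a Django DATABASES entry."""
    parsed = urlparse(url)
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parsed.path.lstrip("/"),
        "USER": parsed.username,
        "PASSWORD": parsed.password,
        "HOST": parsed.hostname,
        "PORT": parsed.port or 5432,
    }

# In settings.py, roughly:
#   DATABASES = {"default": database_settings(os.environ["DATABASE_URL"])}

print(database_settings("postgres://user:secret@db.example.com:5432/football"))
```

Reading the URL from the environment keeps credentials out of source control, which is the convention Render (and similar hosts) expect.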

The process was more complicated than I expected, and I ran into trouble; the documentation wasn't as clear as it needed to be, which didn't help. The consistent advice in the Render documentation was to turn debug off, which made diagnosing problems almost impossible. I turned debug on and fixed my problems quickly.

To be clear: code gen was of no help whatsoever.

(Part of Render's deployment screen.)

However, it's my view that with better documentation, subsequent deployments could go very smoothly.

General comments about AI code generation

  • Typically, organizations require code to pass checks (linting, PEP8, test cases, etc.) before the developer can check it into source control. Code generation makes it easier and faster to pass these checks. Commenting and code documentation are also much, much faster.
  • Code generation works really well for “commodity” tasks and is really well-suited to Django. It mostly works well with UI code generation, provided there’s not much complexity.
  • It doesn’t do well with complex data manipulations, although its SQL can be surprisingly good.
  • It doesn’t do well with Bokeh code.
  • It doesn’t do well with complex UI callbacks where data has to be manipulated in particular ways.

Where my app ended up

End-to-end, it took about two weeks, including numerous blind alleys, restarts, and time spent digging up answers. Knowing what I know now, I could probably create an app of this complexity in less than 5 days, and faster still with more people.

My app has multiple pages, with multiple charts on each page (well over 10 charts in total). The chart types include violin plots, line charts, and heatmaps. Because they're Bokeh charts, my app has built-in chart interactivity. I have widgets (e.g., sliders, radio buttons) controlling some of the charts, which communicate back to the database to update the plots. Of course, I also have Django's user management features.

Discussion

There were quite a few surprises along the way in this project: I had expected code generation to do better with Bokeh and callback code, I'd expected Render to be easier to use, and I thought the database would be easier to build. Notably, the Render and database issues were learning issues; these costs can be avoided on future projects.

I've heard some criticism of code-generated apps from people who have produced 70% or even 80% of what they want, but are unable to go further. I can see why this happens. Code gen will only take you so far, and it will produce junk in some circumstances that are likely to occur with moderately complex apps. When things get tough, it takes a human with the right skills to step in. If you don't have the right skills, your project stalls.

My goal with this project was to figure out the skills needed for rapid application development and deployment. I wanted to figure out the costs of enabling a data science team to build their own apps. What I found is that the skill set needed is the skill set of a full-stack engineer. In other words, rapid development and deployment is firmly in the realm of software engineers, not data scientists. If data scientists want to build apps, there's a learning curve and a learning cost. Frankly, I'm coming round to the opinion that data scientists need a broader software skill set.

For a future version of this project, I would be tempted to split off the UI entirely. The Django code would be purely a JSON server, accessed through the API. The front end would be in Next.js, which would mean having the charting software entirely in JavaScript. Obviously, there's a learning curve cost here, but I think it would give more consistency and ultimately an easier-to-maintain solution. Once again, it points to the need for a full-stack skill set.

To make this project go faster next time, here's what I would do:

  • Make the database structure reasonably close to how data is to be displayed. Don't get too clever and don't try to optimize it before you begin.
  • Figure out a way to commoditize creating charts and updating them through a JavaScript callback. The goal is of course to make the process more amenable to code generation. 
  • Related to charts, figure out a better way of using the ORM to avoid using SQL for more complex queries. Figure out a way to get better ORM code generation results.
  • Document the Render deployment process and have a simple checklist or template code.

Bottom line: it’s possible to do rapid application development and deployment with the right approach, the right tools, and using code gen correctly. Training is key.

Using the app

I want to tinker with my app, so I don't want to exhaust my Render free tier. If you'd like to see my app, drop me a line (https://www.linkedin.com/in/mikewoodward/) and I'll grant you access.

If you want to see my app code, that's easier. You can see it here: https://github.com/MikeWoodward/English-Football-Forecasting/tree/main/5%20Django%20app