Monday, December 1, 2025

Some musings on code generation: kintsugi

Hype and reality

I've been using AI code generation (Claude, Gemini, Cursor...) for months and I'm familiar with its strengths and weaknesses. It feels like I've gone through the whole hype cycle (see https://en.wikipedia.org/wiki/Gartner_hype_cycle) and now I'm firmly on the Plateau of Productivity. Here are some musings covering benefits, disappointments, and a way forward.

(The Japanese art of Kintsugi. Image by Gemini.)

Benefits

Elsewhere, people have waxed lyrical about the benefits of code generation, so I'm just going to add in a few novel points.

It's great when you're unfamiliar with an area of a language; it acts as a prompt or tutorial. In the past, you'd have to wade through pages of documentation and write code to experiment. Alternatively, you could search to see if anyone's tackled your problem and has a solution. If you were really stuck, you could try and ask a question on Stack Overflow and deal with the toxicity. Now, you can get something to get you going quickly.

Modern code development requires properly commenting code, making sure code is "linted" and PEP8 compliant, and creating test cases etc. While these things are important, they can consume a lot of time. Code generation steps on the accelerator pedal and makes them go much faster. In fact, code gen makes it quite reasonable to raise the bar on code quality.

Disappointments

Pandas dataframes

I've found code gen really doesn't do well manipulating Pandas dataframes. Several times, I've wanted to transform dataframes or do something non-trivial, for example, aggregating data, merging dataframes, transforming a column in some complex way and so on. I've found the generated code to be either wrong or really inefficient. In a few cases, the code was wrong, but in a way that was hard to spot; subtle bugs are costly to fix.
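To make this concrete, here's a small sketch of the two failure modes. The data, column names, and function names are all invented for illustration; they are not taken from any real project. The first pair of functions contrasts a row-by-row loop with the equivalent groupby. The last function shows one kind of hard-to-spot bug: a merge against a lookup table with a duplicated key silently inflates a total, unless you ask Pandas to validate the merge.

```python
import pandas as pd

# Hypothetical data, invented for this example only.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "amount": [10.0, 20.0, 5.0, 7.5],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],  # customer 2 is listed twice by mistake
    "region": ["north", "south", "south", "east"],
})


def totals_with_loop(df):
    """Row-by-row aggregation: gives the right answer, but slow on big frames."""
    totals = {}
    for _, row in df.iterrows():
        key = int(row["customer_id"])
        totals[key] = totals.get(key, 0.0) + row["amount"]
    return totals


def totals_with_groupby(df):
    """The same aggregation expressed as a single vectorized groupby."""
    return df.groupby("customer_id")["amount"].sum().to_dict()


def revenue_by_region(orders_df, customers_df, check=True):
    """Join orders to regions and sum the amounts per region.

    With check=True, Pandas raises MergeError if a customer_id is
    duplicated in customers_df instead of silently duplicating order rows.
    """
    merged = orders_df.merge(
        customers_df,
        on="customer_id",
        how="left",
        validate="many_to_one" if check else None,
    )
    return merged.groupby("region")["amount"].sum().to_dict()
```

With `check=False`, the "south" total comes out as 10.0 instead of 5.0 because the duplicated customer row doubles the order. Nothing crashes, and the output looks plausible, which is what makes this sort of bug expensive.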

Bloated code

This is something other people have said to me too. Sometimes generated code is really bloated. I've had cases where what should have been a single line of code gets turned into 20 or more lines. Some of it is "well-intentioned", meaning lots of error trapping. But sometimes it's just a poor implementation. Bloated code is harder to maintain and slower in execution.
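Here's a made-up illustration of the pattern, not code from a real session. Both functions count words in a string; the names are hypothetical. The first is written in the defensive, long-winded style described above, the second does the same job for string input using the standard library.

```python
from collections import Counter


def count_words_bloated(text):
    """Verbose style: lots of error trapping around a simple task."""
    if text is None:
        return {}
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    counts = {}
    try:
        words = text.split()
    except Exception:
        return {}
    for word in words:
        lowered = word.lower()
        if lowered in counts:
            counts[lowered] = counts[lowered] + 1
        else:
            counts[lowered] = 1
    return counts


def count_words(text):
    """Compact version: one expression using collections.Counter."""
    return dict(Counter(text.lower().split()))
```

Whether the extra checks in the long version are worth keeping depends on where the input comes from. The point is that someone has to decide, rather than accept 20 lines by default.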

Django

It took me a while to find the problems with Django code gen. On the whole, code gen for Django works astonishingly well; it's one of the huge benefits. But I've found the generated code to be inefficient in several ways:

  • The model manipulations have sometimes been odd or poor implementations. A more thoughtful approach to aggregation can make the code more readable and faster.
  • If the network connection is slow or backend computations take some time, a page can take a long time to even start to render. A better approach involves building the page so the user sees something quickly and then adding other elements as they become available. Code gen doesn't do this "out of the box".
  • UI layout can sometimes take a lot of prompting to get right. Mostly, it works really well, but occasionally, code gen finds something it really, really struggles with. Oddly, I've found it relatively easy to fix these issues by hand.
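The second point, showing the user something quickly and filling in the rest later, can be sketched in plain Python with a generator. This is only an illustration of the idea under my own assumptions: `slow_report` and `stream_page` are hypothetical names, and the sleep stands in for a slow backend computation. Django's `StreamingHttpResponse` accepts an iterator of strings, so a generator like this is one possible way to wire it up, but that wiring isn't shown here.

```python
import time


def slow_report():
    """Stand-in for a slow backend computation (hypothetical)."""
    time.sleep(0.1)
    return "<div id='report'>report ready</div>"


def stream_page(section_builders):
    """Yield the page shell first, then each section as it becomes ready."""
    yield "<html><body><h1>Dashboard</h1>"
    for build_section in section_builders:
        # The slow work only happens when the consumer asks for this chunk.
        yield build_section()
    yield "</body></html>"


chunks = stream_page([slow_report])
first_chunk = next(chunks)      # available before slow_report has run
remaining_chunks = list(chunks)  # the slow section and the closing tags
```

The design choice is that the page header doesn't wait on the slowest component. Other approaches, such as loading sections with separate requests from the browser, would achieve a similar effect.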

JavaScript oddities

Most of my work is in Python, but occasionally, I've wandered into JavaScript to build apps. I don't know a lot of JavaScript, and that's been the problem: I've been slow to spot code gen wrongness.

My projects have widgets and charts and I found the JavaScript callbacks and code were overcomplicated and bloated. I re-wrote the code to be 50% shorter and much clearer. It cost me some effort to come up to speed with JavaScript to spot and fix things.

Oddly, I found hallucination more of a problem for JavaScript than Python. My code gen system hallucinated the need to include an external CSS file that didn't exist and wasn't needed. Code gen also hallucinated "standard" functions that weren't available (that was nice to debug).

Similar to my Python experience, I found code gen to be really bad at manipulating data objects. In a few cases, it would give me code that was flat out wrong.

Unpopular code

If you're using libraries that have been extensively used by others (e.g. requests, Django, etc.), code gen is mostly good. But when you're using libraries that are a little "off the beaten path", I've found code generation really drops down in quality. In a few cases, it's pretty much unusable.

A way forward through the trough of disappointment

It's possible that more thorough prompting might solve some of these problems, but I'm not entirely convinced. I've found that code generation often doesn't do well with very, very detailed and long prompting. Here's what I think is needed.

Accepting that code generation is flawed and needs adult supervision. It's a tool, not a magic wand. The development process must include checks that the code is correct.
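One cheap form of check is to compare a generated function against a trusted reference on lots of random input. The sketch below is an assumption about what such a check could look like, not a prescribed process; `generated_median` is a hypothetical stand-in for something code gen produced, and the reference is the standard library's `statistics.median`.

```python
import random
import statistics


def generated_median(values):
    """Hypothetical stand-in for a function produced by code generation."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def check_against_reference(trials=200, seed=0):
    """Compare the generated function with a trusted reference on random data."""
    rng = random.Random(seed)
    for _ in range(trials):
        size = rng.randint(1, 20)
        data = [rng.randint(-100, 100) for _ in range(size)]
        # On failure, the assertion message shows the offending input.
        assert generated_median(data) == statistics.median(data), data
    return True
```

This won't catch everything, but it's quick to write and it fails loudly with the input that broke the function.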

Proper training. You need to spot when it's gone wrong and you need to intervene. This means knowing the languages you're code generating. I didn't know JavaScript well enough and I paid the price.

Libraries to learn from and use. Code gen learns from your codebase, but this isn't enough, especially if you're doing something new, and it can also mean code gen is learning the wrong things. Having a library means code gen isn't re-inventing the wheel each time.

In a corporate setting, all this means having thoughtful policies and practices for code gen and code development. Code gen is changing rapidly, which means policies and practices will need to be updated every six months, or when you learn something new.

Kintsugi

Kintsugi is the Japanese art of taking something broken (e.g., a pot or a vase) and mending it in a way that both acknowledges its brokenness and makes it more beautiful. Code generation isn't broken, but it can be made a lot more useful with some careful thought and acknowledging its weaknesses.