Engora Data Blog: ChatGPT and code generation: be careful

Tuesday, July 25, 2023

ChatGPT and code generation: be careful

I've heard bold pronouncements that Large Language Models (LLMs), and ChatGPT in particular, will greatly speed up software development with all kinds of consequences. Most of these pronouncements seem to come from 'armchair generals' who are a long way from writing code. I'm going to chime in with my real-world experiences and give you a more realistic view.

D J Shin, CC BY-SA 3.0, via Wikimedia Commons

I've used ChatGPT to generate Python code to solve some small-scale problems. These are things like using an API or doing some simple statistical analysis or chart plotting. Recently, I've branched out to more complex problems, which is where its limitations become more obvious.

In my experience, ChatGPT is excellent for generating code for small problems. It might not solve the problem completely, but it will automate most of the boring pieces and give you a good platform to get going. The code it generates is good with some exceptions. It doesn't generate doc strings for functions, it's light on comments, and it doesn't always follow PEP8 layout, but it does lay out its code clearly and it uses functions well. The supporting documentation it creates is great, in fact, it's much better than the documentation most humans produce.

For larger problems, it falls down, sometimes badly. I gave it a brief to create code to demonstrate the Central Limit Theorem (CLT) using Bokeh charts with several underlying distributions. Part of the brief it did well and it clearly understood how to demonstrate the CLT, but there were problems I had to fix. It generated code for an out-of-date version of Bokeh which required some digging and coding to fix; this could have been cured by simply adding comments about the versions of libraries it was using. It also chose some wrong variable names (it used the reverse of what I would have chosen). More importantly, it did some weird and wrong things with the data at the end of the process, I spotted its mistake in a few minutes and spent 30 minutes rewriting code to correct it. I had similar problems with other longer briefs I gave ChatGPT.

Obviously, the problems I encountered could have been due to incomplete or ambiguous briefs. A solution might have been to spend time refining my brief until it gave me the code I wanted, but that may have taken some time. Which would have been faster, writing new detailed briefs or fixing code that was only a bit wrong?

More worryingly, I spotted what was wrong because I knew the output I expected. What if this had been a new problem where I didn't know what the result should look like?

After playing around with ChatGPT for a while, here are my takeaways:

ChatGPT code generation is about the level of a good junior programmer.
You should use it as a productivity boost to automate the boring bits of coding, a jump start.
Never trust the code and always check what it's doing. Don't use it when you don't know what the result should look like.

Obviously, this is ChatGPT today and the technology isn't standing still. I would expect future versions to improve on commenting etc. What will be harder is the brief. The problem here isn't the LLM, it's with the person writing the brief. English is a very imperfect language for detailed specifications which means we're stuck with ambiguities. I might write what I think is the perfect brief, only to find out I've been imprecise or ambiguous. Technology change is unlikely to fix this problem in the short term.

Of course, other industries have gone through similar disruptive changes in the past. The advent of CAD/CAM didn't mean the end of factory work, it raised productivity at the expense of requiring a higher skill set. The workers with the higher skillset gained, and those with a lesser skillset lost out.

In my view, here's how things are likely to evolve. LLMs will become standard tools to aid data scientists and software developers. They'll be productivity boosters that will require a high skill set to use. The people most negatively impacted will be junior staff or the less skilled, the people who gain the most will be those with experience and a high skill level.

Engora Data Blog

Tuesday, July 25, 2023

ChatGPT and code generation: be careful

No comments:

Post a Comment