Introduction
If you want to rapidly build and deploy apps with a data science team, this blog post is written for you.
I’ve seen how small teams of MIT and Harvard students at the sundai.club in Boston are able to produce functioning web apps in twelve hours. I want to understand how they’re doing it, adapt what they’re doing for business, and create data science heavy apps very quickly. This blog post is about what I’ve learned.
Almost all of the sundai.club projects use an LLM as part of their project (e.g., using agentic systems to analyze health insurance denials), but that’s not how they’re able to build so quickly. They get development speed through code generation, the appropriate use of tools, and the use of deployment technologies like Vercel or Render.
Inspired by what I’ve seen, I developed a pathfinder project to learn how to do rapid development and deployment using AI code gen and deployment tools. My goal was to find out:
- The skills needed and the depth to which they’re needed.
- Major stumbling blocks and coping strategies.
- The process to rapidly build apps.
I'm going to share what I've learned in this blog post.
Summary of findings
Process is key
Rapid development relies on having three key elements in place:
- Using the right tools.
- Having the right skill-set.
- Using AI code gen correctly.
Tools
Fast development must use these tools:
- AI-enabled IDE.
- Deployment platform like Render or Vercel.
- Git.
Data scientists tend to use notebooks, and that’s a major problem for rapid development; notebook-based development isn’t going to work. Speed requires the consistent use of AI-enabled IDEs like Cursor or Lovable. These tools apply AI code generation at the project and code-block level, and can generate code in different languages (Python, SQL, JavaScript, etc.). They can also generate test code, comment code, and make code PEP8 compliant. It’s not just one-off code gen; it’s applying AI to the whole code development process.
Using a deployment platform like Render or Vercel means deployment can be extremely fast. Data scientists typically lack deployment skills, but these products are straightforward enough that some written guidance should be enough.
Deployment platforms retrieve code from Git-based systems (e.g., GitHub, GitLab etc.), so data scientists need some familiarity with them. Desktop tools (like GitHub Desktop) make it easier, but they have to be used, which is a process and management issue.
Skillsets and training
The skillset needed is that of a full-stack engineer, with a few tweaks. This is a challenge for data scientists, who mostly lack some of the key skills. Here are the skill sets, the levels needed, and the training required for data scientists.
- Hands-on experience with AI code generation and AI-enabled IDE.
- What’s needed:
- Ability to appropriately use code gen at the project and code-block levels. This could be with Cursor, Claude Code, or something similar.
- Understanding code gen strengths and weaknesses and when not to use it.
- Experience developing code using an IDE.
- Training:
- To get going, an internal training session plus a series of exercises would be a good choice.
- At the time of writing, there are no good off-the-shelf courses.
- Python
- What’s needed:
- Decent Python coding skills, including the ability to write functions appropriately (data scientists sometimes struggle here).
- Django uses inheritance and function decorators, so understanding these properties of Python is important.
- Use of virtual environments.
- Training:
- Most data scientists have “good enough” Python.
- The additional knowledge should come from a good advanced Python book.
- Consider using experienced software engineers to train data scientists in missing skills, like decomposing tasks into functions, PEP8, and so on.
- SQL and building a database
- What’s needed:
- Create databases, create tables, insert data into tables, write queries.
- Training:
- Most data scientists have “good enough” SQL.
- Additional training could come from a book or online tutorials.
- Django
- What’s needed:
- An understanding of Django’s architecture and how it works.
- The ability to build an app in Django.
- Training:
- On the whole, data scientists don’t know Django.
- The training provided by a short course or a decent textbook should be enough.
- Writing a couple of simple Django apps by hand should be part of the training.
- This may take 40 hours.
- JavaScript
- What’s needed:
- Ability to work with functions (including callbacks), variables, and arrays.
- Ability to debug JavaScript in the browser.
- Training:
- A short course (or a reasonable textbook) plus a few tutorial examples will be enough.
- HTML and CSS
- What’s needed:
- A low level of familiarity is enough.
- Training:
- Tutorials on the web or a few YouTube videos should be enough.
- Git
- What’s needed:
- The ability to use Git-based source control systems.
- Training:
- Most data scientists have a weak understanding of Git. Because deployment platforms rely on code being on Git, using Git is essential.
- A hands-on training course would be the most useful approach here.
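Two Python features from the list above, inheritance and decorators, are everywhere in Django. Here is a framework-free sketch of both patterns (the request shape and all the names are made up for illustration; Django’s real versions are @login_required and class-based views):

```python
import functools

# A decorator in the spirit of Django's @login_required: it wraps a
# view function and short-circuits if no user is present.
def require_user(view):
    @functools.wraps(view)
    def wrapper(request):
        if request.get("user") is None:
            return {"status": 302, "location": "/login/"}
        return view(request)
    return wrapper

# Inheritance in the style of Django's class-based views: a base class
# provides the dispatch logic, and subclasses override get().
class BaseView:
    def dispatch(self, request):
        return self.get(request)

    def get(self, request):
        raise NotImplementedError

class SeasonView(BaseView):
    def get(self, request):
        return {"status": 200, "body": f"season {request['season']}"}

@require_user
def profile(request):
    return {"status": 200, "body": "profile"}
```

If these two patterns look unfamiliar, that’s a sign the advanced Python training above is needed before touching Django.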
Code gen is not one-size-fits-all
AI code gen is a tremendous productivity boost and enabler in many areas, but not all. For key tasks, like database design and app deployment, AI code gen doesn’t help at all. In other areas, for example complex database/dataframe manipulations and some advanced UI issues, AI helps somewhat but needs substantial guidance. The productivity benefit of AI coding ranges from negative to strongly positive, depending on the task.
The trick is to use AI code gen appropriately and provide active (adult) supervision. This means reviewing what AI produces and intervening. It means knowing when to stop prompting and when to start coding.
Recommendations before attempting rapid application development
- Make sure your team has the skills I’ve outlined above, either individually or collectively.
- Use the right tools in the right way.
- Don’t set unreasonable expectations; understand that your first attempts will be slow as you learn.
- Run a pilot project or two with loose deadlines. From the pilot project, codify the lessons and ways of working. Focus especially on AI code gen and deployment.
How I learned rapid development: my pathfinder app
For this project, I chose to build an app that analyzes the results of English League Football (soccer) games, from the league’s first season in 1888 through the most recently completed season (2024-2025).
The data set is quite large, which means a database back end. The database will need multiple tables.
It’s a very chart-heavy app. Some of the charts are violin plots that need kernel density estimation, and I’ve added curve fitting and confidence intervals on some line plots. That’s not the most sophisticated data analysis, but it’s enough to prove a point about the use of data science methods in apps. Notably, charts are not covered in most Django texts.
In several cases, the charts need widgets: sliders to select the year and radio buttons to select different leagues. This means either using ‘native’ JavaScript or libraries specific to the charting tool (Bokeh). I chose to use native JavaScript for greater flexibility.
To get started, I roughly drew out what I wanted the app to look like. This included different themed analysis (trends over time, goal analysis, etc.) and the charts I wanted. I added widgets to my design where appropriate.
The stack
Here’s the stack I used for this project.
Django was the web framework, which means it handles incoming and outgoing data, manages users, and manages data. Django is very mature, and is very well supported by AI code generation (in particular, Cursor). Django is written in Python.
Postgres. “Out of the box”, Django supports SQLite, but Render (my deployment solution) requires Postgres.
Bokeh for charts. Bokeh is a Python plotting package that renders its charts in a browser (using HTML and JavaScript). This makes it a good choice for this project. An alternative is Altair, but my experience is that Bokeh is more mature and more amenable to being embedded in web pages.
JavaScript for widgets. I need to add drop-down boxes, radio buttons, sliders, tabs, etc. I’ll use whatever libraries are appropriate, but I want code gen to do most of the heavy lifting.
Render.com for deployment. I wanted to deploy my project quickly, which means I don’t want to build out my own deployment solution on AWS etc., I want something more packaged.
I used Cursor for the entire project.
The build process and issues
Building the database
My initial database format gave highly complicated Django models that broke Django’s ORM. I rebuilt the database using a much simpler schema. The lesson here is to keep the database reasonably close to the format in which it will be displayed.
My app design called for violin plots of attendance by season and by league tier. This is several hundred plots. Originally, I was going to calculate the kernel density estimates for the violin plots at run time, but I decided it would slow the application down too much, so I calculated them beforehand and saved them to a database table. This is a typical trade-off.
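Here’s a sketch of the precompute idea, using a stdlib-only Gaussian KDE (a real implementation would more likely use SciPy’s gaussian_kde; the attendance figures below are made up):

```python
import math

def gaussian_kde(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]

# Precompute the density for each (season, tier) pair once, then store
# the (grid, density) pairs in a database table for fast page loads.
attendance = [12000, 15000, 15500, 18000, 21000]  # made-up sample
grid = [10000 + 500 * i for i in range(25)]
density = gaussian_kde(attendance, grid, bandwidth=1500)
```

Running this once per (season, tier) pair at build time turns hundreds of run-time KDE calculations into simple table lookups.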
For this part of the process, I didn’t find code generation useful.
The next stage was uploading my data to the database. Here, I found code generation very useful. It enabled me to quickly create a Python program to upload data and check the database for consistency.
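The upload-and-check pattern looks roughly like this (sketched against an in-memory SQLite database rather than the real Postgres instance, with made-up rows):

```python
import csv
import io
import sqlite3

# In-memory stand-in for the real Postgres database; the upload and
# sanity-check pattern is the same, only the driver differs.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE results (
    season TEXT, home TEXT, away TEXT,
    home_goals INTEGER, away_goals INTEGER)""")

# Made-up CSV rows standing in for the real historical data file.
raw = """season,home,away,home_goals,away_goals
1888-89,Preston North End,Burnley,5,2
1888-89,Aston Villa,Stoke,5,1
"""
rows = list(csv.DictReader(io.StringIO(raw)))
conn.executemany(
    "INSERT INTO results VALUES (:season, :home, :away, :home_goals, :away_goals)",
    rows)
conn.commit()

# Consistency checks: row count matches the source file, no negative scores.
(count,) = conn.execute("SELECT COUNT(*) FROM results").fetchone()
assert count == len(rows)
(bad,) = conn.execute(
    "SELECT COUNT(*) FROM results WHERE home_goals < 0 OR away_goals < 0").fetchone()
assert bad == 0
```

This is exactly the kind of commodity plumbing where code gen shines: describe the file and the table, and the upload script writes itself.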
Building Django
Code gen was a huge boost here. I gave Cursor a markdown file specifying what I wanted and it generated the project very quickly. The UI wasn’t quite what I wanted, but by prompting Cursor, I was able to get it there. It let me create and manipulate dropdown boxes, tabs, and widgets very easily – far, far faster than hand coding. I did try and create a more detailed initial spec, but I found that after a few pages of spec, code generation gets worse; I got better results by an incremental approach.
(One part of the app, a dropdown box and menu. Note the widget and the entire app layout was AI code generated.)
For one UI element, I needed to create an API interface to supply JSON rather than HTML. Code gen let me create it in seconds.
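As a framework-agnostic sketch, the core of such an endpoint is just serializing query results to JSON (in Django, this function body would sit inside a view returning JsonResponse; the data here is made up):

```python
import json

# Made-up data standing in for what a database query would return.
GOALS_BY_SEASON = {"1888-89": 586, "1889-90": 611}

def goals_api(season):
    """Return the JSON payload a chart widget's callback would consume.

    In Django, this body would live in a view and be returned via
    JsonResponse rather than json.dumps.
    """
    payload = {"season": season,
               "goals": GOALS_BY_SEASON.get(season)}
    return json.dumps(payload)
```

The widget’s JavaScript callback then fetches this endpoint and pushes the new values into the chart’s data source.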
However, there were problems.
Code gen didn’t do well with generating Bokeh code for my plots and I had to intervene to re-write the code.
It did even worse with retrieving data from Django models. Although I aligned my data as closely as I could to the app, it was still necessary to aggregate data. I found code generation did a really poor job and the code needed to be re-written. Code gen was helpful to figure out Django’s model API though.
In one complex case, I needed to break Django’s ORM and make a SQL call directly to the database. Here, code gen worked correctly on the first pass, creating good-quality SQL immediately.
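Here’s a sketch of that kind of raw aggregate query, using stdlib sqlite3 for self-containment (in Django, the same SQL would go through connection.cursor() against Postgres; the rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (season TEXT, tier INTEGER, "
    "home_goals INTEGER, away_goals INTEGER)")
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", [
    ("1888-89", 1, 5, 2),
    ("1888-89", 1, 3, 3),
    ("1889-90", 1, 1, 0),
])

# The kind of aggregate that's easier in raw SQL than through the ORM:
# total and average goals per season.
query = """
SELECT season,
       SUM(home_goals + away_goals) AS total_goals,
       AVG(home_goals + away_goals) AS avg_goals
FROM results
GROUP BY season
ORDER BY season
"""
rows = conn.execute(query).fetchall()
# rows -> [("1888-89", 13, 6.5), ("1889-90", 1, 1.0)]
```

Expressing this through the ORM would need annotate() with F-expressions; dropping to SQL is simpler, and it’s also the form code gen handled best.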
My use of code gen was not one-and-done, it was an interactive process. I used code generation to create code at the block and function level.
Bokeh
My app is very chart-heavy, with more than 10 charts, and I couldn’t find many examples of this type of app. This means AI code gen doesn’t have much to learn from.
I needed to access the Bokeh chart data from the widget callbacks and update the charts with new data (in JavaScript). This involved building a JSON API, which code gen created very easily. Sadly, code gen had a much harder time with the JavaScript callback. Its first pass was gibberish, and refining the prompt didn’t help. I had to intervene and ask for code gen on a block-by-block basis. Even then, I had to re-write some lines of code. Unless the situation changes, my view is that code generation for this kind of problem is limited to function definitions and block-by-block generation, with hand coding to correct and improve the results.
Render
By this stage, I had an app that worked correctly on my local machine. The final step was deployment, so it would be accessible on the public internet. The sundai.club and others use Render.com and similar services to rapidly deploy their apps. I decided to use the free tier of Render.com.
Render’s free tier is good enough for demo purposes, but it isn’t powerful enough for a commercial deployment (which is fair); that's why I’m not linking to my app in this blog post: too much traffic will consume my free allowance.
Unlike some of its competitors, Render requires Postgres rather than SQLite as its database, hence my choice of Postgres. This means deployment is in two stages:
- Get the database deployed.
- Link the Django app to the database and deploy it.
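Linking the app to the database mostly means turning the DATABASE_URL environment variable that Render supplies into Django settings. Here’s a stdlib sketch (many projects use the dj-database-url package for this instead; the URL below is a made-up example, not real credentials):

```python
from urllib.parse import urlparse

def database_settings(url):
    """Split a Postgres connection URL into Django DATABASES settings."""
    p = urlparse(url)
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": p.path.lstrip("/"),
        "USER": p.username,
        "PASSWORD": p.password,
        "HOST": p.hostname,
        "PORT": p.port or 5432,
    }

# Made-up example URL in the shape Render's DATABASE_URL takes.
settings = database_settings(
    "postgres://appuser:s3cret@dpg-example.render.com:5432/football")
```

In settings.py, this would feed `DATABASES["default"]`, reading the URL from `os.environ` rather than a literal.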
This process was more complicated than I expected and I ran into trouble. The documentation wasn’t as clear as it needed to be, which didn’t help. The consistent advice in the Render documentation was to turn off debug. This made diagnosing problems almost impossible. I turned debug on and fixed my problems very quickly. To be clear: code gen was of no help whatsoever.
However, it’s my view that this process could be better documented, and that subsequent deployments would go much more smoothly.
General comments about AI code generation
- Typically, many organizations require code to pass checks (linting, PEP8, test cases, etc.) before the developer can check it into source control. Code generation makes it easier and faster to pass these checks. Commenting and documenting code are also much, much faster.
- Code generation works really well for “commodity” tasks and is really well-suited to Django. It mostly works well with UI code generation, provided there’s not much complexity.
- It doesn’t do well with complex data manipulations, although its SQL can be surprisingly good.
- It doesn’t do well with Bokeh code.
- It doesn’t do well with complex UI callbacks where data has to be manipulated in particular ways.
Where my app ended up
End-to-end, it took about two weeks, including numerous blind alleys, restarts, and time spent digging up answers. Knowing what I know now, I could probably create an app of this complexity in under five days, and less with more people.
My app has multiple pages, with multiple charts on each page (well over 10 charts in total). The chart types include violin plots, line charts, and heatmaps. Because they're Bokeh charts, we have built-in chart interactivity. I have widgets (e.g., sliders, radio buttons) controlling some of the charts, which communicate back to the database to update the plots. Of course, I also have Django's user management features.
Discussion
There were quite a few surprises along the way in this project: I had expected code generation to do better with Bokeh and callback code, I’d expected Render to be easier to use, and I thought the database would be easier to build. Notably, the Render and database issues are learning issues; it’s possible to avoid these costs on future projects.
I’ve heard some criticism of code generated apps from people who have produced 70% or even 80% of what they want, but are unable to go further. I can see why this happens. Code gen will only take you so far, and will produce junk under some circumstances that are likely to occur with moderately complex apps. When things get tough, it requires a human with the right skills to step in. If you don’t have the right skills, your project stalls.
Bottom line: it’s possible to do rapid application development and deployment with the right approach, the right tools, and using code gen correctly. Training is key.
Using the app
I want to tinker with my app, so I don't want to exhaust my Render free tier. If you'd like to see my app, drop me a line and I'll grant you access.