
Monday, November 17, 2025

Data scientists need to learn JavaScript

Moving quickly

Over the last few months, I've become very interested in rapid prototype development for data science projects. Here's the key question I asked myself: how can a data scientist build their own app as quickly as possible? Nowadays, speed means code gen, but that's only part of the solution.

The options

The obvious quick development path is using Streamlit; that doesn't require any new skills because it's all in Python. Streamlit is great, and I've used it extensively, but it only takes you so far and it doesn't really scale. Streamlit is really for internal demos, and it's very good at that.

The more sustainable solution is using Django. It's a bigger and more complex beast, but it's scalable. Django requires Python skills, which is fine for most data scientists. Of course, Django apps are deployed on the web and users access them as web pages.

The UI is one place code gen breaks down under pressure

Where things get tricky is adding widgets to Django apps. You might want your app to take some action when the user clicks a button, or have widgets controlling charts, and so on. Code gen will nicely provide you with the basics, but once you start to do more complicated UI tasks, like updating chart data, you need to write JavaScript or be able to correct code gen'd JavaScript.

(As an aside, for my money, the reason a number of code gen projects stall is that code gen only takes you so far. To do anything really useful, you need to intervene, providing detailed guidance and writing code where necessary. This means JavaScript code.)

JavaScript != Python

JavaScript is very much not Python. Even a cursory glance will tell you that JavaScript syntax is unlike Python's. More subtly, and more importantly, some of the underlying ideas and approaches are quite different. The bottom line is that a Python programmer is not going to write good enough JavaScript without training.

To build even a medium complexity data science app, you need to know how JavaScript callbacks work, how arrays work, how to debug in the browser, and so on. Because code gen is doing most of the heavy lifting for you, you don't need to be a craftsman, but you do need to be a journeyman.

What data scientists need to do

The elevator pitch is simple:

  • If you want to build a scalable data science app, you need to use Django (or something like it).
  • To make the UI work properly, code gen needs adult supervision and intervention.
  • This means knowing JavaScript.
(Data Scientist becoming JavaScript programmer. Gemini.)

In my view, all that's needed here is a short course, a good book, and some practice. A week should be enough time for an experienced Python programmer to get to where they need to be.

What skillset should data scientists have?

AI is shaking everything up, including data science. In my view, data scientists will have to do more than their "traditional" role. Data scientists who can turn their analysis into apps will have an advantage. 

For me, the skillset a data scientist will need looks a lot like the skillset of a full-stack developer. This means data scientists knowing a bit of JavaScript, code gen, deployment technologies, and so on. They won't need to be experts, but they will need "good enough" skills.

Wednesday, November 12, 2025

How to rapidly build and deploy data science apps using code gen

Introduction

If you want to rapidly build and deploy apps with a data science team, this blog post is written for you.

(Canva)

I’ve seen how small teams of MIT and Harvard students at the sundai.club in Boston are able to produce functioning web apps in twelve hours. I want to understand how they’re doing it, adapt what they’re doing for business, and create data science heavy apps very quickly. This blog post is about what I’ve learned.

Almost all of the sundai.club projects use an LLM as part of their project (e.g., using agentic systems to analyze health insurance denials), but that’s not how they’re able to build so quickly. They get development speed through code generation, the appropriate use of tools, and the use of deployment technologies like Vercel or Render. 

(Building prototypes in 12 hours: the inspiration for this blog post.)

Inspired by what I’ve seen, I developed a pathfinder project to learn how to do rapid development and deployment using AI code gen and deployment tools. My goal was to find out:

  • The skills needed and the depth to which they’re needed.
  • Major stumbling blocks and coping strategies.
  • The process to rapidly build apps.

I'm going to share what I've learned in this blog post. 

Summary of findings

Process is key

Rapid development relies on having three key elements in place:

  • Using the right tools.
  • Having the right skill set.
  • Using AI code gen correctly.

Tools

Fast development must use these tools:

  • AI-enabled IDE.
  • Deployment platform like Render or Vercel.
  • Git.

Data scientists tend to use notebooks, and that's a major problem for rapid development; notebook-based development isn't going to work. Speed requires the consistent use of AI-enabled IDEs like Cursor or Lovable. These IDEs use AI code generation at the project and code-block level, and can generate code in different languages (Python, SQL, JavaScript, etc.). They can also generate test code, comment code, and make code PEP8 compliant. It's not just one-off code gen; it's applying AI to the whole code development process.

(Screen shot of Cursor used in this project.)

Using a deployment platform like Render or Vercel means deployment can be extremely fast. Data scientists typically don't have deployment skills, but these products are straightforward enough that some written guidance should be enough.

Deployment platforms retrieve code from Git-based systems (e.g., GitHub, GitLab etc.), so data scientists need some familiarity with them. Desktop tools (like GitHub Desktop) make it easier, but they have to be used, which is a process and management issue.

Skillsets and training

The skillset needed is essentially a full-stack engineer's with a few tweaks, which is a challenge because most data scientists lack some of the key skills. Here are the skills, the level needed, and the training required for data scientists.

  • Hands-on experience with AI code generation and AI-enabled IDE.
    • What’s needed:
      • Ability to appropriately use code gen at the project and code-block levels. This could be with Cursor, Claude Code, or something similar.
      • Understanding code gen strengths and weaknesses and when not to use it.
      • Experience developing code using an IDE.
    • Training: 
      • To get going, an internal training session plus a series of exercises would be a good choice.
      • At the time of writing, there are no good off-the-shelf courses.
  • Python
    • What’s needed:
      • Decent Python coding skills, including the ability to write functions appropriately (data scientists sometimes struggle here).
      • Django uses inheritance and function decorators, so understanding these properties of Python is important. 
      • Use of virtual environments.
    • Training:
      • Most data scientists have “good enough” Python.
      • The additional knowledge should come from a good advanced Python book. 
      • Consider using experienced software engineers to train data scientists in missing skills, like decomposing tasks into functions, PEP8 and so on.
  • SQL and building a database
    • What’s needed:
      • Create databases, create tables, insert data into tables, write queries.
    • Training:
      • Most data scientists have “good enough” SQL.
      • Additional training could be a book or online tutorials.
  • Django
    • What’s needed:
      • An understanding of Django’s architecture and how it works.
      • The ability to build an app in Django.
    • Training:
      • On the whole, data scientists don’t know Django.
      • The training provided by a short course or a decent text book should be enough.
      • Writing a couple of simple Django apps by hand should be part of the training.
      • This may take 40 hours.
  • JavaScript
    • What’s needed:
      • Ability to work with functions (including callbacks), variables, and arrays.
      • Ability to debug JavaScript in the browser.
      • These skills are needed to add and debug UI widgets. Code generation isn't enough.
    • Training:
      • A short course (or a reasonable text book) plus a few tutorial examples will be enough.
  • HTML and CSS
    • What’s needed:
      • A low level of familiarity is enough.
    • Training:
      • Tutorials on the web or a few YouTube videos should be enough.
  • Git
    • What’s needed:
      • The ability to use Git-based source control systems. 
      • It's needed because deployment platforms rely on code being on Git.
    • Training:
      • Most data scientists have a weak understanding of Git. 
      • A hands-on training course would be the most useful approach.

Code gen is not one-size-fits-all

AI code gen is a tremendous productivity boost and enabler in many areas, but not all. For key tasks, like database design and app deployment, AI code gen doesn't help at all. In other areas, for example complex database/dataframe manipulations and some advanced UI issues, AI helps somewhat but needs substantial guidance. The productivity benefit of AI coding ranges from negative to greatly positive depending on the task.

The trick is to use AI code gen appropriately and provide adult supervision. This means reviewing what AI produces and intervening. It means knowing when to stop prompting and when to start coding.

Recommendations before attempting rapid application development

  • Make sure your team has the skills I've outlined above, either individually or collectively.
  • Use the right tools in the right way.
  • Don't set unreasonable expectations; understand that your first attempts will be slow as you learn.
  • Run a pilot project or two with loose deadlines. From the pilot project, codify the lessons and ways of working. Focus especially on AI code gen and deployment.

How I learned rapid development: my pathfinder app

For this project, I chose to build an app that analyzes the results of English League Football (soccer) games from the league's start in 1888 through the most recently completed season (2024-2025).

The data set is quite large, which means a database back end. The database will need multiple tables.

It’s a very chart-heavy app. Some of the charts are violin plots that need kernel density estimation, and I’ve added curve fitting and confidence intervals on some line plots. That’s not the most sophisticated data analysis, but it’s enough to prove a point about the use of data science methods in apps. Notably, charts are not covered in most Django texts.

(Just one of the plots from my app. Note the year slider at the bottom.)

In several cases, the charts need widgets: sliders to select the year and radio buttons to select different leagues. This means either using ‘native’ JavaScript or libraries specific to the charting tool (Bokeh). I chose to use native JavaScript for greater flexibility.

To get started, I roughly drew out what I wanted the app to look like. This included different themed analysis (trends over time, goal analysis, etc.) and the charts I wanted. I added widgets to my design where appropriate.

The stack

Here’s the stack I used for this project.

Django was the web framework, which means it handles incoming and outgoing data, manages users, and manages data. Django is very mature, and is very well supported by AI code generation (in particular, Cursor). Django is written in Python.

Postgres. “Out of the box”, Django supports SQLite, but Render (my deployment solution) requires Postgres. 
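
If you haven't made that switch before, it's just a settings change. Here's a minimal sketch of the relevant part of settings.py; the database name and environment variable names are illustrative, not the ones from my project:

# settings.py (sketch): point Django at Postgres instead of the default SQLite.
# The database name and environment variable names are illustrative.
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DB_NAME", "football"),
        "USER": os.environ.get("DB_USER", "postgres"),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": os.environ.get("DB_HOST", "localhost"),
        "PORT": os.environ.get("DB_PORT", "5432"),
    }
}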

Bokeh for charts. Bokeh is a Python plotting package that renders its charts in a browser (using HTML and JavaScript). This makes it a good choice for this project. An alternative is Altair, but my experience is that Bokeh is more mature and more amenable to being embedded in web pages.
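
To give you a flavor of why Bokeh embeds so easily, here's a minimal sketch of a Django view serving a Bokeh chart; the view name, template name, and data are illustrative, not the app's actual code:

# views.py (sketch): embed a Bokeh chart in a Django template.
# The view, template, and data below are illustrative.
from bokeh.embed import components
from bokeh.plotting import figure
from django.shortcuts import render

def attendance_chart(request):
    plot = figure(title="Average attendance by season",
                  x_axis_label="Season", y_axis_label="Attendance")
    plot.line([1950, 1990, 2024], [30000, 21000, 40000], line_width=2)
    # components() returns the JavaScript and the HTML div for the chart.
    script, div = components(plot)
    return render(request, "charts/attendance.html", {"script": script, "div": div})

The template then just drops {{ script|safe }} and {{ div|safe }} into the page, alongside Bokeh's JavaScript resources.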

JavaScript for widgets. I need to add drop down boxes, radio buttons, sliders, and tabs etc. I’ll use whatever libraries are appropriate, but I want code gen to do most of the heavy lifting.

Render.com for deployment. I wanted to deploy my project quickly, which means I didn't want to build out my own deployment solution on AWS or similar; I wanted something more packaged.

I used Cursor for the entire project.

The build process and issues

Building the database

My initial database format gave highly complicated Django models that broke Django’s ORM. I rebuilt the database using a much simpler schema. The lesson here is to keep the database reasonably close to the format in which it will be displayed. 

My app design called for violin plots of attendance by season and by league tier. This is several hundred plots. Originally, I was going to calculate the kernel density estimates for the violin plots at run time, but I decided it would slow the application down too much, so I calculated them beforehand and saved them to a database table. This is a typical trade-off.

For this part of the process, I didn’t find code generation useful.
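
To make the pre-computation step concrete, here's a minimal sketch of the idea using SciPy's kernel density estimator; the file, column, and table names are illustrative stand-ins:

# Sketch: pre-compute kernel density estimates for the violin plots and store
# the results, instead of computing them at request time.
# File and column names are illustrative.
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

attendances = pd.read_csv("attendance.csv")  # one row per match
rows = []
for (season, tier), group in attendances.groupby(["season", "tier"]):
    kde = gaussian_kde(group["attendance"])
    grid = np.linspace(group["attendance"].min(), group["attendance"].max(), 100)
    rows.append({"season": season, "tier": tier,
                 "attendance": grid.tolist(), "density": kde(grid).tolist()})

# In the real app this went into a database table; a file keeps the sketch simple.
pd.DataFrame(rows).to_json("kde_precomputed.json", orient="records")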

The next stage was uploading my data to the database. Here, I found code generation very useful. It enabled me to quickly create a Python program to upload data and check the database for consistency.
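
The upload program looked something like the sketch below; the settings module, app, model, and CSV columns are hypothetical stand-ins for the real ones:

# Sketch: a standalone script that loads CSV data into the Django database.
# The settings module, app, model, and column names are hypothetical.
import csv
import os

import django

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "football_site.settings")
django.setup()

from matches.models import Match  # hypothetical app and model

with open("matches.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Insert in batches rather than one row at a time.
Match.objects.bulk_create(
    [Match(season=r["season"], home=r["home"], away=r["away"],
           home_goals=int(r["home_goals"]), away_goals=int(r["away_goals"]))
     for r in rows],
    batch_size=1000,
)

# Simple consistency check: the database row count matches the CSV.
assert Match.objects.count() == len(rows)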

Building Django

Code gen was a huge boost here. I gave Cursor a markdown file specifying what I wanted and it generated the project very quickly. The UI wasn’t quite what I wanted, but by prompting Cursor, I was able to get it there. It let me create and manipulate dropdown boxes, tabs, and widgets very easily – far, far faster than hand coding. I did try and create a more detailed initial spec, but I found that after a few pages of spec, code generation gets worse; I got better results by an incremental approach.

(One part of the app, a dropdown box and menu. Note the widget and the entire app layout was AI code generated.)

The simplest part of the project is a view of club performance over time. Using a detailed prompt, I was able to get all of the functionality working using only code gen. This functionality included a dropdown selection box, a club history display, league over time, and matches played by season. It needed some tweaks, but I did the tweaks using code gen. Getting this simple functionality running took an hour or two.

Towards the end of the project, I added an admin panel for admin users to create, edit, and delete "ordinary" users. With code gen, this took less than half an hour, including bug fixes and UI tweaks.

For one UI element, I needed to create an API interface to supply JSON rather than HTML. Code gen let me create it in seconds.
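
In Django, that kind of endpoint is only a few lines; here's a hedged sketch (the model and field names are illustrative):

# views.py (sketch): return JSON instead of an HTML page.
# Model and field names are illustrative.
from django.http import JsonResponse

from matches.models import Match  # hypothetical model

def season_results(request, season):
    matches = Match.objects.filter(season=season).values(
        "home", "away", "home_goals", "away_goals")
    return JsonResponse({"season": season, "matches": list(matches)})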

However, there were problems.

Code gen didn’t do well with generating Bokeh code for my plots and I had to intervene to re-write the code.

It did even worse with retrieving data from Django models. Although I aligned my data as closely as I could with the app, it was still necessary to aggregate data. I found code generation did a really poor job and the code needed to be re-written. Code gen was helpful for figuring out Django's model API, though.

In one complex case, I needed to break Django’s ORM and make a SQL call directly to the database. Here, code gen worked correctly on the first pass, creating good-quality SQL immediately.
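
For reference, dropping down to raw SQL in Django looks something like this; the query here is a made-up example, not the one from my app:

# Sketch: bypass the ORM and query the database directly.
# The query is illustrative, not the app's actual query.
from django.db import connection

def average_goals_per_season():
    with connection.cursor() as cursor:
        cursor.execute("""
            SELECT season, AVG(home_goals + away_goals) AS avg_goals
            FROM matches_match
            GROUP BY season
            ORDER BY season
        """)
        return cursor.fetchall()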

My use of code gen was not one-and-done; it was an interactive process. I used code generation to create code at the block and function level.

Bokeh

My app is very chart heavy, with more than 10 charts, and I couldn't find many examples of this type of app. This means AI code gen doesn't have much to learn from.

(One of the Bokeh charts. Note the interactive controls on the right of the plot and the fact the plot is part of a tabbed display.)

As I mentioned above, code gen didn't do well generating Bokeh code for my plots, and I had to intervene to re-write the code.

I needed to access the Bokeh chart data from the widget callbacks and update the charts with new data (in JavaScript). This involved building a JSON API, which code gen created very easily. Sadly, code gen had a much harder time with the JavaScript callback. Its first pass was gibberish, and refining the prompt didn't help. I had to intervene and ask for code gen on a block-by-block basis. Even then, I had to re-write some lines of code. Unless the situation changes, my view is that code generation for this kind of problem is probably limited to function definitions and block-by-block code generation, with hand coding to correct and improve the results.

(Some of the hand-written code. Code gen couldn't create this.)

Render

By this stage, I had an app that worked correctly on my local machine. The final step was deployment so it would be accessible on the public internet. The sundai.club and others use Render.com and similar services to rapidly deploy their apps, so I decided to use the free tier of Render.com.

Render’s free tier is good enough for demo purposes, but it isn’t powerful enough for a commercial deployment (which is fair); that's why I’m not linking to my app in this blog post: too much traffic will consume my free allowance.

Unlike some of its competitors, Render uses Postgres rather than SQLite as its database, hence my choice of Postgres. This means deployment is in two stages:

  • Deploy the database.
  • Link the Django app to the database and deploy the app.

The process was more complicated than I expected and I ran into trouble. The documentation wasn't as clear as it needed to be, which didn't help. The consistent advice in the Render documentation was to turn off debug, but that made diagnosing problems almost impossible; I temporarily turned debug on and fixed my problems quickly.

To be clear: code gen was of no help whatsoever.

(Part of Render's deployment screen.)

However, it’s my view this process could be better documented and subsequent deployments could go very smoothly.

General comments about AI code generation

  • Typically, organizations require code to pass checks (linting, PEP8, test cases, etc.) before the developer can check it into source control. Code generation makes it easier and faster to pass these checks. Commenting and code documentation are also much, much faster.
  • Code generation works really well for “commodity” tasks and is really well-suited to Django. It mostly works well with UI code generation, provided there’s not much complexity.
  • It doesn’t do well with complex data manipulations, although its SQL can be surprisingly good.
  • It doesn’t do well with Bokeh code.
  • It doesn’t do well with complex UI callbacks where data has to be manipulated in particular ways.

Where my app ended up

End-to-end, it took about two weeks, including numerous blind alleys, restarts, and time spent digging up answers. Knowing what I know now, I could probably create an app of this complexity in less than 5 days, and less still with more people.

My app has multiple pages, with multiple charts on each page (well over 10 charts in total). The chart types include violin plots, line charts, and heatmaps. Because they're Bokeh charts, my app has built-in chart interactivity. I have widgets (e.g., sliders, radio buttons) controlling some of the charts, which communicate back to the database to update the plots. Of course, I also have Django's user management features.

Discussion

There were quite a few surprises along the way in this project: I had expected code generation to do better with Bokeh and callback code, I’d expected Render to be easier to use, and I thought the database would be easier to build. Notably, the Render and database issues are learning issues; it’s possible to avoid these costs on future projects. 

I’ve heard some criticism of code generated apps from people who have produced 70% or even 80% of what they want, but are unable to go further. I can see why this happens. Code gen will only take you so far, and will produce junk under some circumstances that are likely to occur with moderately complex apps. When things get tough, it requires a human with the right skills to step in. If you don’t have the right skills, your project stalls. 

My goal with this project was to figure out the skills needed for rapid application development and deployment. I wanted to figure out the costs of enabling a data science team to build their own apps. What I found is that the skill set needed is the skill set of a full-stack engineer. In other words, rapid development and deployment is firmly in the realm of software engineers and not data scientists. If data scientists want to build apps, there's a learning curve and a learning cost. Frankly, I'm coming round to the opinion that data scientists need a broader software skill set.

For a future version of this project, I would be tempted to split off the UI entirely. The Django code would be entirely a JSON server, accessed through the API. The front end would be in Next.js. This would mean having charting software entirely in JavaScript. Obviously, there's a learning curve cost here, but I think it would give more consistency and ultimately an easier to maintain solution. Once again, it points to the need for a full-stack skill set.

To make this project go faster next time, here's what I would do:

  • Make the database structure reasonably close to how data is to be displayed. Don't get too clever and don't try to optimize it before you begin.
  • Figure out a way to commoditize creating charts and updating them through a JavaScript callback. The goal is of course to make the process more amenable to code generation. 
  • Related to charts, figure out a better way of using the ORM to avoid using SQL for more complex queries. Figure out a way to get better ORM code generation results.
  • Document the Render deployment process and have a simple checklist or template code.

Bottom line: it’s possible to do rapid application development and deployment with the right approach, the right tools, and using code gen correctly. Training is key.

Using the app

I want to tinker with my app, so I don't want to exhaust my Render free tier. If you'd like to see my app, drop me a line (https://www.linkedin.com/in/mikewoodward/) and I'll grant you access.

If you want to see my app code, that's easier. You can see it here: https://github.com/MikeWoodward/English-Football-Forecasting/tree/main/5%20Django%20app 

Sunday, October 26, 2025

Context 7: code generation using the most recent libraries

The problem

One of my complaints about using AI code gen with Cursor has been its "backwardness"; it tends to use older versions of libraries. This sometimes means your generated code isn't as well-structured as it could be. It's why AI code gen has sometimes felt to me like working with a grumpy older senior engineer.

What we want is some way of telling Cursor (or AI code gen in general) to use the latest library versions and use the latest code samples. Of course, we could supply links to the library documentation ourselves as part of the prompt, but this is tedious, which means we're prone to forgetting it.

Wouldn't it be great to have a list of all the latest libraries and documentation and supply it directly to Cursor via an MCP server? That means, we'll always point to the latest version of the code and docs, Cursor will pick it up automatically, and someone else bears the cost of keeping the whole thing up to date. With such a service, we could always generate code using the latest version of libraries.

(Gemini)

The solution

As you've guessed, such a thing exists and it's called Context 7. Context 7 provides links to the latest version of over 49,000 libraries, everything from Next.js to Requests. It provides these links in a form that's usable by LLMs.

If you really wanted to, you could include these links via a prompt. For example, for Streamlit, you could use the results here in a prompt: https://context7.com/websites/streamlit_io.  But that's inconvenient. You're better off using the Context 7 MCP Server and telling Cursor to use it in code generation.

How to implement it

There's a lot of advice about installing the Context 7 MCP Server in Cursor, some of it misleading, some of it wrong or out of date. Here's the easiest way to do it:

  1. Have Cursor running in the background.
  2. Go to this page on GitHub: https://github.com/upstash/context7
  3. Go down to the section "Install in Cursor" and expand the section.
  4. Click on the "Add to Cursor" button: 

This should automatically add the Context 7 MCP server to your installed MCP servers. To check that it has, do the following in Cursor:

  1. Click on the "Cursor" menu option, then click "Settings...".
  2. Click "Cursor Settings".  
  3. Click "Tools & MCP".
You should see the Context 7 MCP server listed there.


Next, you need to tell Cursor to use Context 7 when generating code. You could do this on every prompt, but that's tedious. You're much better off adding a rule. The Context 7 GitHub page even tells you how to do it: https://github.com/upstash/context7?tab=readme-ov-file#-tips.

Using it

This is the best part: if you've added the MCP server and the rule, there's nothing else to do; you'll be using the latest version of your libraries when you generate code. I've heard a few people comment that they needed to restart Cursor, but I found it worked just fine without a restart.

The cost

Using Context 7 will cost you more tokens but, in my view, it's a price worth paying for more up-to-date code.

Who's behind Context 7?

A company called Upstash created Context 7, and they provide it for free. To be clear: I have no affiliation of any kind with Upstash and have received no benefit or reward from them.

Bottom line

Use Context 7 in your code generation.

Tuesday, September 2, 2025

Em dash = AI slop?

Punctuation as a giveaway

Recently, I've seen a lot of comments on the web that the use of em dashes is a dead giveaway that an article has been written by AI. This immediately made me think of my own use of dashes and semicolons.  I don't use AI for text generation, but I wondered if my writing might be mistaken for AI because of my use of punctuation. I decided to take a deeper look at the whole area.

(Gemini, with some assistance.)

Punctuation symbols

Let's start by looking at the symbols themselves.

  • Em dash (—): not easily available from my keyboard. Named because it's the width of a capital M. HTML: &mdash;  Markdown: ---
  • En dash (–): not easily available from my keyboard. Named because it's the width of a capital N. HTML: &ndash;  Markdown: --
  • Hyphen (-): easily available on my keyboard (it's the minus sign).
  • Semicolon (;): easily available from my keyboard.
  • Comma (,): easily available from my keyboard.

Grammatical use

I'm not going to go into grammar too much here because I'm the wrong person to do that, but I will very briefly summarize the situation (my favorite grammar book is "Rules for Writers" by Hacker and Sommers; check it out if you want a good grammar reference). Commas and semicolons have different grammatical purposes and their use goes back a long time. Hyphens are a more modern invention and seem to share some of the usage of both commas and semicolons: a sort of generic punctuation mark.

As far as I can tell, em dashes, en dashes, and hyphens are used for more or less the same grammatical purpose; they're practically interchangeable. Some reputable websites suggest there is a difference in grammatical use between — and –, so there may be some fine distinction. For most people and most uses, em dashes, en dashes, and hyphens serve the same purpose.

Usage in the real world by people

Recent writing seems to favor the use of - rather than ;, especially in short-form communications like text messages or even emails. I've noticed some modern authors using hyphens instead of semicolons; in fact, I've met a professional writer who always used hyphens and never semicolons. Overall, semicolon usage seems to be in decline.

If I'm typing text, I normally only use characters easily available from my keyboard, unless I need a special character like a currency symbol (e.g., €). In other words, it's unlikely I'll use em dashes or en dashes. Given that it's hard to tell the different dashes apart, it's hard to understand why anyone (any human) other than a professional typesetter would use a dash other than a - (hyphen). In the sentence below, have I used an em dash, an en dash, or even a hyphen?

"David lived in Paris 2005–2010."

It's hard to tell, isn't it? Which means that, for humans, em dashes, en dashes, and hyphens can't easily be distinguished.

Is it a reliable AI detector?

Recent English usage seems to favor - over ;, so you can see why an AI might learn to use - rather than ;. As I said earlier, there are some websites that distinguish different uses of —, –, and -, so it's possible an AI will apply those rules too. You can sometimes detect non-native English speakers because their English is too good: they don't make the mistakes native speakers do. Something similar may be happening here; an AI may be applying a "dashes" rule that a native writer wouldn't.

Is it smoking-gun proof? Probably not. I'm sure there are writers who love different dashes, and of course the software they're using may convert hyphens into different types of dashes for them. But it is a strong indicator.

I find distinguishing between dashes hard, but peeking at the underlying HTML or Markdown gives away the use of em dashes and en dashes immediately. So if you have access to the source text, you can check.

By contrast, the use of a ; may indicate a human writer, until, of course, AIs learn how to use it (im)properly.

Wednesday, June 25, 2025

AI networking in the Boston area

A lot's happening in Boston - where should I find out more?

There's a lot of AI work going on in the Boston area covering the whole spectrum: foundational model development, new AI applications, corporates developing new AI-powered apps, entrepreneurs creating new businesses, and students building prototypes in 12 hours. Pretty much every night of the week you can go to a group and find out more; there are a ton of groups out there, but not all of them are created equal. I've been to a lot of them, and here are my recommendations for the best ones that meet on a regular basis. The list is alphabetical.

(Google Gemini)

AI Tinkerers

What it is

Monthly meeting where participants show the AI projects they've been working on. Mostly, but not exclusively, presentations from the Sundai Club (Harvard and MIT weekly hackathons). Attendance is over 150.

Commentary

This is where I go when I want to see what's possible and find out more about the cutting edge. It's where I found out what tools like Cursor could really do. There are a number of VCs in attendance, watching for anything interesting.

How often it meets

Once a month at Microsoft NERD.

Positives

You get to see what the cutting edge is really like.

Negatives

I found networking at this event less useful than some of the other events.

How to join

https://boston.aitinkerers.org/

AI Woodstock

What it is 

A networking event for people interested in AI. It attracts practitioners,  some VCs, recruiters, academics, and entrepreneurs. Attendee numbers vary, but typically over 100.

Commentary

This is networking only; there are no presentations or speakers of any kind. You turn up to the venue, introduce yourself to other people, and get talking. I've met people who are starting companies, people working on side gigs, and people working in AI for large companies.

The quality is high; I've learned a lot about what's going on and what companies in the Boston area are doing. 

The venue is both good and bad. It's held in a corner of the Time Out Market near Fenway Park. This is a large space with lots of food and drink vendors, it attracts the bright young things of the Boston area who go there to eat and drink after work. AI Woodstock doesn't take over the whole space or rope off a portion of it and AI Woodstock attendees are only identified by name badges. This means you're chatting away to someone about their AI enabled app while someone is walking by with their drink and app to meet their friends. The background noise level can be really high at times.

How often it meets 

Once a month at the Time Out Market near Fenway Park.

Positives

  • Networking. This is one of the best places to meet people who are active in AI in Boston.
  • Venue. It's nice to meet somewhere that's not Cambridge and the food and drink offerings are great.

Negatives

  • Venue. The noise level can get high and it can get quite crowded. The mix of bright young things out to have a good time and AI people is a bit odd.

How to join

https://www.meetup.com/ai-woodstock/ - choose Boston

Boston Generative AI Meetup

What it is

This is a combination of networking and panel session. During the networking, I've met VCs, solo entrepreneurs, AI staff at large companies, academics, and more. Attendance varies, but typically over 200.

Commentary

This is held in Microsoft NERD in Cambridge and it's the only event in the space. This means it starts a bit later and has to finish on time. 

Quality is very high and I've met a lot of interesting people. I met someone who showed me an app they'd developed and told me how they'd done it, which was impressive and informative.

The panel sessions have been a mixed bag; it's interesting to see people speak, and I found out a lot of useful information, but the panel topics were just so-so for me. Frankly, what the panelists said was useful but the overall topic was not.

How often it meets

About once a month.

Positives

  • Networking. 
  • Venue.
  • Information. The panels have mentioned things I found really useful.

Negatives

  • Panel session topics were a bit blah.

How to join

https://www.meetup.com/boston-generative-ai-meetup/

PyData Boston

What it is

Presentations plus networking. This is almost all machine learning/data science/AI practitioners in the Boston area (no VCs, no business people, instead there are academics and engineers). The presentations are mostly on technical topics, e.g. JAX. Attendance varies, but usually 50-100.

Commentary

I've learned more technical content from this group than any other. The presentations are in-depth and not for people who don't have a goodish background in Python or data science.

How often it meets

Once a month, usually at the Moderna building in Cambridge.

Positives

  • Best technical event. In-depth presentations have helped educate me and point out areas where I need to learn more. Conversations have been (technically) informative.
  • Probably the friendliest group of all of them.

Negatives

  • No entrepreneurs, no VCs, no executive management.

How to join

https://www.meetup.com/pydata-boston-cambridge/

Common problems

There's a refrain I've heard from almost all event organizers: the problem of no-shows. The no-show rate is typically 40% or so, which is hugely frustrating as there's often a waiting list of attendees. Some of these events have instituted a sign-in policy (if you don't turn up and sign in, you can't attend future events), and I can see more events doing it in future. If you sign up, go.

One-off events

As well as these monthly events, there are also one-off events that happen sporadically. Obviously, I can't review them here, but I will say this: the quality is mostly very high, but it is variable.

What's missing

I'm surprised by what I'm not hearing at these events. I'm not hearing implementation stories from existing ("mature") companies. Through private channels, I'm hearing that the failure rate for AI projects can be quite high, but by contrast I've been told that insurance companies are embracing AI for customer facing work and getting great results. I've met developers working on AI enabled apps for insurance companies and they tell me their projects have management buy-in and are being rolled out.

I'd love to hear someone from one of these large companies get up and speak about what they did to encourage success and the roadblocks on the way. In other words, I'd like to see something like "Strategies and tactics for successful AI projects" run by people who've done it.

Your thoughts

I've surely left some groups off this list. If you know of a good group, please let me know, either through LinkedIn or by commenting on this post.

Logistic regression - a simple briefing

A briefing on logistic regression

I've been looking again at logistic regression and going over some of the theory behind it. In a previous blog post, I talked about how I used Manus to get a report on logistic regression and I showed what Manus gave me. I thought it was good, B+, but not great, and I had some criticisms of what Manus produced. The obvious challenge is, could I do better? This blog post is my attempt to explain logistic regression better than Manus.

What problems are we trying to solve?

There's a huge class of problems where we're trying to predict a binary result; here are some examples:

  • The results of a referendum, e.g., whether or not to remain in or leave the EU.
  • Whether to give drug A or drug B to a patient with a condition.
  • Which team will win the World Cup or Super Bowl or World Series.
  • Is this transaction fraudulent?

Typically, we'll have a bunch of different data we can use as the basis for our prediction model. For example, for a drug choice, we may have age, gender, weight, smoker or not, and so on. These are called features. Corresponding to this feature set, we'll have a set of outcomes (also called labels); for the drug case, the label might be whether each patient survived, which aggregates to something like a% survival given drug A compared to b% for drug B. This makes logistic regression a supervised machine learning method.

In this blog post, I’ll show you how you can turn feature data into binary classification predictions using logistic regression. I’ll also show you how you can extend logistic regression beyond binary classification problems.

Before we dive into logistic regression, I need to define some concepts.

What are the odds?

Logistic regression relies on the odds (and the odds ratio), so I'm going to define both using an example.

For two different drug treatments, we have different rates of survival. Here's a table, adapted from [1], that shows the probability of survival for a fictitious study.

             Standard treatment    New treatment    Totals
Died         152 (38%)             17 (14%)         169
Survived     248 (62%)             103 (86%)        351
Totals       400 (100%)            120 (100%)       520

Plainly, the new treatment is much better. But how much better?

In statistics, we define the odds as being the ratio of the probability of something happening to it not happening:

\[odds = \dfrac{p}{1 - p}\]

So, if there’s a 70% chance of something happening, the odds of it happening are 2.333. Probabilities can range from 0 to 1 (or 0% to 100%), whereas odds can range from 0 to infinity. Here’s the table above recast in terms of odds.

             Standard treatment    New treatment
Died         0.613                 0.165
Survived     1.632                 6.059

The odds ratio tells us how much more likely an outcome is. A couple of examples should make this clearer. 

The odds ratio for death with the standard treatment compared to the new is:

\[odds \: ratio = \dfrac{0.613}{0.165} = 3.71...\]

This means a patient is 3.71 times more likely to die if they’re given the standard treatment compared to the new.

More hopefully, the odds ratio for survival with the new treatment compared to the old is:

\[odds \: ratio = \dfrac{6.059}{1.632} = 3.71...\]

Unfortunately, most of the websites out there are a bit sloppy with their definitions. Many of them conflate “odds” and “odds ratio”. You should be aware that they’re two different things:

  • The odds is the probability of something happening divided by the probability of it not happening.
  • The odds ratio compares the odds of an event in one group to the odds of the same event in another group.

The odds are going to be important for logistic regression.
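
To make the arithmetic concrete, here's the table above worked through in a few lines of Python:

# Worked example: odds and the odds ratio from the treatment table above.
p_died_standard = 152 / 400   # probability of death on the standard treatment
p_died_new = 17 / 120         # probability of death on the new treatment

def odds(p):
    return p / (1 - p)

odds_died_standard = odds(p_died_standard)        # ~0.613
odds_died_new = odds(p_died_new)                  # ~0.165
odds_ratio = odds_died_standard / odds_died_new   # ~3.71

print(round(odds_died_standard, 3), round(odds_died_new, 3), round(odds_ratio, 2))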

The sigmoid function

Our goal is to model probability (e.g. the probability that the best treatment is drug A), so mathematically, we want a modeling function that has a y-value that varies between 0 and 1. Because we’re going to use gradient methods to fit values, we need the derivative of the function, so our modeling function must be differentiable. We don’t want gaps or ‘kinks’ in the modeling function, so we want it to be continuous.

There are many functions that fit these requirements (for example, the error function). In practice, the choice is the sigmoid function for deep mathematical reasons; if you analyze a two-class distribution using Bayesian analysis, the sigmoid function appears as part of the posterior probability distribution [2].  That's beyond where I want to go for this blog post, so if you want to find out more, chase down the reference.

Mathematically, the sigmoid function is:

\[\sigma(x) = \dfrac{1}{1 + e^{-x}} \]

And graphically, it looks like this:

I’ve shown the sigmoid function in one dimension, as a function of \(x\). It’s important to realize that the sigmoid function can have multiple parameters (e.g. \(\sigma(x, y, z)\)), it’s just much, much harder to draw.
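
A quick numerical sketch (using NumPy) shows the shape: values are close to 0 or 1 once \(x\) is a few units away from zero.

# The sigmoid evaluated at a few points.
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

for x in (-5, -1, 0, 1, 5):
    print(x, round(float(sigmoid(x)), 4))
# Prints: -5 0.0067, -1 0.2689, 0 0.5, 1 0.7311, 5 0.9933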

The sigmoid and the odds

We can write the odds as:

\[odds = \dfrac{p}{1 - p}\]

Taking the natural log of both sides (this is called the logit function):

\[ln(odds) = ln \left( \dfrac{p}{1 - p} \right)\]

In machine learning, we're building a prediction function from \(n\) features \(x\), so we can write:

\[\hat{y} = w_1 \cdot x_1 + w_2 \cdot x_2 + \cdots + w_n \cdot x_n\]

For reasons I'll explain later, this is the log odds:

\[\hat{y} = w_1 \cdot x_1 + w_2 \cdot x_2 + \cdots + w_n \cdot x_n = ln \left( \dfrac{p}{1 - p} \right)\]

With a little tedious rearranging, this becomes:

\[p = \dfrac{1}{1 + e^{-(w_1 \cdot x_1 + w_2 \cdot x_2 + \cdots + w_n \cdot x_n)}}\]

Which is exactly the sigmoid function I showed you earlier.

So the probability \(p\) is modeled by the sigmoid function.
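
In case the rearranging isn't obvious, here are the intermediate steps: exponentiate both sides, then solve for \(p\).

\[e^{\hat{y}} = \dfrac{p}{1 - p}\]

\[(1 - p) \: e^{\hat{y}} = p\]

\[e^{\hat{y}} = p \left( 1 + e^{\hat{y}} \right)\]

\[p = \dfrac{e^{\hat{y}}}{1 + e^{\hat{y}}} = \dfrac{1}{1 + e^{-\hat{y}}}\]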

This is the "derivation" provided in most courses and textbooks, but it ought to leave you unsatisfied. The key point is unexplained,  why is the log odds the function \(w_1 \cdot x_1 + w_2 \cdot x_2 \cdots + w_n \cdot x_n \)?  The answer is complicated and relies on a Bayesian analysis [3]. Remember, logistic regression is taught before Bayesian analysis, so lecturers or authors have a choice; either divert into Bayesian analysis, or use a hand-waving derivation like the one I've used above. Neither choice is good. I'm not going to go into Bayes here, I'll just refer you to more advanced references if you're interested [4].

Sigmoid to classification

In the previous section, I told you that we calculate a probability value. How does that relate to classification? Let's take an example.

Imagine two teams, A and B, playing a game. The probability of team A winning is \(p(A)\) and the probability of team B winning is \(p(B)\). From probability theory, we know that \(p(A) + p(B) = 1\), which we can rearrange as \(p(B) = 1 - p(A)\). Let's say we're running a simulation of this game with the probability \(p = p(A)\). When \(p\) is "close" to 1, we say A will win, and when \(p\) is close to 0, we say B will win.

What do we mean by close? By default, we might say that if \(p \geq 0.5\) then we choose A, and if \(p < 0.5\) we choose B. That seems sensible, and it's the default choice of scikit-learn as we'll see, but it is possible to select other thresholds.

(Don't worry about which side of the threshold the boundary case \(p = 0.5\) falls on; that only becomes an issue under very specific circumstances.)

Features and functions

Before we dive into an example of using logistic regression, it's worth a quick detour to talk about some of the properties of the sigmoid function. 

  • The y axis varies from 0 to 1.
  • The x axis varies from \(-\infty\) to \(\infty\).
  • The gradient changes rapidly around \(x=0\) but much more slowly as you move away from zero. In fact, once you go past \(x=5\) or \(x=-5\) the curve pretty much flattens. This can be a problem for some models.
  • The "transition region" between \(y=0\) and \(y=1\) is quite narrow, meaning we "should" be able to assign probabilities away from \(p=0.5\) most of the time, in other words, we can make strong predictions about classification.

How logistic regression works

Calculating a cost function is key; however, it involves some math that would take several pages, and I don't want to turn this into a huge blog post. There are a number of blog posts online that delve into the details if you want more; check out references [7, 8].

In linear regression, the method used to minimize the cost function is gradient descent (or a similar method like ADAM). That's not the case with logistic regression. Instead, we use something called maximum likelihood estimation and, as the name suggests, this is based on maximizing the likelihood that our model predicts the data we see. The approach relies on calculating a log-likelihood function and using a gradient ascent method to maximize it. This is an iterative process. You can read more in references [5, 6].

Some code

I'm not going to show you a full set of code, but I am going to show you the "edited highlights". I created an example for this blog post, but all the ancillary stuff got in the way of what I wanted to tell you, so I've pulled out the pieces I thought would be most helpful. For context, my code generates some data and attempts to classify it.

There are multiple Python libraries that offer logistic regression; I'm going to focus on the one most people use to explore ideas: scikit-learn.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler

train_test_split splits the data into a training set and a test set. I'm not going to show how that works; it's pretty standard.

Machine learning algorithms tend to work better when the features are scaled. A lot of the time this isn't an issue, but if the values of the features have very different ranges, it can cause problems for the numeric algorithms. Here's an example: if feature 1 ranges from 0.001 to 0.002 and feature 2 ranges from 1,000,000 to 2,000,000, we may have a problem. The solution is to put the features on a comparable scale; StandardScaler does this by transforming each feature to have zero mean and unit variance. Notably, scaling is also an issue for many curve-fitting algorithms. Here's the scaling code for my simple example:

scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)

Fitting is simply calling the fit method on the LogisticRegression model, so:

# Create and train scikit-learn logistic regression model
model = LogisticRegression(
    random_state=random_state,
    max_iter=max_iterations,
    solver='liblinear'
)
# Train the model on scaled features
model.fit(features_scaled, labels)

As you might expect, max_iter stops the fitting process from going on forever. random_state controls the random number generator used; it's only applicable to some solvers, like the 'liblinear' one I've used here. The solver is the type of equation solver used. There's a choice of different solvers, which have different properties and are therefore suited to different sorts of data; I've chosen 'liblinear' because it's simple.

fit works exactly as you think it might.

Here's how we make predictions with the test and training data sets:

test_features_scaled = scaler.transform(test_features)
train_features_scaled = scaler.transform(train_features)
train_predictions = model.predict(train_features_scaled)
test_predictions = model.predict(test_features_scaled)

This is pretty straightforward, but I want to draw your attention to the scaling going on here. Remember, we scaled the features when we trained the model, so we have to apply the same scaling when we're making predictions.

The predict method uses a 0.5 threshold as I explained earlier. If we'd wanted another threshold, say 0.7, we would have used the predict_proba method.
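
Here's a minimal sketch of what that looks like; the 0.7 threshold is just an example value:

# Sketch: classify using a custom probability threshold instead of predict's 0.5.
threshold = 0.7
test_probabilities = model.predict_proba(test_features_scaled)[:, 1]  # P(class 1)
test_predictions_strict = (test_probabilities >= threshold).astype(int)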

We can measure how good our model is with the accuracy_score function.

train_accuracy = accuracy_score(train_labels, train_predictions)
test_accuracy = accuracy_score(test_labels, test_predictions)

This gives a simple number for the accuracy of the train and test set predictions. 

You can get a more detailed report using classification_report:

print(classification_report(test_labels, test_predictions))

This gives a set of various "correctness" measures.

Here's a summary of the stages:

  • Test/train split
  • Scaling
  • Fit the model
  • Predict results
  • Check the accuracy of the prediction.

Some issues with the sigmoid

Logistic regression is closely related to neural nets (the sigmoid is a classic activation function), and as you know, neural nets have exploded in popularity. So any issues with the sigmoid take on an outsize importance.

Sigmoids suffer from the "vanishing gradient" problem I hinted at earlier. As \(x\) moves away from zero in either direction, the \(y\) value gets closer to 0 or 1, so the gradient (first derivative) gets smaller and smaller. In turn, this can make training deep neural nets harder.

Sigmoids aren't zero centered, which can cause problems for modeling some systems.

Exponential calculations cost more computation time than other, simpler functions. If you have thousands, or even millions, of these calculations, that soon adds up.

Fortunately, sigmoids aren't the only game in town. There are a number of alternatives to the sigmoid, but I won't go into them here. You should just know they exist.

Beyond binary

In this post, I've talked about simple binary classification. The formula and examples I've given all revolve around simple binary splits. But what if you want to classify something into three or more buckets?  Logistic regression can be extended for more than two possible outputs and can be extended to the case where the outputs are ordered (ordinal).

In practice, we use more or less the same code we used for the binary classification case, but we make slightly different calls to the LogisticRegression function. The scikit-learn documentation has a really nice three-way classification demo you can see here: https://scikit-learn.org/stable/auto_examples/linear_model/plot_logistic_multinomial.html.
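
As a quick illustration of how little the code changes, here's a minimal three-class sketch on synthetic data; the data set is made up purely for the example:

# Sketch: three-class logistic regression with scikit-learn on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

features, labels = make_classification(n_samples=500, n_features=6,
                                        n_informative=4, n_classes=3,
                                        random_state=0)
train_f, test_f, train_l, test_l = train_test_split(features, labels,
                                                    test_size=0.25,
                                                    random_state=0)

# Same API as the binary case; scikit-learn handles the multi-class extension.
model = LogisticRegression(max_iter=1000)
model.fit(train_f, train_l)
print(model.predict(test_f[:5]))        # predicted classes (0, 1, or 2)
print(model.predict_proba(test_f[:5]))  # one probability per class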

What did Manus say?

Previously, I asked Manus to give me a report on logistic regression. I thought its results were OK, but I thought I could do better. Here's what Manus did: https://blog.engora.com/2025/05/the-importance-of-logistic-regression.html, and of course, you're reading my take.

Manus got the main points of logistic regression, but over-emphasized some areas and glossed over others. It was a B+ effort, I thought. Digging into it, I can see Manus reported back the consensus of the blogs and articles out there on the web. That's fine (the "wisdom of the crowd"), but it's limited. There's a lot of repetition and low-quality content out there, and Manus reflected that. It missed nuances because most of the material out there did too.

The code Manus generated was good, and its explanation of the code was good. It did miss explaining some things I thought were important, but on the whole I was happy with it.

Overall, I'm still very bullish on Manus. It's a great place to start and may even be enough in itself for many people, but if you really want to know what's going on, you have to do the work.

References

[1] Sperandei S. Understanding logistic regression analysis. Biochem Med (Zagreb). 2014 Feb 15;24(1):12-8. doi: 10.11613/BM.2014.003. PMID: 24627710; PMCID: PMC3936971.

[2] Bishop, C.M. and Nasrabadi, N.M., 2006. Pattern recognition and machine learning (Vol. 4, No. 4, p. 738). New York: Springer.

[3] https://www.dailydoseofds.com/why-do-we-use-sigmoid-in-logistic-regression/

[4] Norton, E.C. and Dowd, B.E., 2018. Log odds and the interpretation of logit models. Health services research, 53(2), pp.859-878.

[5] https://www.geeksforgeeks.org/machine-learning/understanding-logistic-regression/

[6] https://www.countbayesie.com/blog/2019/6/12/logistic-regression-from-bayes-theorem

[7] https://medium.com/analytics-vidhya/derivative-of-log-loss-function-for-logistic-regression-9b832f025c2d

[8] https://medium.com/data-science/introduction-to-logistic-regression-66248243c148

Monday, June 16, 2025

Tell me on a Sundai.club – something novel in Boston?

At several events in the Boston area, I heard talk of something called the Sundai Club, a weekly AI hackathon for MIT and Harvard students. At the AI Tinkerers group, I saw some of their projects and I was impressed. This blog post is about the club and what I’ve observed from their presentations and from their code.

(Canva)

What impressed me

During the AI Tinkerers event, I saw several demos of "products" created by small teams of Sundai Club undergraduate students in 12 hours. Of course, all of the demos used AI, either for processing in the background or for code generation, or both. These demos were good enough to clearly demonstrate a value proposition.

Let me repeat this because it's important: small groups of undergraduate students are regularly building working prototypes in 12 hours. The impressive thing is the productivity and the quality coming from students.

Of course, the output is a prototype, but with AI, they’ve got a substantial productivity boost. All the UIs looked good and all the prototypes did something interesting.

I was impressed enough to dig deeper, hence this review.

How the club operates

This is a student club for MIT and Harvard students. It meets every Sunday from 10am to 10pm for a full day's hacking. Not all 12 hours are spent hacking; there's a sunset run and presentations. Some of the sessions are sponsored by AI companies or companies in adjacent spaces. Sponsorship often means providing free resources, for example computing power or hosting.

They have a website you can visit: https://www.sundai.club/ 

My review of their code

Most of the projects are posted on the website and of those, most have GitHub pages where you can view the code. I spent some time dissecting several projects to figure out what’s going on. Here are my thoughts.

Code quality is surprisingly good. It’s readable and well-structured. Is this because it’s at least partly AI generated? Probably. 

Code length is surprisingly short. You can read over all the code for one of these projects in less than 10 minutes.

Notably, they do use a lot of “new” services. This includes newer libraries and newer hosting services. This is a hidden benefit: their development speed isn’t just from AI, it’s from using the right (non-AI) tools.

LLM connections are simple. It's just API calls and prompts. This was the surprise for me; I was expecting something more complicated.

Importantly, they use agentic AI IDEs. Cursor was the one I saw used the most, but I’ve heard of projects using Lovable and I’m sure there’s Windsurf usage too. In fact, a Sundai club presentation was the first time I saw people “vibe coding” using voice (via the Whisper add-on). Agentic IDEs seem to be key to the productivity gains I saw. 

Why is this so interesting

  • They’re producing prototype “products” in less than 12 hours with a small team. This would have taken more than two or three weeks in the past.
  • The quality of the code is high. It’s at least as good as some professional code.
  • They’re using the latest libraries, IDEs, and tools. They really are on the cutting edge.

Next steps

The most obvious thing you can do is visit their website: https://www.sundai.club/ and view some of their projects.

If you’re in the Boston area, you can often catch Sundai Club presentations at the AI Tinkerers group, which is open to anyone: https://boston.aitinkerers.org/ 

Saturday, June 7, 2025

How to Thrive in an AI-Driven Future

On Wednesday, I went to a panel session in Boston on AI. I thought my notes might be useful to others, so here they are. The title of the panel was "How to Thrive in an AI-Driven Future"; the thing doing the thriving is the city of Boston and the surrounding area.

What was the panel about?

The panel was about the current state of AI in the Boston area, focused on how Boston might become a hub for AI in the near term. It discussed the Boston area's strengths and weaknesses, and along the way, it pointed out a number of great AI resources in the local area.

(King of Hearts, CC BY-SA 4.0 <https://creativecommons.org/licenses/by-sa/4.0>, via Wikimedia Commons)

It was held in the Boston Museum of Science on Wednesday June 4th, 2025.

Who was on the panel?

  • Paul Baier, CEO GAI Insights - moderator 
  • Sabrina Mansur, new Executive Director of Mass AI Hub
  • Jay Ash, CEO of MACP, Massachusetts Competitive Partnership
  • Chloe Fang, President of MIT Sloan AI Club

What did the panel talk about?

The panel was at its most comfortable talking about universities and students. There was a lot of chatter about Harvard and MIT (the event was sponsored by Harvard Business School Alumni) and the 435,000 students in the state. There was also mention of the state's great educational record. Jay brought up the term "the brain state" for Massachusetts.

Apparently, there are about 100 business incubators in MA housing 7,500 companies, with 20,000 employees. These are bigger numbers than I would have expected.

Several panel members mentioned the Massachusetts Green High Performance Computing Center in Holyoke. I didn't know about it and it's nice to hear about these kinds of initiatives, but no-one connected it to promoting or developing AI in the state.

Sabrina talked about the state releasing some data sets as a data commons in the near future. The panel all agreed this would be a great step forward. It wasn't clear what data was going to be released, when, and how, but comments later on seemed to indicate this would be Massachusetts data only.

A good deal was made of the state's $100mn AI Hub initiative, but it seems this money has been approved but not allocated, and it's not clear what it will be spent on or when. There was a hint that there might be some focus on SMBs rather than large businesses.

Chloe talked about how AI and code gen have enabled new players. She said that a few years ago, MBA students didn't have the technical skills to build demo products, but now, with the rise of code gen, they can. She talked about MBA hackathons, something that would have been impossible until recently.

The whole panel seemed to have a love affair with the MIT and Harvard Sundai Club. This is a student club that meets on Sundays and produces complete apps in a 12-hour period, obviously focused on AI. (I agree, there are some very interesting things going on there.)

There was some discussion on making regulation in the state appropriate, but no discussion about what that might mean.

While there was a lot of discussion on problems, there were strikingly few ideas on how to resolve them. Two issues in particular came up:

  • Funding
  • Livability

The panel contrasted how "easy" it is to get funding in San Francisco compared to Boston, at both the early stage and the growth stage. There were some comments that this view is overblown and that it's easier than people think to get funding in the Boston area. Frankly, there were no real suggestions on how to change things. One idea was to tell students in Boston that it's possible to get funding here, but that's about the only suggestion the panel had.

There were a couple of questions around livability. An audience question pointed out that rents in the Boston area are high (though San Francisco and New York rents are probably higher), but the panel dodged the question. On the subject of "things to do for twenty-somethings", the panel deferred to the youngest panel member, but again, nothing substantive was said. The panelists did talk about Boston being an international city and how its downtown doesn't really live up to that right now; the view was, Boston city government needed to step up.

Boston AI Week, which is being held in the Fall, was heavily promoted. 

What were my take-aways?

As I said, there was a lot of discussion of problems but strikingly few ideas on how to resolve them, and I'm not sure the panel had thought the issues through.

MIT and Harvard (in that order) dominate the intellectual landscape and mind share. They certainly dominated the panel's thinking. In my view, this is fair, and the other universities only have themselves to blame for being left behind. While they don't have the resources of Harvard and MIT, they could run the equivalent of the Sundai Club, put people up for panel sessions like this, and organize events of their own. Yes, it's harder for them, and yes, MIT and Harvard have more resources, but they could still do a lot more.

I was left with the feeling that there's no real coordination behind Boston's AI groups. While there are individuals doing great things (Paul Baier being one), I don't get the sense of an overarching and coordinated strategy. 


(Two of the three trains I had to catch to get home.)

The international city thing struck a chord with me. My trip in was easy: I parked up and got one train right to the door of the Museum of Science. On the way back, things went wrong. I had to get three trains and a shuttle bus (almost two hours door-to-door, which is shocking). Nothing about my return trip said "international city".

Friday, June 6, 2025

Google's ADK for agentic AI development - and some general thoughts

Some observations on agent development

On Tuesday, June 3rd, 2025, I spent the day at Google's Cambridge, MA site at their "Build with AI" event. It was a hands-on tutorial to make agentic AI systems using Google's technology. The event crystallized a few things for me and helped sharpen my thinking. 

In this blog post, I'm going to review the workshop and share my general thoughts.

(Kenneth C. Zirkel, CC BY 4.0 <https://creativecommons.org/licenses/by/4.0>, via Wikimedia Commons)

Workshop review

The goal of the workshop was to build a working agentic AI system using MCP, A2A, and the Google Agent Development Kit (ADK). Of course, this was all done using GCP.

The session started with an overview and some theory. Thankfully, this was done well; the team kept the introductions short and dived straight into the workshop. The theory was standard stuff, an introduction to the technologies used and some of the relevant history.

Something like 70% of the workshop was setting up various Google services, for example, a web server to serve the app, a server for the backend, and so on. Thankfully, this was all script based, but there were a lot of scripts. This really brought home to me the role of DevOps in AI. Someone asked about AI ops, and I sympathize with the question: it all felt like an outgrowth of DevOps.

The Python code we did use was pretty simple. Frankly, it was just a few API calls. The focus was on the API call arguments, making sure we had the right arguments in place for what we were trying to do. I'm going to go as far as saying that the Python coding piece was trivial; there was nothing that would cause problems even for an entry-level programmer. It was made even easier by being cut and paste; we didn't even have to figure out the right arguments.
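For a flavor of what this kind of agent code looks like, here's a minimal sketch assuming Google's ADK Python package. The agent name, model string, and tool are illustrative placeholders, not the workshop's actual code.

```python
from google.adk.agents import Agent

def lookup_order(order_id: str) -> dict:
    """Toy tool: in a real system this would call a backend service."""
    return {"order_id": order_id, "status": "shipped"}

# The "coding" is mostly declarative: pick a model, write an instruction,
# and register the tools the agent is allowed to call.
root_agent = Agent(
    name="order_helper",          # placeholder name
    model="gemini-2.0-flash",     # placeholder model id
    description="Answers questions about customer orders.",
    instruction="Use the lookup_order tool to answer order status questions.",
    tools=[lookup_order],
)
```

The hard part isn't this Python; it's knowing which arguments to set and how the surrounding servers and services need to be configured.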

The presenter was keen to point out the message passing between servers and how we could debug it through the Google environment. This spoke to a concern of mine: I've tried to debug message passing between independent systems before, and it wasn't a good experience. Having Google provide a "trace" is very helpful and reduces my concerns quite a bit.

The workshop took about four hours and I managed to build the complete system a little while before the end.

Overall, I enjoyed it and got a lot out of it. Could I build their demo system from scratch by myself? No. The reason is all the setup that needs to be done on the various servers; the "why" behind some of the config scripts isn't at all clear to me. But note the problem is not a data science one, or even a software one, it's a DevOps problem. Do I feel I understand A2A and MCP better? Yes. Do I recommend the workshop? Yes.

The workshop is called "Build with AI" and it's going on the road soon.

General thoughts

Agentic systems are not the preserve of data scientists any more. In fact, it's hard to understand what benefits a data scientist would bring to the table.

Over the last year, the development of various abstractions, for example, A2A, MCP, LangChain, and so on, has made it much easier to build AI systems. We've got to the stage where these things are pretty much "off-the-shelf" APIs. With one glaring exception, AI, and agentic AI in particular, now looks like a software engineering problem, so it feels like the preserve of software engineers.

Because frameworks like MCP and A2A are all about inter-system communication, message passing is now key. These frameworks all use some form of JSON message passing underneath. This makes debugging much harder and means we need to see what messages have been passed between systems. To their credit, Google knows this and has produced software to let you trace messages. Debugging message passing is still new to many software engineers and I expect some problems, even with Google's tools.
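To make that concrete, here's roughly what one of those messages looks like. MCP and A2A are built on JSON-RPC 2.0; the method name and parameters below are illustrative, not taken from the workshop.

```python
import json
import uuid

# An illustrative JSON-RPC 2.0 request of the kind these frameworks pass
# between client and server. When something goes wrong, messages like this
# are what you end up reading in a trace.
request = {
    "jsonrpc": "2.0",
    "id": str(uuid.uuid4()),
    "method": "tools/call",  # MCP-style tool invocation (illustrative)
    "params": {"name": "search_docs", "arguments": {"query": "refund policy"}},
}
print(json.dumps(request, indent=2))
```

Multiply that by every hop between agents and servers and you can see why a trace view matters.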

AI systems are all about calls from one system to another. This obviously means permissioning, but it also means cost. A poor setup can cost a company a great deal of money and/or give a poor user experience. These kinds of problems are usually associated with DevOps. In fact, my overall impression of the Google workshop was that it was mostly DevOps with some basic coding thrown in.

In mid-2025, what skills do you need to develop agentic AI systems?

  • Software engineering
  • DevOps

There's no requirement for data science. In fact, you don't need to know how any of the LLMs work under the hood.

This is a brave new world.