Wednesday, February 11, 2026

Data is the new Lego

In 2018, I wrote a company blog post. As with most corporate content of this type, it was eventually deleted. But I liked what I wrote and I want to keep it, so I found it on the Wayback Machine and I'm reposting it here.

Reposting it serves another purpose. My post was plagiarized by someone who claimed it as their own. I want to own my work, not have other people claim it as theirs. Plagiarists have an easier time cheating if the original is hidden away on the Wayback Machine.

You can find the piece on the Wayback Machine here: https://web.archive.org/web/20190820192824/https://www.truefit.com/en/Blog/August-2018/Data-is-the-New-Lego 

It was written for the company True Fit: https://truefit.com/

(Gemini)

Here's the post.

-----------

When I was a child, I used to love playing with Lego, or “Legos” as my American friends often say; my brothers and I built spaceships and trucks and houses and animals. As time went on, our creations became more ambitious, functional, and lifelike. We could each have insisted our Lego was our own, but by pooling resources, we collectively went further. Family and friends gave us Lego, including unusual and hard-to-find bricks, which enabled us to make more accurate models. We were growing up too, and as our play became more sophisticated, we learned how to build better models.

I’m not young anymore and my bones creak on cold mornings, but I still remember playing with Lego as I go to work each morning and play with data to build models. Using data to solve real world problems, like style, fit, and size recommendations, is surprisingly like my childhood Lego memories. To build something useful you need lots of data, data diversity, and the knowledge to build the right models in the right way.

If you don’t have enough Lego bricks, the things you build aren’t realistic; the model is crude, the colors don’t match, and there are gaps. It’s the same with machine learning and computer models; if you don’t have enough data, your models are crude, and you have quantitative and qualitative errors. The history of computer modeling is rife with examples of people making bad decisions using models made with incomplete data. In dealing with style, fit, and size recommendations, not enough data means giving bad advice because your models are too crude to accurately model people and garments. This is where pooling data wins; by pooling our Lego, my brothers and I could build what we wanted; in fashion, by pooling data from many retailers, you can build better models because you have a more complete picture of consumers’ behavior and the unique style characteristics, size, and shape of garments.

To build a good quality Lego model you need a diversity of pieces – models built with just the standard 2x4 bricks are crude and inaccurate. This is where getting Lego from friends and family was so useful – we got more diverse bricks that let us build more accurate models. In fashion, you need a diversity of data on people and garments too. Simply extrapolating from the average size to plus sizes is like using 2x4 bricks for everything; one size does not fit all and you end up with something that isn’t accurate for users who aren’t ‘average’.

Simply assuming US consumers and apparel are the same as German consumers and apparel is like using the same few Lego bricks for different models; different markets need different data. Simply believing a $20,000 dress fits the same as a $100 dress is like building Lego models when the special pieces you need are missing; it’s the kind of thing you do when you don’t have the data you need. In fact, having data on $100 and $20,000 dresses lets you build richer models that make better recommendations for all dresses. The key to good modeling is having data on a diverse set of consumers and garments. 

Young children make crude Lego models, the colors don’t match and the shapes are wrong; older children build working models with careful color schemes. A similar thing happens with data and algorithms. As you get to know and manipulate your data, your algorithms, and their interactions, you come to understand their limitations and you strive to build something better. As time goes by, increasing volumes of data point out the flaws in your work and you fix them – your models become better and better. In other words, the learning curve applies to building Lego and computer modeling.

It might be a brutal childhood truth, but the children with the most Lego, the best pieces, and the time to play produce the best models. The same brutal truth applies for any AI based machine learning or computer modeling project. The projects with the biggest data volumes, the most diverse data, and the best teams to use that data will produce the most accurate models. That’s why it’s fun to play with the massive data set from True Fit’s fashion Genome: it includes data from the largest number of retailers and brands; there’s a diversity of country, people sizes, and garments; and my colleagues know what they’re doing. There’s the added benefit of doing something novel and helping people find clothes they’ll love, that suit their personal style preferences, and will fit and flatter them – Lego models only make a few people happy but style, fit, and size recommendations can make millions of people happier by helping them connect more easily with the clothes and shoes that better express who they are and how they feel. Coming to work each day, it’s like playing with the world’s largest Lego set and it makes me happy.

Sometimes late at night, when it’s quiet and there’s no-one around to judge, I quietly put together Lego models. It’s a consoling and comforting reminder of my childhood, like eating ice cream, playing chase, and England losing in the World Cup. Lego has taught me a lot about data and models and collaboration. But there’s one big difference between building Lego models with my brothers and building computer models with my colleagues: I don’t fight with my colleagues quite so often.

True Fit is determined to improve the customer shopping experience by using its rich data collection from thousands of brands to provide accurate size recommendations. A larger collection of Lego increases the size and scope of the projects that can be built, just as a vast data collection increases the range of customers who can be given accurate style, fit, and size recommendations. To learn more about True Fit's data collection, called the Genome, visit here.

The perceptron

Why study the perceptron?

Perceptrons were one of the first learning systems and an important early stepping-stone to most recent AI innovations. That alone would be motivation enough to study them; however, the reaction of the press, and the consequences of the hype, are also a cautionary tale for us in 2026.

I'm going to share with you the why and the how of the perceptron, with some of the consequences of the hype.

Why do we care about systems that learn?

Go back to the 1950s, why would you care about a system that can learn? There’s the obvious coolness of it, but there are important real-world applications.

Photo analysts study reconnaissance photos looking for hidden bunkers or other items of military significance. The work is tiring and boring at times, but it’s hard to automate because it relies on human interpretation rather than a hard and fast set of rules. The “enemy” constantly changes how they disguise their installations, so whoever or whatever is analyzing photos must continually learn.

A similar problem occurs in post offices. If a post office wants to automate letter sorting, it has to automate reading handwritten addresses. Each person’s handwriting is different, which means creating definitive rules about letter or number formation is hard.

A learning system can adapt itself to new information and so stay productive when things change. In practice, this means it can be taught to recognize a new way a country is disguising a bunker or a new way someone is writing the number 5. It doesn’t require its creators to continually tweak settings. Of course, these automated systems can process letters, images, and so on much faster (and more cheaply) than human beings, which makes them very attractive.

Given the demand existed, how can you create a system that learns?

How do biological systems learn?

The obvious learning systems are biological. By the 1950s, we’d made some progress understanding how brains work; in particular, we had a basic understanding of how neurons, the lowest-level processing units in the brain, work.

Neurons take sensory input signals from dendrites into the soma, where the input is “processed”. If the input signal crosses some threshold, the soma fires an output signal (an action potential) through an axon. Neurons learn by changing the way they “weight” different dendrite signals, so changing the conditions under which they fire. 

The output of one neuron could be the input to another neuron and real brains have layers of processing. 

The picture below shows the arrangement for a single neuron.

(Gemini)

My explanation of how neurons work is very simplistic; in reality, it’s much more complicated. In real brains, neurons learn together, and there are other biological processes going on involving dendrites. If you want to read more about biological neurons, there are many good references available.

The perceptron 

In 1957, at the Cornell Aeronautical Laboratory in Buffalo, New York, the psychologist Frank Rosenblatt was studying human learning (specifically, the neuron) and trying to replicate it in software and hardware. His team built a prototype system, called the perceptron, that could “learn” in a very limited sense. The learning task was simple image classification.

(Rosenblatt and the perceptron. National Museum of the U.S. Navy, Public domain, via Wikimedia Commons)

The Mark I Perceptron’s input was a 20x20 photocell array; a photocell is a very limited form of digital camera sensor. These 400 inputs were fed to “association units” that weighted the inputs. The weights were set by potentiometers that were adjusted by electric motors. Importantly, the initial weights were random to avoid bias. The system summed the weighted signals and used a simple threshold algorithm (the “response units”) to decide the image classification: if the sum of the weighted signals was above the threshold, the algorithm output a signal (a true output); if the sum was below the threshold, it did not output a signal (a false output). Technically, the name of the threshold function is the Heaviside step function. If the perceptron made an error, the relevant weights were adjusted. The perceptron required 50 training iterations to reliably distinguish between squares and triangles.

(From the perceptron user manual.)

In 2026, this sounds really basic, but in 1957 it was a breakthrough. Rosenblatt and his team had demonstrated that a machine could learn and change how it “sees” the world.

The perceptron theory

Here’s a simple representation of the perceptron. The inputs from the photocells are fed in and assigned weights. There’s a bias term to account for bias in the photocells; for example, the photocells might give a very small signal instead of zero when there’s no image. The weighted inputs (and the bias) are summed. If the weighted sum exceeds some threshold, the perceptron fires; if not, it doesn’t.

(The perceptron is a linear classifier, meaning it can only separate points with a hyperplane. In two dimensions, this means it can only separate points using a straight line.)

Mathematically, this is how it works.

\[u = \sum_i w_i x_i + b \]

\[y = f(u) = \begin{cases} 1, & \text{if } u > \theta \\ 0, & \text{otherwise} \end{cases}\]

In the vector notation used in machine learning, the equations are usually written:

\[y = h( \mathbf{w} \cdot \mathbf{x} + b ) \]

where \(h\) is the Heaviside step function.
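The forward pass above can be sketched in a few lines of Python. This is an illustrative toy, not the Mark I's actual logic; the weights, bias, and threshold here are hand-picked values I chose so the unit computes a recognizable function (AND of two inputs).

```python
def heaviside(u, theta=0.0):
    """Step activation: fire (1) if the input exceeds the threshold, else 0."""
    return 1 if u > theta else 0

def perceptron_output(x, w, b):
    """Weighted sum of inputs plus bias, passed through the step function."""
    u = sum(wi * xi for wi, xi in zip(w, x)) + b
    return heaviside(u)

# Hand-picked weights: the unit fires only when both inputs are on (AND).
w = [0.5, 0.5]
b = -0.7
print(perceptron_output([1, 1], w, b))  # 1
print(perceptron_output([1, 0], w, b))  # 0
```

Note that the bias plays the role of the threshold here: with \(b\) folded into the sum, the unit fires whenever the weighted sum of inputs exceeds \(-b\).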

So far, this is pretty simple, but how does it learn? Rosenblatt insisted on starting training from a random state, so that gives us a starting point. Then we expose the perceptron to some training data where we know what the output should be (the data is labeled). Here’s how we update the weights:

\[ w_i  \leftarrow w_i + \Delta w_i \]

\[ \Delta w_i = \eta(t - o)x_i \]

where:

  • \(t\) is the target or correct output
  • \(o\) is the measured output
  • \(\eta\) is the training rate and \( 0 \lt \eta \leq 1\)

We update the weights and try again in an iterative loop. This continues until we can successfully predict the training data set within a certain error, or we’ve reached a set number of iterations, or we’re seeing no improvement. This is similar to how machine learning systems work today.
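Putting the update rule and the iterative loop together gives a complete toy trainer. This is a sketch of the classic perceptron learning rule, not Rosenblatt's hardware: the training data (AND, which is linearly separable), the learning rate, and the epoch limit are all illustrative choices of mine.

```python
import random

def heaviside(u):
    return 1 if u > 0 else 0

def predict(x, w, b):
    return heaviside(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(data, eta=0.1, epochs=100, seed=0):
    # Start from a random state, as Rosenblatt insisted.
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in data[0][0]]
    b = rng.uniform(-1, 1)
    for _ in range(epochs):
        errors = 0
        for x, t in data:                      # t is the labeled target
            o = predict(x, w, b)               # o is the measured output
            if o != t:
                errors += 1
                for i in range(len(w)):
                    w[i] += eta * (t - o) * x[i]   # w_i <- w_i + eta(t - o)x_i
                b += eta * (t - o)             # bias updated like a weight on input 1
        if errors == 0:                        # stop once the training set is learned
            break
    return w, b

# AND is linearly separable, so the perceptron converges on it.
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(and_data)
print([predict(x, w, b) for x, _ in and_data])  # [0, 0, 0, 1]
```

The stopping conditions in the loop mirror the ones described above: perfect prediction on the training set, or a fixed cap on the number of iterations.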

Perceptron problems

There were lots of issues with the perceptron in its original form. Let’s start with the worst: the hype.

Rosenblatt gave interviews to the press about his system, and they ran with it, but not in a good way. A 1958 New York Times article was typical: the headline read “NEW NAVY DEVICE LEARNS BY DOING; Psychologist Shows Embryo of Computer Designed to Read and Grow Wiser”, with the lede: “The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.” Other press stories were similarly sensational and hyped the technology. The press very much set the expectation that walking, talking AIs were just around the corner. Of course, the technology couldn’t deliver what the press forecast, which helped lead to a loss of confidence.

The technical problems varied from the straightforward to the severe.

The original perceptron used a simple threshold to decide whether to fire or not, but this caused problems for training weights. The most important training algorithms use derivatives (for example, gradient descent). A simple threshold isn’t differentiable, which means it can’t be used in these kinds of training algorithms. Fortunately, this is relatively easy to fix by replacing the simple threshold with a differentiable function. There are a number of possible differentiable functions, and a popular choice is the sigmoid function. (The function that decides whether to fire or not is now called the activation function.)
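The contrast is easy to see numerically. A small sketch: the step function jumps abruptly at zero, while the sigmoid transitions smoothly and has a derivative everywhere, which is what gradient-based training needs.

```python
import math

def step(u):
    """The original threshold: no useful derivative (zero everywhere, undefined at 0)."""
    return 1.0 if u > 0 else 0.0

def sigmoid(u):
    """A smooth, differentiable replacement for the step function."""
    return 1.0 / (1.0 + math.exp(-u))

def sigmoid_derivative(u):
    # The sigmoid's derivative has a convenient closed form: s(u) * (1 - s(u)).
    s = sigmoid(u)
    return s * (1.0 - s)

print(step(-0.1), step(0.1))              # 0.0 1.0 (abrupt jump at zero)
print(round(sigmoid(0.0), 3))             # 0.5 (smooth transition)
print(round(sigmoid_derivative(0.0), 3))  # 0.25 (a usable gradient)
```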

A more serious problem is the logical limitations of the simple perceptron. As Minsky and Papert showed in 1969, there are some logical structures (most notably, XOR) that you can’t build using the simple single-layer perceptron architecture. Although multi-layer networks solve these problems, the Minsky and Papert book and their papers significantly damaged research in this area, as we'll see.
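A single threshold unit can't compute XOR because XOR's outputs aren't separable by a straight line, but two layers of the same units can. A sketch with hand-chosen weights (my illustrative values, not anything from the historical literature): one hidden unit computes OR, another computes NAND, and a final unit ANDs them together.

```python
def unit(x, w, b):
    """A single threshold unit: fires if the weighted sum plus bias exceeds 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def xor(x1, x2):
    h1 = unit([x1, x2], [1, 1], -0.5)    # hidden unit 1: OR
    h2 = unit([x1, x2], [-1, -1], 1.5)   # hidden unit 2: NAND
    return unit([h1, h2], [1, 1], -1.5)  # output unit: AND of the hidden units

print([xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```

The catch, of course, is that the perceptron learning rule above only adjusts one layer; nobody at the time knew how to train the hidden layer's weights, which is where backpropagation later came in.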

This is only a summary of the difficulties the perceptron faced. For a fuller description, check out: https://yuxi.ml/essays/posts/perceptron-controversy/

What happened next

By the early 1970s, the hype bubble had burst. Minsky and Papert’s book had an impact, and governments found disappointing results from funding perceptron-based projects; projects promised big results, but in reality, very little was produced. Governmental patience wore thin, and funders eventually concluded this form of AI research wasn't worth supporting. The research money went elsewhere, leading to the first “AI Winter”, which lasted for a decade or so.

Sadly, AI experienced another hype bubble and collapse in the late 1980s, a second "AI Winter". As a whole, AI research began to get a bad reputation.

The “AI Winters” bled talent and money away from neural network development, but research still continued. Although multi-layer networks had been developed by the 1960s, it wasn’t known how to train them until the 1986 Rumelhart, Hinton, and Williams paper “Learning representations by back-propagating errors” [https://www.nature.com/articles/323533a0] popularized the back-propagation method. Convolutional Neural Networks (CNNs), using back propagation and a convolutional structure, were demonstrated in 1989. With these technologies as the backbone, LLMs were developed starting in the mid-to-late 2010s. It’s only the enormous success of LLMs that has brought a flood of money into AI research and a resurgence of interest in its origins.

Rosenblatt had a wide variety of research interests, including astronomy and photometry (measuring light). By any measure he was a genius. Unfortunately, in 1971 he died at the age of 43 in a boating accident. His death was just a few years into the first "AI Winter", so he saw the hype and the subsequent bubble bursting. Sadly, he never got to see how the field eventually developed.

Thoughts on the story

The original perceptron was very much based on what had gone before, but it was a breakthrough and ahead of its time, which was part of the problem. The necessary technology wasn’t there to advance quickly. Unfortunately, the hype in the press, fed by Rosenblatt and others, set unrealistic expectations. While great for short-term research funding, it was terrible for the long term when the hype bubble burst.

AI as a whole has been prone to hype cycles through its entire existence. It's no wonder there's a lot of discussion online about the latest AI bubble bursting. My feeling is that it is different this time, but we're still in a bubble, and people are going to get hurt when it eventually pops.

Monday, February 9, 2026

Learning by hand is better than learning by AI

Accelerating learning with AI?

Recently, I've been learning a new LLM API from a vendor. There's a ton of documentation to wade through to get to what I need to know and the vendor's examples are overly detailed. In other words, it's costly to figure out how to use their API.

(Gemini)

I decided to use code gen to get me up and running quickly. In the process, I found out how to speed up learning, but equally important, I found out what not to do.

Code gen everywhere!

My first thought was to code gen the entire problem and figure out what was going on from the code. This didn't work so well.

The code worked and gave me the answer I expected, but there were two problems: the code was bloated, and it wasn't clear why it was doing what it was doing. The bloat made it hard to wade through the code and zero in on what I wanted. It wasn't clear to me why it had split something into two operations, despite code gen commenting the code. Because I didn't know the vendor's API, I couldn't be sure the code was correct; it didn't look right, but was it?

Hand coding wins - mostly

I recoded the whole thing by hand the old-fashioned way, using the generated code as inspiration (what function to call and what arguments to use). I tried the LLM calls in the way I thought they should work, but the code didn't work the way I expected. On the upside, the error message I got was very helpful and I tracked down why it didn't work. Now I knew why code gen had made two LLM calls instead of one, and I knew what outputs and inputs I should use.

The next step was properly formatting the final output. Foolishly, I tried code gen again. It gave me code, but once again, I couldn't follow why it was doing what it was doing. I went back looking at the data structure in detail and moved forward by hand.

But code gen was still helpful. I used it to help me fill in API argument calls and to build a Pydantic data structure. I also used it to format my code. Yes, this isn't as helpful as I'd hoped, but it's still something and it still made things easier for me.

Why code gen didn't work fully

Code gen created functioning code, not tutorial code, so the comments it generated weren't appropriate to learn what was going on and why.

Because I didn't know the API, I couldn't tell if code gen was correct. As it turned out, code gen produced code that was overly complex, but it was correct.

Lessons

This experience crystallized some other experiences I've had with AI code gen.

If I didn't care about understanding what's going on underneath, code gen would be OK. It would work perfectly well for a demo. Things start to go wrong when you're building a production system where performance matters, or a system that will be long-lived; in these cases, the why of coding matters.

Code generation is an accelerator if you know what you're doing. If you don't know the libraries (or language) you're using, you're on thin ice. Eventually, something bad is going to happen and you won't know how to fix it.