"Nobody knows how AI works"

Tech leaders keep calling AI a "black box". But every step inside is legible; the mystery is emergence, not mechanism — the same puzzle physics has lived with for a century.

It's become a fashionable thing to say. In 2023, Google's Sundar Pichai called generative AI a "black box", conceding that even the experts can't fully explain why it produces what it does. In April 2025, Anthropic's Dario Amodei devoted an entire essay to the point (The Urgency of Interpretability), arguing that we don't truly understand our own creations and calling the situation "essentially unprecedented in the history of technology". MIT Technology Review ran a piece headlined, simply, "Nobody knows how AI works".

It makes for a great soundbite, or clickbait. Taken literally, though, it isn't quite true, or at least it isn't a helpful or accurate way of stating the matter. In context, the remarks aren't wrong. They're meant to explain the fundamental difference between AI models and conventional computer programs to an audience that struggles to understand why an AI can get basic facts wrong or do unexpected things. Stripped of that context, they imply something close to magic: that we built a machine whose inner workings are a mystery even to the people who built it. That isn't what's meant, and the difference matters.

When people say we "don't know how AI works", they're really talking about the emergent properties of a system running at enormous scale. So much input text goes into creating the ‘trained’ statistical model, and so many calculations go into pushing words through it in operation, that nobody can keep track of it all well enough to say exactly why a given input led to a given output.

Yet we still know essentially how it works. Every step is a matrix multiplication, a weighted sum, a simple nonlinear function or a softmax. There is no hidden machinery. What we can't easily do is point at a set of numbers and say "that's where it learned irony". That doesn't mean we don't know where those numbers came from. Nor does it mean we didn't deliberately design the mechanism and the training corpus to produce such a result, even if predicting those results in detail is often beyond reach.

We know how to do this

This kind of situation is indeed unprecedented in computing — but not elsewhere in science.

Take thermodynamics, as one example. We've understood the laws governing a single molecule for well over a century. Yet temperature, pressure and boiling points aren't written into any one molecule; they emerge from trillions of them, and we describe them statistically. Knowing the rules for one particle tells you almost nothing about when the kettle will boil.
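A toy illustration of that statistical point follows. It assumes an idealised gas in which each molecule's velocity components are drawn at random, with unit mass and arbitrary units. It is not a physical simulation, and the function name is hypothetical. Individual molecules vary wildly, yet the average kinetic energy, the quantity temperature tracks, barely moves from one sample to the next.

```python
import random
import statistics

def molecule_energies(n, seed):
    """Kinetic energies (arbitrary units, unit mass) of n molecules in a toy gas.

    Each velocity component is drawn from a standard normal distribution,
    the per-component shape of the Maxwell-Boltzmann distribution.
    """
    rng = random.Random(seed)
    energies = []
    for _ in range(n):
        vx, vy, vz = (rng.gauss(0.0, 1.0) for _ in range(3))
        energies.append(0.5 * (vx * vx + vy * vy + vz * vz))
    return energies

# A single molecule tells you very little: its energy jumps around from draw to draw.
singles = [molecule_energies(1, seed)[0] for seed in range(5)]
print("single molecules:", [round(e, 2) for e in singles])

# The average over many molecules (what temperature tracks) barely moves.
averages = [statistics.mean(molecule_energies(50_000, seed)) for seed in range(5)]
print("averages of 50,000:", [round(a, 3) for a in averages])
```

With unit mass and unit-variance velocity components, the averages should all land close to 1.5 (0.5 × 3 components × variance 1). The single-molecule values scatter widely.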

Or turbulence. The Navier–Stokes equations have described fluid motion since the 1800s, and nobody doubts them. But they're so intractable that we still study turbulence with wind tunnels and simulations rather than by solving them with pen and paper.

An LLM is the same shape of problem. The units are simple and fully understood. The behaviour is emergent, statistical, and best studied by observation rather than derivation. We tune the parameters, watch what happens, refine our theories of the middle, and try again — exactly as scientists in many other disciplines have always done.

How does it work then?

So what is an AI doing inside? Surprisingly little — a handful of operations, repeated billions of times.

Text is first chopped into tokens (roughly, word-fragments), and each token becomes a list of numbers — a vector — that pins its meaning to a position relative to other tokens. Words used in similar ways sit near each other.
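Here is a toy sketch of those two steps, and it rests on big assumptions. The vocabulary is tiny and hand-made, it uses whole words rather than word-fragments, and the three-number vectors are hand-picked. Real models learn their tokenisers (commonly via byte-pair encoding) and their much longer embeddings from data. "Nearness" here is measured with cosine similarity, which is one common way of comparing vectors. All names are hypothetical.

```python
import math

# Hypothetical toy vocabulary: whole words mapped to token ids.
VOCAB = {"the": 0, "cat": 1, "dog": 2, "sat": 3, "ran": 4, "banana": 5}

# Hypothetical 3-number embedding for each token id, hand-picked so the
# dimensions loosely mean "animal", "action" and "food". Real embeddings are
# learned, far longer, and their dimensions rarely have such tidy meanings.
EMBEDDINGS = [
    [0.3, 0.3, 0.3],     # the
    [0.9, 0.1, 0.1],     # cat
    [0.85, 0.15, 0.05],  # dog
    [0.1, 0.9, 0.0],     # sat
    [0.15, 0.95, 0.05],  # ran
    [0.05, 0.0, 0.95],   # banana
]

def tokenize(text):
    """Toy tokeniser: lower-case, split on whitespace, look up each word's id."""
    return [VOCAB[word] for word in text.lower().split()]

def cosine(a, b):
    """Cosine similarity: 1.0 means pointing the same way, 0.0 means no overlap."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similarity(word_a, word_b):
    return cosine(EMBEDDINGS[VOCAB[word_a]], EMBEDDINGS[VOCAB[word_b]])

ids = tokenize("The cat sat")
print(ids)                           # [0, 1, 3]
print([EMBEDDINGS[i] for i in ids])  # the vectors the rest of the model works on
print(round(similarity("cat", "dog"), 3), round(similarity("cat", "banana"), 3))
```

Words used in similar ways ("cat" and "dog") score much higher with each other than with "banana". In a real model, that arrangement comes out of training rather than being chosen by hand.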

Those vectors pass through weight matrices: vast grids of numbers the model settled on during training. Multiplying a vector by a matrix is just arithmetic, but stacked deep it lets the model reshape and recombine information.
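The next sketch spells that arithmetic out. It uses two tiny layers with made-up weights, and each layer is a matrix-vector product followed by a simple nonlinearity. ReLU is assumed here; real transformers use similar functions such as GELU, along with normalisation steps this sketch leaves out. Every number can be checked by hand. The only difficulty in a real model is that there are billions of them.

```python
def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector: one weighted sum per row."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def relu(vector):
    """Simple nonlinearity: keep positive values, set negatives to zero."""
    return [max(0.0, x) for x in vector]

# Made-up weights for illustration; a trained model's weights come from training.
LAYER_1 = [[1.0, -1.0, 0.5],
           [0.0, 2.0, -1.0]]   # maps 3 numbers to 2
LAYER_2 = [[1.0, 1.0],
           [-1.0, 0.5],
           [0.5, 0.0]]         # maps 2 numbers back to 3

def forward(x):
    """Two stacked layers: multiply, bend, multiply, bend."""
    hidden = relu(matvec(LAYER_1, x))
    return relu(matvec(LAYER_2, hidden))

print(forward([1.0, 0.5, 2.0]))  # [1.5, 0.0, 0.75]
```

The nonlinearity is what makes stacking worthwhile. Without it, two matrices applied in a row collapse into a single matrix, and depth would add nothing.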

The clever bit is called attention, the heart of the transformer. For every token it asks: which other tokens matter here, and by how much? It scores each token against the others across the input text, the context window (strictly, in the models behind today's chatbots, against every token that came before it). That way, for example, an "it" can be tied to the earlier noun it refers to. This is how the model holds context together as it works. The problem of tracking relationships between words throughout a text was reduced to this simple, yet computationally expensive, idea, and that was the key breakthrough of 2017 that brought us to where we are today.
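Below is a minimal sketch of that scoring step. It assumes the standard scaled dot-product form from the 2017 transformer paper ("Attention Is All You Need"), and it uses tiny made-up vectors. The learned projections and multiple attention heads of a real model are left out. Each query is compared with every key, the scores pass through softmax to become weights, and the output is the weighted mix of the value vectors. The vectors and names are hypothetical, chosen so that "it" should attend mostly to "cat".

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to one."""
    m = max(scores)  # subtracting the max avoids overflow without changing the result
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over plain lists of vectors."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this query against every key, scaled by sqrt(dimension).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # The output is the weighted average of the value vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, values))
                 for i in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-number keys and values for a toy context ending in "it".
TOKENS = ["the", "cat", "sat", "it"]
KEYS = [[0.0, 0.1], [1.0, 0.0], [0.0, 1.0], [0.2, 0.2]]
VALUES = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
# Hypothetical query for "it", hand-picked to line up with the noun's key.
QUERY_IT = [3.0, 0.0]

outputs, weights = attention([QUERY_IT], KEYS, VALUES)
for token, w in zip(TOKENS, weights[0]):
    print(f"{token:>4}: {w:.2f}")
```

With these numbers, "cat" receives roughly 70% of the attention. The cost the text mentions also shows up here: every token is scored against every other, so the work grows with the square of the context length.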

The softmax function then turns those raw scores into proportions that add up to one: "70% of the attention here, 20% there...". The very same function also produces the model's output at the end of each pass. That output is not a single correct answer but a probability spread across the whole vocabulary, and the next word is sampled from it.
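The following is a sketch of that final step. It assumes a made-up five-word vocabulary and made-up raw scores (usually called logits) for a prompt like "The cat sat on the ...". The temperature parameter is a common knob in real systems and is included as an illustration. Dividing the scores by a value below 1 sharpens the distribution, and a value above 1 flattens it.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities that sum to one."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

VOCAB = ["mat", "sofa", "roof", "moon", "banana"]  # hypothetical vocabulary
LOGITS = [3.0, 2.0, 1.5, 0.2, -2.0]                # hypothetical raw scores

probs = softmax(LOGITS)
for word, p in zip(VOCAB, probs):
    print(f"{word:>6}: {p:.3f}")

# Sample one word in proportion to its probability; rarely the top word isn't chosen.
rng = random.Random(0)
next_word = rng.choices(VOCAB, weights=probs, k=1)[0]
print("sampled:", next_word)
```

With these scores, "mat" gets about 60% of the probability. "Sofa" or "roof" will still be picked some of the time, which is why the same prompt can give different answers.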

That's the whole trick. Tokens in; matrices multiply; attention weighs context; softmax turns numbers into probabilities; repeat. Every step is legible. At no point does the maths become unknowable.
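The whole loop can be sketched end to end with a stand-in for the model. As a loud simplification, the "model" here is a hypothetical table of next-word scores keyed only on the previous word (a bigram table), not a transformer. What matters is the shape of the loop: score, convert to probabilities, sample, append, repeat.

```python
import math
import random

# Hypothetical stand-in for a trained model: for each previous word, raw
# scores for possible next words. A real LLM computes these scores with
# attention over the whole context, not just the last word.
NEXT_WORD_SCORES = {
    "<start>": {"the": 2.0, "a": 1.0},
    "the": {"cat": 2.0, "dog": 1.5},
    "a": {"cat": 1.0, "dog": 2.0},
    "cat": {"sat": 2.0, "slept": 1.0},
    "dog": {"sat": 1.0, "slept": 2.0},
    "sat": {"<end>": 1.0},
    "slept": {"<end>": 1.0},
}

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def generate(rng, max_tokens=10):
    tokens = ["<start>"]
    for _ in range(max_tokens):
        options = NEXT_WORD_SCORES[tokens[-1]]             # 1. score candidates
        words = list(options)
        probs = softmax(list(options.values()))            # 2. scores -> probabilities
        next_word = rng.choices(words, weights=probs)[0]   # 3. sample one
        if next_word == "<end>":
            break
        tokens.append(next_word)                           # 4. append, then repeat
    return " ".join(tokens[1:])

rng = random.Random(42)
for _ in range(3):
    print(generate(rng))
```

Each run produces a short sentence such as "the cat sat". Swap the lookup table for a transformer and widen the vocabulary, and this is the structure of text generation.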

From reflex to reflection

What I've described so far is a single pass: prompt in, answer out. On its own, that's closer to reflex than thought. The model blurts out its best statistical guess, much as the brain completes a familiar pattern before you've consciously registered it. This of course is at the root of ‘hallucinations’ in AI output, just as something glimpsed out of the corner of your eye often turns out not to be what you first took it for. Under more extreme conditions, such as vision loss through macular degeneration, the brain is quite happy to do exactly what an LLM does and fill in the gaps with predictions based on past experience.

The systems we actually use add layers on top of that reflex.

Reasoning models are coaxed to think "out loud" before answering — generating intermediate steps, checking their own work, revising. The machinery underneath is unchanged; it's the same forward pass, run again on its own output. But the behaviour shifts from blurting to deliberating: the difference between an instinctive answer and one you've worked through on paper.

Agents wrap the model in a loop — plan, act, observe the result, adjust, repeat — often reaching for tools or memory along the way. This is goal-directed behaviour: pursuing an aim across many steps rather than reacting to a single prompt.
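The next block is a minimal sketch of that loop, and every name in it is hypothetical. `toy_policy` stands in for the model deciding what to do next, and `calculator` stands in for a tool. In a real agent, the "decide" step would be a call to an LLM whose output is parsed into an action. Here it is a few hard-coded rules, so the plan, act, observe, adjust cycle is easy to see.

```python
def calculator(expression):
    """Hypothetical tool: evaluates 'a op b' for +, - and * only."""
    a, op, b = expression.split()
    a, b = float(a), float(b)
    return {"+": a + b, "-": a - b, "*": a * b}[op]

TOOLS = {"calculator": calculator}

def toy_policy(goal, observations):
    """Stand-in for the model: picks the next action from what it has seen.

    A real model would read the goal; this toy hard-codes one plan for it.
    """
    if not observations:
        return ("calculator", "12 * 7")                   # plan: subtotal first
    if len(observations) == 1:
        return ("calculator", f"{observations[0]} + 16")  # adjust using the result
    return ("finish", observations[-1])                   # goal reached: stop

def run_agent(goal, max_steps=5):
    observations = []
    for _ in range(max_steps):
        action, argument = toy_policy(goal, observations)  # decide
        if action == "finish":
            return argument
        result = TOOLS[action](argument)                   # act
        observations.append(result)                        # observe, then loop
    return None  # step budget exhausted without finishing

print(run_agent("What is 12 * 7 + 16?"))  # 100.0
```

The step budget (`max_steps`) matters because nothing guarantees that a model-driven policy will ever choose to stop.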

Guardrails sit alongside as inhibition — filters and learned constraints that suppress certain responses before they surface, much as the brain checks an impulse before it becomes an act.
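One simple form of that inhibition can be sketched at the sampling step. Before softmax, disallowed tokens have their scores set to minus infinity, so their probability becomes exactly zero and they can never be sampled. This is an illustrative assumption, not a description of any particular product. Real guardrails also include separately trained classifiers, system instructions and behaviour learned during training. The candidates, scores and blocklist below are all hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mask_blocked(vocab, logits, blocked):
    """Give blocked tokens a score of minus infinity so softmax assigns them zero."""
    masked = [-math.inf if word in blocked else score
              for word, score in zip(vocab, logits)]
    if all(score == -math.inf for score in masked):
        raise ValueError("every candidate is blocked; nothing left to sample")
    return masked

VOCAB = ["help", "explain", "insult", "refuse"]  # hypothetical candidates
LOGITS = [1.0, 1.5, 2.5, 0.5]                    # hypothetical raw scores
BLOCKED = {"insult"}                             # hypothetical blocklist

before = softmax(LOGITS)
after = softmax(mask_blocked(VOCAB, LOGITS, BLOCKED))
for word, p_before, p_after in zip(VOCAB, before, after):
    print(f"{word:>8}: {p_before:.2f} -> {p_after:.2f}")
```

The blocked word was the model's top choice, yet afterwards its probability is zero and the rest is shared among the remaining candidates. The impulse is suppressed before it can surface.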

The parallel with the human brain is hard to miss. We don't house thought in any single neuron; it emerges from layered systems — brainstem reflexes, emotional responses, deliberate reasoning, the executive control that arbitrates between them. Stack enough layers and behaviour starts to look, from the outside, like deliberate thought.

Whether anything is experienced in there is a far harder question, and the uncomfortable truth may be that it’s not a million miles away from what goes on in a biological human brain. But resembling deliberation is not necessarily the same as being conscious — and we have no agreed test to tell the two apart, in silicon or in skulls.

In any case, the mechanical point holds at every level: reflex, reasoning, planning, restraint are each an orchestration of legible steps. As the stack grows taller, it becomes less feasible to trace, alter or even observe… but it never becomes magic.

In conclusion

We know perfectly well how AI works. And at the most basic level, it was designed and built deliberately to work the way that it does. The processes happening inside an LLM are not even difficult to grasp. We just can’t control and engineer AI with the same sort of precision as a computer program, simply because it is a statistical model and not a series of logical tests and instructions.

What I think genuinely surprises anybody who takes the time to learn the mechanisms at work in AI is just how well it works. You scale a basic process up to billions of parameters and feed it enough curated training text. Then you let it talk to itself for a while to work out the questions it needs to ask and answer to produce a good result.
