Code Was Never About Software

AI labs trained their models on the most specific skill. They got the most general capability.

Feb 10, 2026

There’s a moment in The Matrix when Neo stops running from Agent Smith. The hallway fight goes silent. Neo’s eyes shift focus. The walls, the Agent, the bullets, all of it dissolves into cascading green code.

This is happening right now in the digital world.

Every spreadsheet formula, every email routing rule, every database query, every API call: we are realizing it’s all code. Walls and doors are just user interfaces. The substrate is code, all the way down. And we just built AI that reads and writes code the way Neo reads the Matrix.

The frenzy around Claude Code, Codex, ClawdBook and other coding agents isn’t hype. It’s the sound of people realizing what they actually built.

Labs trained AI to code because code was measurable.

What they didn’t expect was the Specificity Paradox: training for the most specific skill produced the most general capability.

They aimed for developer productivity. They built a universal automation layer for the digital world.

[Image: a minimalist red line figure gazing upward into a dense cascade of code characters.]

Why Labs Trained for Code

That wasn’t the original plan. AI labs focused on coding because it’s a massive market and a clean training signal.

When you train on producing content, you need complex systems to steer the AI: once you’ve solved grammar and factual accuracy, what makes one article objectively better than another? That’s a hard question. Code is simpler. It has binary success criteria: the tests pass or they fail. No ambiguity. You can measure improvement.
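To make "binary success criteria" concrete, here is a minimal sketch of a pass/fail reward, assuming a hypothetical setup where a candidate function is scored against a fixed list of test cases. The names `binary_reward`, `correct_add`, `buggy_add`, and `TESTS` are illustrative only and are not taken from any lab's actual training pipeline.

```python
from typing import Any, Callable, Iterable, Tuple


def binary_reward(candidate: Callable[..., Any],
                  test_cases: Iterable[Tuple[tuple, Any]]) -> float:
    """Return 1.0 only if the candidate passes every test case, else 0.0."""
    for args, expected in test_cases:
        try:
            if candidate(*args) != expected:
                return 0.0  # wrong answer: the test fails
        except Exception:
            return 0.0  # a crash counts as a failure too
    return 1.0


# Two hypothetical model outputs for the task "add two numbers".
def correct_add(a, b):
    return a + b


def buggy_add(a, b):
    return a - b


# Each test case is (arguments, expected result).
TESTS = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

print(binary_reward(correct_add, TESTS))  # 1.0
print(binary_reward(buggy_add, TESTS))    # 0.0
```

There is no equivalent three-line scorer for "is this article better than that one," which is the asymmetry the paragraph above describes.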

So labs invested in coding because they could track progress. Every benchmark improvement was legible. But something unexpected emerged. Researchers (arXiv) found that reinforcement learning on code and math consistently improved performance on scientific reasoning, planning, and instruction-following, all domains far removed from programming. Code’s binary feedback loop (compiles or crashes) turned out to be the ideal training ground for general problem-solving. Train an AI to decompose a coding problem, and it learns to decompose any structured problem.

Coding became the clearest measurable proxy for general intelligence. Not because it’s the most important skill, but because it’s the most verifiable one — and verifiability, it turns out, is what makes learning transfer.

Then there’s the market. McKinsey pegs the software engineering market at $2.6 to $4.4 trillion. An AI that can code can capture a significant share of this. Sam Altman said that in many companies, AI-generated code is “probably past 50% now” — and that was a year ago. At frontier labs today, some engineers don’t write code at all. They direct agents that do.

But what labs came to realize is that when you train an AI to code, you’re not really teaching it a skill. You’re teaching it to manipulate the substrate of the entire digital world.

Digital = Code

Dean W. Ball is a policy researcher. Not a developer. In one month, he built an autonomous options trader, a prediction market agent, a corn yield prediction model using satellite data, automated his legislative research pipeline, created an art market monitor, replicated three machine learning research papers, and built a personal blog with a content management system.

None of these are coding tasks. All were done through code.

The pattern is the point. These aren’t coding tasks solved with code. They’re information tasks that were always code underneath.

A financial analyst reading SEC filings, extracting key metrics, and building a comparison table is executing the same read-process-output-verify loop as a developer parsing logs and generating a bug report. The coding agent doesn’t need to understand finance. It needs to understand files, data structures, and output formats. It already does.
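The read-process-output-verify loop described above can be sketched in a few lines. This is a toy under stated assumptions: the filing excerpts in `FILINGS` are invented, the company names are placeholders, and real filings are far messier than a one-line string. The point is only that the analyst's task has the same shape as a log-parsing script.

```python
import re

# Hypothetical filing excerpts, invented for illustration.
FILINGS = {
    "AcmeCo": "Revenue: 1200 Net income: 150",
    "Globex": "Revenue: 950 Net income: 80",
}

METRICS = ("Revenue", "Net income")


def read(source):
    """Read: return the raw text per company (here it is already in memory)."""
    return dict(source)


def process(raw):
    """Process: pull each metric out of the text with a regular expression."""
    rows = {}
    for company, text in raw.items():
        rows[company] = {}
        for metric in METRICS:
            match = re.search(rf"{re.escape(metric)}:\s*(-?\d+)", text)
            if match:
                rows[company][metric] = int(match.group(1))
    return rows


def output(rows):
    """Output: render a plain-text comparison table."""
    lines = ["Company | " + " | ".join(METRICS)]
    for company, values in sorted(rows.items()):
        cells = [str(values.get(m, "n/a")) for m in METRICS]
        lines.append(company + " | " + " | ".join(cells))
    return "\n".join(lines)


def verify(rows):
    """Verify: a mechanical check that every metric was found for every company."""
    return all(m in values for values in rows.values() for m in METRICS)


rows = process(read(FILINGS))
table = output(rows)
print(table)
print("verified:", verify(rows))
```

Note that `verify` here only checks completeness. Whether these are the right numbers to compare is a separate question that this sketch does not attempt to answer.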

This is why the command line matters. As Nathan Lambert puts it, a coding agent doesn’t need to be restricted to software development: it can control your entire computer. The CLI (command-line interface, that line of green text with a blinking cursor) is the raw interface to the digital world. No UI constraints. No walled gardens. File systems, databases, networks, APIs, cloud infrastructure: all reachable. And if you can reach it, you can automate it.

I didn’t fully understand this until I started using Claude Code to run this newsletter. I can ask it to scan thirty research sources, extract the key arguments, cross-reference them, and generate a structured brief. Editorial work, not coding. But underneath, every step is code — reading files, parsing text, writing output. The workflow is programming. I just described it in English instead of Python.

What Neo Sees

The $15 Trillion Target

If the digital world is code all the way down, then coding agents aren’t confined to software. Doug O’Laughlin calls coding the beachhead, not the destination. The real target is the $15 trillion information economy — finance, legal, consulting, healthcare, analysis. Think about what information workers actually do. They read unstructured material, apply domain knowledge, produce structured output, and verify it against standards.

Agents already run this loop for software. They’ll run it for everything else.

Accenture is training 30,000 professionals on Claude — targeting financial services, life sciences, healthcare, public sector. Not developers. Information workers. The beachhead is established. Now comes the advance.

The Collapse of Software

If an agent can query a database, generate a chart, and email it to a stakeholder — what’s the CRM for? What’s the BI dashboard for? The polished user interface loses value.

The wrappers lose value. The substrate gains it.

This is the dynamic I wrote about last year — software becoming ephemeral, invoked on demand rather than installed and maintained. But the mechanism is clearer now. Coding agents don’t just replace apps. They can build the app, use it, and discard it, all in seconds. Why maintain a CRM when you can generate the exact workflow you need, run it once, and move on?

Bain & Company identifies the three classic SaaS moats — data lock-in, workflow lock-in, integration complexity — and finds all three eroding as agents migrate data, bypass UIs, and simplify integration. Yet Ben Thompson argues software companies have more moats than skeptics recognize: compliance infrastructure, audit trails, deep workflow embedment. Boring defenses, but real ones. The honest read: the last decade of SaaS was about growing the pie. The next decade will be about fighting for it.

Microsoft sees the threat. They’re renting compute to companies — OpenAI, Anthropic — that are dismantling their Office 365 moat. O’Laughlin puts it bluntly: accelerate Azure growth, and you arm the companies tearing down your productivity software castle. Protect Office 365, and you starve the cloud revenue that represents your future.

The Mastery Gap Returns

But the most important consequence isn’t economic. It’s human.

Dean Ball compares coding agents to the piano — easiest instrument to start playing, hardest to master. Anyone can produce a satisfying tone on a piano. Getting to Carnegie Hall takes decades.

Same with coding agents. The barrier to entry collapsed. Allie K. Miller describes the shift from an execution role to a “director” role: you’re no longer writing code, you’re orchestrating what gets built. At Anthropic, engineers now delegate 90% of code to Claude. Boris Cherny, creator of Claude Code, shipped 259 pull requests in one month without opening an IDE once.

But those numbers hide a prerequisite. Cherny knows what good code looks like. He can read a diff, spot an architectural flaw, reject a pull request that passes every test but solves the wrong problem. The 90% he delegates is the 90% his expertise makes delegatable.

Delegation isn’t abdication. Knowing what to build, how to verify it, when to push back — that still requires expertise. You can’t automate what you can’t articulate.

The gap between having the tool and using it well remains as wide as ever.

This is where the Neo metaphor breaks down, and that’s instructive. In The Matrix, once Neo saw the code, he had godlike power. Instant mastery. In the real world, seeing the code is just the beginning. Knowing what to build with it — that’s the hard part, and the irreducibly human part.

We’re all Neo now, standing in front of the Matrix. Some will reshape it. Most are still learning to see.

Olivier Martinez

Feb 10

Sorry, in French ;)

Your article puts it very well: the labs trained on code because the feedback is binary. It compiles or it crashes. No ambiguity. And it is precisely this clear signal that enabled the "transfer learning" to "structured reasoning". But there is a paradox that I think you pass over: the very property that made code the ideal training ground is exactly what makes extrapolation difficult in our world of "information", in the sense of both the theory and the practice of information. In code, "verify" means passing a test. In information in the most general sense, "verify" means assessing reliability, or judging relevance against the real world, or spotting what is missing from reasoning that looks complete. No unit test in the world can do that, though I could be wrong. Your article places verification as the fourth beat of an automatable loop (read-process-output-verify), but for anyone who has ever verified a piece of information, that is a bit like putting "play the piano" as the fourth step of a five-point tutorial, to borrow your image ;)

Take the generation of any document as an example. The agent finds and reads sources, processes them, and produces its output. The result looks solid. But for me this is where it gets complicated, because the fourth beat, the one that makes the difference between a useful document and a useless or dangerous one, is not of the same nature as the three before it. It is not an operation: it is a judgment. And a judgment, to my mind, cannot be coded in Python or spelled out in full in English or French. The $15 trillion target surely exists, but it rests on the assumption that "verify" is a verb like any other in the loop.

For now, and from my point of view, "verify" is the verb that resists. Even if the labs, Anthropic for one, are trying as best they can to get there with "constitutions" and other "guardrails". But if they truly succeeded, the problem would become far more complicated: who would ultimately be in control of the judgment, the verify step? The human, or a model guided by an "imposed" constitution? OK, I digress ;)


Robert M. Ford

Mar 14

I keep coming back to your specificity point. I've been building with multiple AI tools on the same product — Claude, ChatGPT, Lovable — and the thing that actually makes it work isn't the models. It's three markdown files: architecture.md, constraints.md, and a decision log. Every session starts with those files as context.

These files are specificity artifacts. They're what's left after code stops being the container for intent. The interesting thing is they compound — every decision logged makes the next session's output more precise, across tools that share no memory with each other.

Your framing names what I've been watching happen in practice but hadn't articulated at that level: the discipline of specificity outlives the medium it was expressed in.
