<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://bjro.de/feed.xml" rel="self" type="application/atom+xml" /><link href="https://bjro.de/" rel="alternate" type="text/html" /><updated>2026-05-14T15:44:01+00:00</updated><id>https://bjro.de/feed.xml</id><title type="html">Dude, where’s my Kaizen?</title><subtitle>Field notes on agentic engineering by Björn Rochel — independent coach, working with teams from Hamburg.</subtitle><author><name>Björn Rochel</name><email>bjoern@bjro.de</email></author><entry><title type="html">I Spent Twenty Years in the Future</title><link href="https://bjro.de/i-spent-twenty-years-in-the-future/" rel="alternate" type="text/html" title="I Spent Twenty Years in the Future" /><published>2026-03-23T00:00:00+00:00</published><updated>2026-03-23T00:00:00+00:00</updated><id>https://bjro.de/i-spent-twenty-years-in-the-future</id><content type="html" xml:base="https://bjro.de/i-spent-twenty-years-in-the-future/"><![CDATA[<p>Last Sunday I was at a community conference on agentic AI in Hamburg. The talks were good, but that’s not what stuck with me. It was the mood. The room was buzzing — that specific energy you get when a group of people is genuinely excited about what they’re doing and building, and looking optimistically toward the future.</p>

<p>The conference was held in English, as these things tend to be. International visitors, international speakers. Nobody thought twice about it.</p>

<p>Then I walked out, checked my phone, and the world outside that room felt like a completely different reality. What’s playing out on the geopolitical level is the stuff of nightmares, tragedy at a larger and larger scale.</p>

<p>The gap between those two perceptions left me feeling a bit nostalgic. And since writing things down is a good way to channel emotions, here we go.</p>

<h2 id="the-language-that-connected-us">The language that connected us</h2>

<p>For the last fourteen years, English has been the language I use day-to-day. Not because I moved to an English-speaking country — because I worked with international teams where English was the bridge.</p>

<p>The nationalities kept adding up. Colleagues from the US, Ukraine, Russia, South America, Africa, the Middle East, Romania, Bulgaria. I even managed someone in China. At some point the list stopped being worth mentioning, because it was just how work looked for me. You joined a team, there were people from everywhere, you got on with it. English made it unremarkable.</p>

<p>And what a blast it was. Getting to know all of these people, working with them, building things together. Some of them became friends — the kind that stick around long after you’ve left a company and moved on. A lot of my closer friendships today are international, scattered across countries and time zones, because that’s the world I’ve spent my career in.</p>

<h2 id="the-people-i-actually-know">The people I actually know</h2>

<p>Here’s what strikes me. I’ve recently worked in a company where colleagues of Ukrainian and Russian origin got along perfectly well. Not despite the political tensions between their countries — it just didn’t come up that way. They were colleagues. They were good at their jobs. That was the relevant fact. And they showed genuine care for each other.</p>

<p>Same thing with Americans. During the time I spent working remotely for a US company, I got to know a lot of great people. I wouldn’t want to miss that experience. Yet when I see what their country is doing right now, it couldn’t be further from how I’d like the world to be. The Americans I know — the ones I worked with, ate lunch with, solved problems with — bear no resemblance to what their government currently projects.</p>

<p>I’ve had interactions with colleagues from the Middle East too. Every single one: respectful, on an eye-to-eye level. I’ve personally never been in a discussion about whose religion is more appropriate or whose way of living is better. It just wasn’t part of the equation.</p>

<p>There’s a distance between the people I actually know and the headlines about their countries. And the distance is enormous.</p>

<h2 id="what-worked-at-least-in-my-experience">What worked, at least in my experience</h2>

<p>Looking back, I think what made all of this work was surprisingly simple. Nobody had to give up who they were. Nobody had to pretend to be someone else. The deal was: show up open, do good work, be respectful, meet people halfway. Integration, not assimilation. There’s a difference, and it matters.</p>

<p>I realise this sounds a bit like Gene Roddenberry’s Star Trek — the bridge of the Enterprise, where it didn’t matter where you came from as long as you were competent and brought the right attitude. And honestly? That’s not a bad description of the best teams I’ve worked on. Origin mattered less than what you contributed. Different perspectives in one room didn’t create friction — they created something better than any of us could have produced alone.</p>

<p>That wasn’t a philosophy I adopted from a book. It’s just what happened when the conditions were right.</p>

<h2 id="the-bubble-i-live-in">The bubble I live in</h2>

<p>Now, I’m not naive about this. I know I live in a bubble. I work with highly educated people. I always have. The international, English-speaking, mobile professional class — that’s my world. And it’s a self-reinforcing loop: education leads to better jobs, better jobs lead to international exposure, international exposure loosens your attachment to national identity, and the whole thing feeds back on itself.</p>

<p>The same globalisation that opened my world narrowed it for others. I see that. People whose jobs moved to other countries, whose economic security eroded while mine grew. I think that’s a decent part of what fuels the enthusiasm for Trump in the US, for the AfD in Germany. People lost things during these decades of change. Real things. I’m not going to dismiss that.</p>

<p>What I notice, though, is that the rightward push happening in Germany doesn’t pull me. And I think it’s precisely because of that loop. My sense of belonging was shaped more by those international teams than by national identity. When I see the push towards “this country belongs to us,” I feel oddly disconnected — not because the sentiment is illegitimate, but because it doesn’t map to my experience. Through all of those years working across borders, Germany as a state kinda stopped being the primary thing I identify with.</p>

<p>I’m not sure what to make of that. But I’m not going to pretend it’s not there.</p>

<h2 id="whats-left">What’s left</h2>

<p>I’m not trying to convince anyone of anything here. This is just what’s going on in my head at the moment.</p>

<p>I spent twenty years in what was supposed to be the future. The international, collaborative, borders-don’t-really-matter version of the world. And in my experience, it worked. Not as a utopia, not without its blind spots and costs — but as a way of working with people, it genuinely worked. People are people. They’re not their countries.</p>

<p>I’m not sure that future is still coming for everyone. The structures that made it easy — open borders, shared optimism, the assumption that connection was better than isolation — those feel less certain than they did even five years ago. The people haven’t changed. My Ukrainian friends are still my friends. My American friends are still my friends. The few Russian friends I got to know better are still my friends. But the scaffolding around those relationships is shifting in ways that make me uneasy.</p>

<p>Maybe this is just a person in his forties being nostalgic about the past. Maybe the window I worked inside was always smaller than it felt from the inside. I don’t know. But it happened, and it was good, and that feels worth writing down.</p>]]></content><author><name>Björn Rochel</name><email>bjoern@bjro.de</email></author><summary type="html"><![CDATA[On international teams, the distance between people and their countries, and the world that was supposed to be coming.]]></summary></entry><entry><title type="html">Who Owns Web 4.0?</title><link href="https://bjro.de/who-owns-web4/" rel="alternate" type="text/html" title="Who Owns Web 4.0?" /><published>2026-02-27T00:00:00+00:00</published><updated>2026-02-27T00:00:00+00:00</updated><id>https://bjro.de/who-owns-web4</id><content type="html" xml:base="https://bjro.de/who-owns-web4/"><![CDATA[<p>The <a href="/openclaw-end-of-saas-era-web4/">first post</a> in this series laid out the architecture. The <a href="/web4-succeeds-where-web3-failed/">second</a> made the consumer case. The <a href="/supply-side-of-web4/">third</a> explored what the shift means for builders. Each one was implicitly asking a question I’ve been deferring: if this platform takes shape, who controls it?</p>

<p>That question matters more than usual. Prior platform transitions created gatekeepers — Microsoft for the desktop, Google for the web, Apple and Google for mobile. Each gatekeeper captured the economics of their era. An LLM platform that mediates your transactions, research, and decision-making concentrates more personal data and economic leverage than any platform before it. Who controls it isn’t just a business question. It’s a structural one.</p>

<p>And the answer is less settled than most coverage suggests.</p>

<h2 id="googles-innovators-dilemma">Google’s Innovator’s Dilemma</h2>

<p>I mentioned in the <a href="/openclaw-end-of-saas-era-web4/">first post</a> that Google’s absence from MCP adoption looked like the innovator’s dilemma playing out in real time. That’s worth unpacking, because Google’s strategy represents the strongest counterargument to the entire platform thesis.</p>

<p>Google derives roughly 75% of its revenue from advertising, overwhelmingly through search. An LLM platform that routes user intent directly to services — bypassing the search results page entirely — is an existential threat to that model. Google’s response has been to embed Gemini into Search, Gmail, Docs, and the rest of its product suite. This is the rational strategy of the company with the most to lose: make AI a feature of your existing products, not a new platform that cannibalises them.</p>

<p>And it might work. If Google’s “AI as feature layer” approach succeeds — if users prefer AI-enhanced search over delegation to an agent — then no new platform gatekeeper emerges. The current distribution and attention structure holds. Google stays dominant. This is the version of the future where my thesis is wrong, and it’s credible specifically because Google has billions of users, the most popular browser, the dominant mobile OS, and an advertising model that funds aggressive integration.</p>

<p>But Google’s resistance to MCP — building proprietary integrations rather than adopting the open protocol — is a tell. It signals that Google sees MCP-style cross-platform interoperability as a threat, not an opportunity. The company that won the web by being the best at routing user intent is betting against the next generation of intent-routing infrastructure. That’s the classic innovator’s dilemma: the rational move for the incumbent is the wrong move for the market.</p>

<h2 id="three-bets-on-where-the-platform-lives">Three Bets on Where the Platform Lives</h2>

<p>If Google is playing defence, three companies are playing offence — and they’re making fundamentally different bets about where platform control sits.</p>

<h3 id="apple-the-device-controls-the-agent">Apple: the device controls the agent</h3>

<p>Apple has the structural advantage that should make this obvious: devices (iPhone, Mac, Watch, AirPods), a payments infrastructure (Apple Pay), a consumer identity system (Apple ID), a developer ecosystem (App Store), and the trust that comes from a decade of privacy positioning.</p>

<p>The multi-device handoff is Apple’s natural strength. Earbuds handle conversation and low-stakes delegation — “remind me to cancel that subscription” or “what’s the weather in Porto next week.” High-stakes confirmation flows to the phone, where the screen and biometrics live — “confirm this EUR 340 booking with Face ID.” The device ecosystem works together. No other company has this.</p>

<p>But Apple hasn’t executed on the model layer. Apple Intelligence has been underwhelming by industry consensus. The privacy constraints that make Apple trusted also make cloud-based inference harder — you can’t route personal data through a cloud model and maintain the privacy brand. On-device inference limits model size and capability. Apple’s structural advantage is irrelevant if they can’t ship a competitive agent.</p>

<p>The race is whether insurgents establish distribution and consumer habits before Apple catches up. Historically, Apple has been late to categories and then won them — smartphones, tablets, watches. But they’ve also been late and lost — social, search, maps, streaming. This could go either way, and it depends heavily on whether Apple’s privacy constraints are a moat or a shackle on the model layer.</p>

<h3 id="openai-hedging-across-two-layers">OpenAI: hedging across two layers</h3>

<p>The OpenClaw acquisition gives OpenAI a different kind of reach. OpenClaw runs on any private compute — it’s a self-hosted agent, not a platform-hosted one — but it uses WhatsApp, Telegram, and Signal as communication channels, meeting users in the apps they already open fifty times a day. The <a href="https://openai.com/sam-and-jony/">Jony Ive hire</a> signals a purpose-built device play. ClawHub provides a marketplace. This looks like a full platform stack being assembled: model + agent + app store + distribution + device.</p>

<p>But Ben Evans makes a <a href="https://www.ben-evans.com/benedictevans/2026/2/19/how-will-openai-compete-nkg2x">compelling argument</a> that OpenAI’s stack looks ambitious but lacks the coercive power that made prior platform stacks work. No network effects. No structural lock-in. Shallow engagement — roughly 5% of ChatGPT’s users pay, and 80% sent fewer than 1,000 messages in 2025. Product strategy flows from research, not user needs. Evans cites Fidji Simo: “I get a researcher pinging me saying: ‘I have something pretty cool. How are you going to use it in chat?’” — the opposite of Jobs’ “start with the customer experience and work backwards.”</p>

<p>A <a href="https://www.linkedin.com/pulse/ai-platform-wars-wardley-map-openai-vs-anthropic-google-mohandass-uffoc/">Wardley Map analysis</a> of the AI platform wars adds context. Enterprise market share has shifted — Anthropic from 12% to 40%, OpenAI declining from 50% to 27%. OpenAI’s projected $14B in losses against Anthropic’s path to profitability suggests fundamentally different strategies: OpenAI is executing a capital-intensive land-grab; Anthropic is running a protocol-and-partnerships play. The analysis draws a Palantir analogy for OpenAI’s forward-deployed engineer approach — effective for enterprise penetration but expensive and hard to scale — and calls out Microsoft as a cautionary tale of distribution without differentiation.</p>

<p>Evans’ critique sharpens why the OpenClaw acquisition matters so much. It’s not just a talent hire — it’s an attempt to manufacture the engagement depth that OpenAI currently lacks. If users interact with agents through the messaging apps they already live in, the engagement problem looks different than “log into ChatGPT and type a prompt.” Whether that works is genuinely uncertain.</p>

<p>There’s a more fundamental fragility underneath. OpenAI’s strategy is the most capital-intensive of the three — projected losses of $14B, sustained by venture funding and the Microsoft relationship. If the engagement depth doesn’t materialise, if the revenue model doesn’t close the gap, the burn rate becomes existential. This isn’t hypothetical risk-mongering — it’s the arithmetic of a company spending at infrastructure scale without infrastructure-scale revenue. A bankruptcy or distress acquisition wouldn’t kill the ecosystem. OpenClaw is open-source. ClawHub’s marketplace patterns would persist. Consumer habits, if they’d formed, would migrate. Android, being open source, would have survived a hypothetical Google collapse in 2008. But it would abruptly reshape <em>who controls</em> the platform — which is exactly the question this post is about.</p>

<h3 id="anthropic-the-protocol-is-the-platform">Anthropic: the protocol is the platform</h3>

<p>MCP as the HTTP for agent interaction. MCP Apps as the rendering layer. No device hardware. No messaging distribution. The bet is that the protocol layer is where value accrues, and that devices are interchangeable clients — just as browsers are interchangeable clients for the web. Claude is a client, ChatGPT is a client, VS Code is a client, and any future agent can be a client. If you control the standard they all use, you don’t need to control the device.</p>

<p>This is the most elegant strategy and arguably the riskiest. It depends on MCP adoption continuing — which it has, impressively — but also on the protocol layer being the point of leverage rather than the client layer. HTTP is open and foundational, but the platform economics of the web didn’t accrue to the protocol designers. They accrued to Google, Facebook, Amazon — companies that built the best discovery and services on top. Anthropic’s bet is that the AI platform era plays out differently. It might. But the historical pattern isn’t encouraging for protocol owners.</p>

<p>There’s a paradox worth naming. Anthropic is leaner than OpenAI — closer to profitability, less dependent on a single capital partner — but still pre-profit and burning cash in a race with better-funded competitors. If Anthropic fails, though, something interesting happens: MCP doesn’t die with it. The protocol is open, already adopted by competitors, already the basis for a growing ecosystem. HTTP didn’t need its designers to persist. If MCP survives Anthropic — and I think it would — that’s actually evidence <em>for</em> the protocol thesis. The standard becomes infrastructure that outlasts its creator. It also means the protocol bet might succeed even if the company that made it doesn’t.</p>

<p>And the fragility isn’t only financial. Anthropic is currently facing <a href="https://www.reuters.com/world/anthropic-digs-heels-dispute-with-pentagon-source-says-2026-02-24/">pressure from the Pentagon</a> to remove restrictions on military use of Claude. Defense Secretary Hegseth has <a href="https://www.axios.com/2026/02/24/anthropic-pentagon-claude-hegseth-dario">threatened to invoke the Defense Production Act</a> or declare Anthropic a “supply chain risk” if it doesn’t grant the military full, unrestricted access — including areas the company considers red lines, like mass surveillance and autonomous weapons. The company whose safety positioning underpins MCP’s credibility is being told to compromise that positioning or face existential consequences. For the protocol thesis, this sharpens the question: if Anthropic’s independence is compromised — by financial pressure, acquisition, or political coercion — does MCP’s openness protect the ecosystem? Or does the protocol’s credibility depend on its creator remaining independent and principled?</p>

<p>Notably, neither Evans nor the Wardley Map analysis covers Apple. Both focus on the model-layer competition and largely ignore the device layer. I think that omission is itself revealing — the device play is the underexplored bet in the current discourse.</p>

<h2 id="the-missing-layer-discovery">The Missing Layer: Discovery</h2>

<p>This is the thread I’ve been deferring since the <a href="/openclaw-end-of-saas-era-web4/">first post</a> and that the <a href="/supply-side-of-web4/">third post</a> flagged explicitly: MCP today is like HTTP without DNS and without search engines. The protocol works. Finding the right service for a given intent doesn’t.</p>

<p>Right now, if you want to use an MCP tool, you need to know it exists, install it, configure it. That’s roughly where the web was in 1993 — you needed to know the URL. What made the web into a platform was the discovery layer: Yahoo, then AltaVista, then Google. What makes Web 4.0 into a platform is the equivalent layer: user expresses intent, a discovery system identifies relevant services, the right schemas get loaded dynamically.</p>

<p>Whoever builds this controls the platform economics. The protocol is open — anyone can build an MCP server. But which server gets invoked for a given user intent is a discovery decision, and discovery has winner-take-most dynamics. Google didn’t control HTTP. Google controlled which websites you found. The same structure applies here.</p>

<p>This is also where the platform services stack gets built: identity (who is the user?), payments (how do transactions settle?), trust infrastructure (biometric confirmation, audit trails, dispute resolution), and legal liability (who’s responsible when the agent books the wrong flight?). None of these exist yet in a standardised form for agent interactions. Whoever bundles them into a coherent platform services layer creates something providers can’t easily replicate on their own — and something they’ll depend on.</p>

<p>The <a href="/supply-side-of-web4/">third post</a> explored the prisoner’s dilemma: providers join because the alternative — being invisible while competitors are visible — is worse. But that dilemma only kicks in when consumer attention consolidates on a small number of agent interfaces. Whether that happens, and how quickly, is the central uncertainty.</p>

<h2 id="multi-agent-architecture-and-platform-power">Multi-Agent Architecture and Platform Power</h2>

<p>There’s a competitive dynamics dimension I haven’t addressed yet, and I think it matters more than it’s getting credit for: multi-agent architecture.</p>

<p>The current framing — one general agent mediating between user and services via MCP — is a simplification. What’s already emerging is a layered model: a general orchestrator that routes to specialised sub-agents. These specialised agents might be smaller, domain-trained models that handle specific verticals — a medical agent trained on clinical literature, a legal agent on case law, a financial agent calibrated for regulatory compliance.</p>

<p>This is a different economic tier than MCP services. An MCP server exposes an API: “here are the tools I offer, here’s how to call them.” A specialised sub-agent is a model trained on proprietary data that handles a category of reasoning. The distinction matters for competitive dynamics.</p>

<p><strong>Who trains the specialised models?</strong> If the general platform trains them on licensed domain data, the platform captures another layer of value — it’s not just routing to services, it’s <em>becoming</em> the service for entire categories. If domain experts train and host their own specialised agents, the model looks more distributed. Both paths are plausible, and the outcome probably varies by domain. Healthcare data governance is different from travel inventory.</p>

<p><strong>Who controls the routing?</strong> The general orchestrator deciding which specialised agent handles a given request is another form of platform power — a layer above MCP discovery. If the orchestrator routes your medical question to one specialised agent over another, that’s a decision with economic consequences. The same discovery-layer dynamics apply, one level of abstraction up.</p>

<p><strong>Does this widen or narrow participation?</strong> On one hand, multi-agent architecture could broaden the ecosystem. A domain expert who can’t build an MCP server might fine-tune a specialised model on their proprietary data and make it available through the orchestration layer. On the other hand, training and hosting models requires more capital and technical sophistication than building an API endpoint. The barrier to the most valuable tier of participation might actually be <em>higher</em>.</p>

<p>My instinct is that multi-agent architecture reinforces platform dynamics rather than disrupting them. The orchestration layer — deciding which specialised agent to invoke, how to combine outputs from multiple sub-agents, when to escalate to more capable (and expensive) models — is a natural fit for the general platform to control. And the economics of “route to the cheapest model that can handle the task” create competitive pressure that favours whoever controls the routing.</p>

<p>The implication for the discovery question is direct. The platform doesn’t just need to discover MCP servers for a given intent. It needs to identify the right specialised agent, the right MCP services, and orchestrate them together. More complex routing means more platform leverage.</p>

<h2 id="technology-sovereignty">Technology Sovereignty</h2>

<p>The <a href="/supply-side-of-web4/">third post</a> noted that not everyone will hand their data to a US platform — the demand for delegation fragments by trust model. I said I’d come back to this. The geopolitical context has since shifted in ways that make sovereignty much more than an abstract policy concern.</p>

<h3 id="why-an-llm-platform-is-worse-than-google-dependency">Why an LLM platform is worse than Google dependency</h3>

<p>European policymakers have worried about dependence on US tech platforms for years. The current dependencies — Google for search, AWS for cloud infrastructure, Microsoft for productivity — each operate at a specific layer. Google sees your search queries. AWS hosts your compute. Each platform sees a slice of your digital life.</p>

<p>An LLM agent platform that mediates your transactions, research, and decision-making sees <em>all of it</em>. Your intent, your deliberations, your financial decisions, your health research, your professional communications — all flowing through a single entity, controlled by a single jurisdiction. This isn’t another layer of dependence. It’s the integration of all previous dependencies into one.</p>

<p>And the US has demonstrated, repeatedly, that it will weaponise technology dependencies when it serves its interests. SWIFT exclusions to punish geopolitical opponents. The Huawei ban to restrict a competitor. The CLOUD Act compelling access to data held by US companies anywhere in the world. The TikTok forced sale to control a foreign-owned platform operating on US soil. These aren’t hypotheticals. They’re precedent.</p>

<p>The most recent example is the most direct. In February 2026, the Pentagon <a href="https://www.axios.com/2026/02/24/anthropic-pentagon-claude-hegseth-dario">gave Anthropic an ultimatum</a>: grant the military full, unrestricted access to Claude — including for mass surveillance and autonomous weapons systems — or face the Defense Production Act and be declared a supply chain risk. The company that created MCP, the open protocol at the centre of this series, is being coerced by its own government into dropping its core safety commitments. Whether Anthropic holds the line or caves is an open question at the time of writing. But the signal to anyone outside the US is unambiguous: a US AI company’s terms of service, safety commitments, and acceptable use policies can be overridden by government directive. If you’re a European business or government building on Claude, that’s not an abstract sovereignty concern. It’s a live demonstration of what US jurisdictional control over your AI infrastructure actually looks like.</p>

<h3 id="the-shift-in-european-sentiment">The shift in European sentiment</h3>

<p>Until recently, technology sovereignty was an abstract policy discussion — something for Brussels regulators and think-tank papers. I think the Trump administration’s posture toward allies has changed that at a fundamental level.</p>

<p>When the US president threatens military action against Denmark — a NATO ally — over Greenland, something shifts in how Europeans think about dependence on American institutions. I notice it in conversations, in public discourse, in the tone of policy discussions that used to be technocratic and are now visceral. The sentiment I observe isn’t “we should consider diversifying our technology stack.” It’s “we <em>cannot</em> afford to depend on a country that threatens its allies.”</p>

<p>This isn’t about whether the Greenland threat was militarily serious. It’s about what it revealed about the reliability of the transatlantic relationship as a foundation for technology dependence. If the US is willing to threaten a NATO ally over territory, the assumption that US tech platforms will remain available, neutral, and governed by rule of law becomes harder to sustain. Add the broader pattern — tariff threats against EU members, transactional framing of security alliances, explicit scepticism toward multilateral institutions — and the picture is of a relationship where the terms can change unilaterally.</p>

<p>I want to be careful here. I’m not arguing that European tech sovereignty is easy or that the sentiment translates automatically into capability. I’m arguing that the <em>political will</em> has shifted in a way that makes sovereign alternatives more likely to be funded, politically supported, and adopted than they were even two years ago. Public opinion is a precondition for policy, and the opinion has moved decisively.</p>

<p>For engineers and engineering leaders in Europe, this isn’t abstract. It’s the question of whether the infrastructure you build on — the cloud, the models, the agent platforms — will still be available on the same terms in five years. And increasingly, the answer that comes back from the policy environment is: don’t bet on it.</p>

<p>The risk isn’t only political. The AI companies building these platforms are burning capital at historic rates, most are pre-profit, and the competitive landscape could shift abruptly through bankruptcy, distress acquisition, or strategic pivot. Building your stack on a platform whose provider might not exist in three years — or might exist under very different ownership and terms — compounds the jurisdictional risk. The sovereignty question and the capital fragility question aren’t separate concerns. They reinforce each other.</p>

<h3 id="trust-model-fragmentation">Trust-model fragmentation</h3>

<p>The fragmentation isn’t random. It follows institutional and geopolitical lines:</p>

<ul>
  <li><strong>US users and businesses</strong> are the natural early adopters of US-based agent platforms. The trust question exists, but convenience and ecosystem maturity overcome it for most.</li>
  <li><strong>European users and regulators</strong> will increasingly demand that agent interactions involving financial, health, and personal data either run on European infrastructure or comply with European governance frameworks. This isn’t hypothetical — it’s the trajectory that GDPR, the Digital Markets Act, eIDAS, and PSD2 have been on for years. The political climate accelerates it.</li>
  <li><strong>Privacy-sensitive users everywhere</strong> will prefer open-source or self-hosted agents — as I noted in the <a href="/web4-succeeds-where-web3-failed/">second post</a>. The demand for delegation doesn’t disappear. It routes through different infrastructure.</li>
  <li><strong>China builds its own parallel ecosystem.</strong> This is already happening with DeepSeek and others, and it doesn’t depend on the Web 4.0 thesis specifically.</li>
</ul>

<p>The realistic picture isn’t a sovereign platform race in every major bloc; the idea of India, Brazil, or the Gulf states building their own agent platforms is speculative. What I think is more likely: US platforms dominating by default, China’s parallel ecosystem serving its domestic market, and Europe as the critical swing — with regulatory infrastructure that could credibly underpin sovereign alternatives, but a track record of not building the consumer-facing platforms to use them.</p>

<h3 id="the-eus-position">The EU’s position</h3>

<p>Europe has strong regulatory infrastructure and weak platform-building capability. That’s been the pattern for two decades, and I’m not going to pretend I expect it to change overnight.</p>

<p>And the picture is more contradictory than the sovereignty rhetoric suggests. The EU restricts US tech companies with one hand — GDPR, DMA, antitrust actions — while individual member states deepen dependency with the other. Germany integrates Palantir, a US company with deep ties to American intelligence agencies, into state police infrastructure. The same political class that calls for technology sovereignty signs procurement contracts that entrench the opposite. This isn’t unique to Germany — it’s the pattern across the bloc. The regulatory posture and the procurement reality point in different directions, and the procurement reality is what actually builds dependency.</p>

<p>Mistral is the most credible European bet on the model layer — a French company with genuine technical capability and government backing. But a model alone isn’t a platform. The platform requires discovery, payments, trust infrastructure, developer ecosystem — all the layers we’ve been discussing.</p>

<p>What Europe <em>can</em> build is compliance infrastructure — the regulatory frameworks that make agent-mediated transactions legally sound in European contexts: biometric confirmation under eIDAS, payment processing under PSD2, data governance under GDPR, dispute resolution under consumer protection law. US platforms will need to integrate with this to serve European users. That’s a real business opportunity for European companies. But I wouldn’t overstate it. Compliance is a tax the platforms pay, not a lever that shifts control. US platforms could build the compliance layer themselves, acquire the providers, or partner with existing infrastructure. The leverage would only be strategic if it were integrated into a competing European platform services stack — and that’s the “build a European platform” problem that the continent has failed at for two decades.</p>

<p>The open protocol is the escape hatch. MCP being open means European alternatives don’t need to convince providers to adopt a different standard — they can consume the same MCP servers that US platforms consume. The protocol is shared; the platform services layer on top can be sovereign. That’s necessary but not sufficient. An open protocol without a discovery layer, without integrated payments, without trust infrastructure, is just a specification.</p>

<p>The question is whether European institutions fund and build those platform services before the US alternatives become the default — before the habit forms and switching costs accumulate. Given the shift in political will, I think there’s a window. Windows close.</p>

<h3 id="what-this-means-for-the-competitive-picture">What this means for the competitive picture</h3>

<p>Technology sovereignty adds a dimension that pure market analysis misses. The “winner” of Web 4.0 might not be one company. It might be a fragmented landscape: US platforms serving the American market and parts of the world that accept US jurisdiction, European alternatives serving regulated and privacy-conscious contexts, China’s ecosystem serving its domestic market, and open-source agents serving the long tail.</p>

<p>That fragmentation reduces the platform power concentration the thesis predicts — but doesn’t eliminate it within each zone. The US platform winner still dominates a massive market. And the dynamics within each zone are similar: discovery concentration, platform services capture, provider prisoner’s dilemma.</p>

<p>The sovereignty question isn’t a footnote on the platform thesis. It’s a structural force shaping who controls the platform and on what terms. And right now, the political wind in Europe is blowing in a direction that makes this question far more consequential than the model benchmark charts that dominate tech coverage.</p>

<h2 id="the-strongest-counterarguments">The Strongest Counterarguments</h2>

<p>I want to close the competitive analysis by engaging honestly with the case against platform consolidation. This section draws significantly on Evans’ analysis, which I think provides the most rigorous version of the sceptical case.</p>

<h3 id="the-engagement-problem">The engagement problem</h3>

<p>Evans’ <a href="https://www.ben-evans.com/benedictevans/2026/2/19/how-will-openai-compete-nkg2x">data on ChatGPT usage</a> is the hardest piece of evidence to dismiss: 800-900 million users, only 5% paying, 80% having sent fewer than 1,000 messages in 2025. US teens use chatbots “a few times a week or less,” not daily. If hundreds of millions of people have access to the tool and most don’t engage deeply, that’s empirical evidence against the delegation habit forming.</p>

<p>My response: current chatbot usage measures a conversational search tool, not a delegation agent. The habit we’re predicting — delegate a task, review curated options, confirm — doesn’t exist in product form yet. Nobody has shipped a consumer agent that books flights, compares insurance, and coordinates across providers with the flow the thesis describes. Judging the delegation habit by ChatGPT engagement data is like judging the mobile app economy by WAP browser usage in 2004.</p>

<p>But I want to be honest: this is a promissory argument. I’m saying “the product hasn’t been built yet, so the data doesn’t apply.” That’s convenient. Evans’ counter is straightforward — the tool exists, hundreds of millions have tried it, and deep engagement hasn’t formed for most. If the delegation product ships and people still don’t use it deeply, the thesis is wrong. That’s the test.</p>

<h3 id="the-widget-fallacy">The widget fallacy</h3>

<p>Evans argues that reducing complex products to simple API calls historically fails because providers resist ceding control of their user experience. Developers don’t want to become “dumb API calls.” This directly challenges the MCP-as-HTTP analogy.</p>

<p>Here I think the response is stronger. MCP Apps return rich UIs. Providers keep control of their presentation, branding, and user experience within the agent’s interface. The parallel isn’t “website reduced to API call” — it’s “website rendered inside a browser the provider doesn’t control.” That’s exactly what the web already is. Providers have lived with that bargain for three decades.</p>
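<p>To make the bargain concrete, here is a hypothetical sketch of a tool result that carries both machine-readable data and a provider-authored presentation. The field names (<code>data</code>, <code>presentation</code>, <code>mime_type</code>) are illustrative only, not the actual MCP Apps wire format:</p>

```python
# Hypothetical sketch: a provider's tool result that separates the data
# the agent reasons over from the UI the provider controls.
# Field names are illustrative, NOT the real MCP Apps wire format.
result = {
    # Structured payload: what the model compares, filters, summarises.
    "data": {"hotel": "Hotel Atlantic", "price_eur": 210, "refundable": True},
    # Provider-authored markup: rendered inside the agent's interface,
    # but written and branded by the provider.
    "presentation": {
        "mime_type": "text/html",
        "html": (
            "<div class='hotel-card'><h3>Hotel Atlantic</h3>"
            "<p>210 EUR, free cancellation</p></div>"
        ),
    },
}

agent_view = result["data"]            # the agent consumes this
provider_ui = result["presentation"]   # the user sees this, provider-styled
```

<p>The governance question from the paragraph above lives in the second half of that structure: as long as the platform renders <code>presentation</code> untouched, the provider keeps its brand; the moment the platform rewrites it for “consistency,” the widget fallacy returns.</p>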

<p>But the tension is real and deserves naming. Providers <em>will</em> resist. The degree to which MCP Apps actually preserve provider agency — versus becoming thin templates the platform constrains — determines adoption speed. If the platform starts overriding provider UIs for “consistency” or inserting its own elements, the widget fallacy comes true. The current architecture avoids that failure mode. Whether it stays that way as the platform matures is a governance question, not a technical one.</p>

<h3 id="what-if-the-platform-doesnt-consolidate">What if the platform doesn’t consolidate?</h3>

<p>The most fundamental counterargument: LLMs become a distributed feature layer — every app gets an AI assistant, no cross-cutting platform emerges. Google’s “AI in everything” strategy wins. The web stays the web, with better autocomplete.</p>

<p>I think this is genuinely possible. The platform thesis requires consumer attention to consolidate on a small number of agent interfaces. If it doesn’t — if people prefer AI embedded in the products they already use — the platform economics I’ve described don’t materialise.</p>

<p>What makes me think consolidation is more likely: cross-provider coordination. The killer advantage of a general agent isn’t that it’s better at any single task than a specialised in-app AI. It’s that it handles the <em>coordination</em> across providers that no single product can. Trip planning across flights, hotels, and restaurants. Insurance comparison across carriers. Moving to a new city and handling address registration, utilities, internet, and GP registration in one flow. These coordination tasks are inherently cross-cutting. A distributed feature layer can’t do them.</p>

<p>But “more likely” isn’t “certain,” and the timing gap between “clearly better in theory” and “actually used by consumers” has killed more theses than I can count.</p>

<h2 id="what-i-think-is-actually-at-stake">What I Think Is Actually at Stake</h2>

<p>The race to own Web 4.0 isn’t a race to build the best model. Models are converging — half a dozen companies produce roughly comparable ones, and the gap keeps closing. The race is to build the best platform services layer: discovery, payments, trust infrastructure, identity, dispute resolution, legal liability. The things that make delegated transactions work and that make providers willing to participate.</p>

<p>Google is playing defence against a threat that might not materialise. Apple has the strongest structural hand but may not play it in time. OpenAI is assembling the most complete stack but lacks the engagement depth and structural moats that made prior platforms durable. Anthropic is making the most technically elegant bet — that the protocol is the platform — against a historical pattern where protocol designers rarely capture the economics. And the sovereignty question means the answer might not be one company or even one geography.</p>

<p>I don’t know who wins. I don’t think anyone does yet. What I think is: whoever builds the discovery layer and bundles trust infrastructure around it captures the economics of the next era. The protocol is open. The platform built on top won’t be — and the contest for that layer is barely underway.</p>

<p><em><a href="/openclaw-end-of-saas-era-web4/">Previously</a>: the architecture. <a href="/web4-succeeds-where-web3-failed/">And</a>: the consumer demand story. <a href="/supply-side-of-web4/">And</a>: what it means for builders. Next: what Web 4.0 does to us.</em></p>]]></content><author><name>Björn Rochel</name><email>bjoern@bjro.de</email></author><summary type="html"><![CDATA[Three companies, three bets on where platform control lives. And a geopolitical shift that might matter more than any of them.]]></summary></entry><entry><title type="html">The Supply Side of Web 4.0</title><link href="https://bjro.de/supply-side-of-web4/" rel="alternate" type="text/html" title="The Supply Side of Web 4.0" /><published>2026-02-26T00:00:00+00:00</published><updated>2026-02-26T00:00:00+00:00</updated><id>https://bjro.de/supply-side-of-web4</id><content type="html" xml:base="https://bjro.de/supply-side-of-web4/"><![CDATA[<p><strong>What this post covers:</strong></p>
<ul>
  <li>Why providers would join a platform that commoditises them</li>
  <li>Three forces killing the SaaS playbook — independent of the platform thesis</li>
  <li>Which products die, get commoditised, or survive</li>
  <li>What to build instead — where durable value lives</li>
  <li>The solopreneur wave — two overlapping dynamics</li>
  <li>The developer transition — boom, plateau, contraction</li>
  <li>Where VC capital flows — and how the funding model reshapes itself</li>
  <li>Platform economics and discovery</li>
</ul>

<hr />

<p>In the <a href="/openclaw-end-of-saas-era-web4/">first post</a> of this series, I laid out the architecture: an open protocol, an agent layer, a marketplace, multi-client adoption. In the <a href="/web4-succeeds-where-web3-failed/">second</a>, I made the consumer case — why delegation beats self-service for friction-heavy tasks, and why this time the experience threshold is crossed.</p>

<p>This post flips to the builder’s side. If the demand story holds, what does it mean for your company, your product, your career?</p>

<p>I want to be upfront: everything that follows is speculative. The first two posts laid out an architecture and a demand case. This one reasons through the builder implications <em>if</em> those arguments hold. They might not. The shift might play out very differently from what I’m imagining, or it might stall entirely. I’d rather think through the possibilities honestly than pretend to certainty I don’t have.</p>

<p>The key question from the builder’s chair: how much of self-service is <em>tolerated</em> rather than <em>valued</em>?</p>

<p>I covered the <a href="/openclaw-end-of-saas-era-web4/">full circle</a> — human intermediaries to self-service to AI intermediaries — in the first post. Self-service originally won not just on cost but because it was less conflicted: no commissions, you saw the raw options. But in the pursuit of growth, self-service has reintroduced many of the same problems through different mechanisms — promoted placements on comparison sites, dark patterns in checkout flows, hidden referral fees shaping which options surface, algorithmic curation that isn’t neutral. And the economic model has degraded too: subscriptions get more expensive year over year, products add features nobody asked for to justify the increases, and services actively downgrade their existing tiers to create new premium ones — Amazon Prime retrofitting ads into a paid service and charging extra to remove them is the pattern in its purest form. The “less conflicted” advantage that made self-service win has substantially eroded, while the labour stayed with you. AI intermediaries bring the convenience back, and the trust problem with it — but the honest comparison isn’t against the pristine self-service of 2005. It’s against the degraded self-service of today, where you do all the work <em>and</em> get manipulated. That lowers the bar considerably.</p>

<p>The circle only closes, though, for tasks where self-service is endured, not enjoyed.</p>

<p>Think about what you actually do when you interact with most software. You compare insurance quotes — not because comparing is fun, but because you need coverage. You file expenses — not because the interface is rewarding, but because you want to get reimbursed. You research flights across six tabs, navigate government paperwork, set up utilities in a new city, dispute a charge with your bank. The <a href="/web4-succeeds-where-web3-failed/">second post</a> described this from the consumer’s side. From the builder’s side, the uncomfortable question is: how many of your users are there because they want to be, and how many are there because they have to be?</p>

<p>I think the tolerated category is very big. Insurance, travel, expense reporting, government services, price comparison, scheduling, routine financial tasks, HR workflows, procurement — the list of software people <em>endure</em> to reach an outcome dwarfs the list of software people <em>choose</em> to use. And the delegation model is a direct substitute for endurance.</p>

<p>Some of these sectors are heavily regulated, and agent-mediated transactions in insurance, finance, or healthcare will require more controlled integrations than booking a restaurant — full forensics on why a service was selected, audit trails, compliance infrastructure. That shapes the timeline, not the direction. And the addressable market fragments by trust model too — not everyone will hand their data to a US platform. I’ll develop both of these threads in a later post.</p>

<h2 id="why-would-providers-join">Why Would Providers Join?</h2>

<p>The consumer demand story is one half of the equation. A platform with no services is a search engine with no websites. The thesis depends on providers participating — many of whom get commoditised in the process. So why would they?</p>

<p>I think the answer is a prisoner’s dilemma, not a willing embrace — though this depends on the consumer habit actually forming. As the <a href="/web4-succeeds-where-web3-failed/">second post</a> argued, the demand is latent, not proven. Providers wouldn’t need to join a platform nobody uses. But they don’t need the habit to have fully formed — just the <em>risk</em> that it might.</p>

<p>Hotels didn’t choose Booking.com. They joined for incremental bookings — a new channel, low risk. Then consumer behaviour shifted. Travellers started searching on Booking.com first, comparing across properties, expecting the same cancellation policies and review transparency everywhere. By the time hotels realised their direct bookings were declining, opting out meant invisibility. The platform had become the default discovery layer.</p>

<p>The pattern repeats. Businesses didn’t choose to depend on Google — they had to be where consumers looked. Developers didn’t love the App Store’s 30% cut, but that’s where the users were. First movers get distribution advantage; once critical mass forms, holdouts are punished.</p>

<p>But the pattern doesn’t hold everywhere, and the base rate for platforms achieving this kind of coercive supply-side dynamic is low. Most platforms never get there. Enterprise software was never fully intermediated by a discovery platform. Professional services resisted aggregation for decades. The prisoner’s dilemma requires a discovery choke point — consumer attention consolidating on a small number of interfaces.</p>

<p>And here’s the tension: MCP is an open protocol. There’s no proprietary gatekeeper baked into the architecture. As I argued in the <a href="/openclaw-end-of-saas-era-web4/">first post</a>, though, an open protocol doesn’t prevent concentration at the discovery layer. HTTP is open; Google still controls most of search. If consumer attention migrates to a small number of agent interfaces — Claude, ChatGPT, perhaps a handful of others — those interfaces become the choke point, and the prisoner’s dilemma kicks in. If that attention migration doesn’t happen, neither does the supply-side pressure. The whole argument is conditional on the consumer habit forming, which is far from guaranteed — and historically, most bets on this kind of consolidation have been wrong.</p>

<p>But there’s a positive case too, and I think it’s underplayed. MCP Apps mean providers aren’t fully commoditised — they control their presentation, their brand, their UX within the agent’s interface. A provider on an agent platform is a website rendered in a browser they don’t control. That’s exactly what the web already is, and providers have lived with that for decades.</p>

<p>The analogy goes further than presentation. Just as some users type a URL directly and others click the first Google result, some users will say “book me a flight” and let the agent choose, while others will say “book me a flight on Lufthansa” or configure their agent to always prefer certain providers. The discovery layer matters for users who don’t specify — but it’s not the only path to the provider. That’s closer to the open web than to a walled garden.</p>

<p>The agent also sends qualified, high-intent traffic — no idle browsing, no bouncing. And providers shed the cost of building and maintaining a full consumer-facing application. Their “interface” becomes a display template. For many companies, that’s a genuine saving.</p>

<p>The honest breakdown: companies with real capability behind the UI benefit — they get distribution at lower cost. Companies whose only value <em>was</em> the UI get destroyed. Companies in the middle join reluctantly because the alternative is worse. That’s the pattern I see from Booking.com, Google, and the App Store — though it played out differently in enterprise software and professional services, where discovery never consolidated the same way.</p>

<h2 id="three-forces-killing-the-saas-playbook">Three Forces Killing the SaaS Playbook</h2>

<p>I touched on this in the first post, but it’s worth a closer look. These pressures apply regardless of whether Web 4.0 materialises.</p>

<ul>
  <li><strong>Post-ZIRP economics</strong> → The funding environment that enabled “grow at all costs” SaaS is gone. Higher interest rates compress multiples, raise the profitability bar, and make the traditional playbook — raise money, build a UI, sell subscriptions, grow — harder to execute. The era of funding SaaS companies on the promise of eventually finding margins is over.</li>
  <li><strong>LLM commoditisation</strong> → General-purpose models now do much of what SaaS products charged subscriptions for: summarisation, analysis, comparison, natural language queries on structured data. The value of wrapping these capabilities in a dedicated product with a monthly price tag erodes when the user’s general-purpose agent does the same thing with broader context.</li>
  <li><strong>Agentic coding</strong> → This is the one builders feel most viscerally. Solopreneurs using AI coding tools build in a weekend what previously required a funded team. Competition explodes. Moats that relied on “we built it first and it’s polished” weaken when anyone can build a polished version in days. Product shelf life shortens.</li>
</ul>

<p>These three forces are independent. They don’t need the platform thesis to be right. They’re already reshaping what gets built, what gets funded, and how long it lasts.</p>

<h2 id="the-disruption-framework-in-depth">The Disruption Framework in Depth</h2>

<p>In the first post, I sketched three categories: dies, commoditised, untouched. Here’s the richer picture.</p>

<ul>
  <li>
    <p><strong>Dies</strong> → Products that are essentially a presentation layer over data or services that exist elsewhere, with minimal proprietary logic behind the UI. The value is making something accessible or comparable — and an agent does that natively.</p>

    <p>Price comparison aggregators that pull public pricing into a grid. Simple booking intermediaries that wrap someone else’s inventory in a nicer interface. Form-filling tools that walk you through a public government form. Dashboard-only analytics that visualise data you already own. Thin job board aggregators that scrape listings from elsewhere.</p>

    <p>These products exist because the alternative — doing it manually — is worse. An agent <em>is</em> a better alternative.</p>
  </li>
  <li>
    <p><strong>Gets commoditised</strong> → Products with real underlying capability — proprietary data, workflow logic, network effects — but whose interface premium erodes. The product persists as infrastructure the agent calls into, but the UI becomes less relevant and revenue may compress.</p>

    <p>Some examples: CRM systems like Salesforce, where the data, workflows, and permissions persist but users increasingly interact through the agent layer. Job boards with proprietary inventory and employer networks — the matching UI erodes but the network and employer-side tools remain. Expense management with approval chains and accounting integrations. Project management tools where the workflow engine persists as infrastructure.</p>

    <p>Commodity creative production falls here too — template-based marketing assets, stock photo replacement, social media content at scale. The output is functional rather than craft.</p>

    <p>The <em>administrative</em> uses of productivity tools also land here. Formatting a report in Google Docs, generating a summary spreadsheet from raw data, setting up a Miro board from a template. You’re using the tool as a means to an end, and the agent handles the mechanical parts.</p>
  </li>
  <li>
    <p><strong>Untouched</strong> → Tasks where the interaction fidelity exceeds what text and voice can express, or where the quality and control threshold makes delegation unacceptable. This isn’t about being “creative” — it’s about the gap between what you can describe in conversation and what you need to do with your hands and eyes.</p>

    <p>Professional design tools like Figma and Photoshop, where spatial and visual micro-decisions can’t be art-directed through a chat interface at the fidelity a brand requires. Music production in Ableton or Logic, where the authoring surface <em>is</em> the creative process. Productivity tools <em>as authoring environments</em> — writing in Google Docs, thinking visually on a Miro board, building a model in Excel where the cells are the thinking. Portfolio management, where control over financial decisions is the point.</p>
  </li>
</ul>

<p>Some products straddle categories depending on how they’re used — productivity tools are the clearest case. The same app is untouched when you’re thinking on the canvas and commoditised when you’re doing mechanical work to get a task done. The categories describe <em>uses</em>, not products. These lists aren’t complete — they’re meant to illustrate the principle, not define the boundary.</p>

<p>This framework maps onto the <a href="/openclaw-end-of-saas-era-web4/">internet bifurcation</a> from the first post. Products in the transactional kill zone — booking, comparing, scheduling, filing — are where “dies” and “commoditised” concentrate. Content and control-as-value products sit in the safe zone. If you’re building, the first question is which territory your product lives in.</p>

<p><strong>There’s also an adaptation path worth naming.</strong> SaaS incumbents aren’t standing still. Salesforce is building Agentforce. Microsoft has Copilot. These embedded agents have a genuine advantage: native access to the product’s internal data, custom fields, workflows, and context that a general agent accessing the same product via MCP would only partially see.</p>

<p>I think the most interesting trajectory here is from “commoditised” toward something closer to “untouched” — incumbents that evolve into specialised agents, plugged into the general platform via MCP or something similar. A general-purpose model can’t and won’t handle everything. Complex CRM workflows, deep financial modelling, sophisticated supply chain logic — these may be better served by specialised agents that the general agent orchestrates rather than replaces.</p>

<p>Whether that’s what actually happens, or whether the general agent gets good enough to handle most of it directly, I genuinely don’t know. But it’s a credible path, and it means “commoditised” isn’t necessarily a permanent destination.</p>

<p><strong>The shallow AI trap is a different story.</strong> I see three layers of vulnerability in the current wave of “AI-powered” SaaS:</p>

<ul>
  <li><strong>Most exposed</strong> → Features that wrap general LLM capabilities — “ask your data in natural language,” “AI-powered summaries.” If the general agent gets broad data access, it does this natively with broader context, and an agent that queries <em>across</em> your tools outperforms a chatbot trapped inside one product. But this depends on users granting that access, which is far from automatic.</li>
  <li><strong>Moderately exposed</strong> → AI features that use proprietary data for interface-level tasks — smart filtering, auto-categorisation, predictive UI. If agents bypass the interface entirely, the investment in a “smart interface” is wasted. Though there’s a counter-case here: a product’s embedded AI has full access to its internal data, and for complex single-domain tasks, it may remain superior to a general agent that only sees what the MCP server exposes. The question is whether users care enough about that edge to stay inside the product’s interface.</li>
  <li><strong>Least exposed</strong> → AI features that create genuinely new capabilities — medical imaging analysis, fraud detection on proprietary transaction graphs, structural engineering simulations. Here the AI <em>is</em> the product. These become infrastructure the agent <em>invokes</em> rather than replaces. Products with deep domain AI and proprietary data are in a fundamentally different category.</li>
</ul>

<p>There’s an irony I keep coming back to. As I noted in the <a href="/openclaw-end-of-saas-era-web4/">first post</a>, companies adding shallow LLM features may be training their users to prefer delegation — to type natural language instead of clicking buttons, to expect conversational interaction instead of navigating menus. If that habit transfers to a general agent, it makes those same in-product features unnecessary.</p>

<p>That’s a big “if,” though. Users might just as easily develop a preference for <em>in-context</em> AI — the assistant that already has their data — rather than a general agent that needs to be granted access. I think the general agent wins for cross-tool tasks, but the outcome for single-product workflows is genuinely open.</p>

<h2 id="what-to-build-instead">What to Build Instead</h2>

<p>If the interface premium erodes, where does durable value live?</p>

<ul>
  <li>
    <p><strong>Proprietary raw data</strong> → Unique datasets that literally don’t exist elsewhere — sensor networks, transaction records, genomic databases, satellite imagery. The agent needs to access these, and the provider charges for it.</p>

    <p>But there’s a squeeze even here. A firm like Gartner charges tens of thousands a year for market research, and their value is partly raw data (survey results, vendor assessments) and partly synthesised analysis (trend reports, analyst interpretation). An LLM can approximate the synthesis from public sources. The safe zone is truly unique <em>raw</em> data that can’t be approximated — not the interpretation layer built on top of it.</p>
  </li>
  <li>
    <p><strong>Real-world execution</strong> → Booking a flight, processing a payment, shipping a package, dispatching a technician. These require infrastructure the agent doesn’t own. The physical world is a moat.</p>
  </li>
  <li>
    <p><strong>Domain logic</strong> → Complex calculations, compliance rules, risk models. The agent can invoke these but not replicate them — yet. This boundary shifts as models improve, which means it’s safe for now, not safe forever. The honest framing: domain logic is a time-bounded advantage whose clock is ticking.</p>
  </li>
  <li>
    <p><strong>Certified trust</strong> → Medical diagnosis tools, legal compliance checkers, financial audit systems. Users need <em>certified</em> outputs, not LLM approximations. This is the most durable safe zone because it’s regulatory, not technical. Regulations don’t move at the speed of model improvement.</p>
  </li>
</ul>

<p>One category I think is underappreciated: capabilities that expand who gets to use a service. As I argued in the <a href="/web4-succeeds-where-web3-failed/">second post</a>, the delegation model doesn’t just help existing users — it makes services accessible to people currently excluded by complex interfaces. Building for that accessibility layer through the agent is a durable value source, not a feature.</p>

<p>The mental model shift: <strong>don’t build interfaces, build capabilities.</strong> The MCP app model is essentially this — you’re not building a full application, you’re building a service an agent invokes. Smaller surface area, lower development cost, but accessed at the exact moment of commercial intent.</p>
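<p>A minimal sketch of what “build capabilities, not interfaces” looks like in practice. Everything here is hypothetical — the function name, the schema shape, the canned data — and it deliberately uses no real MCP SDK; it just shows the contract a provider exposes instead of a UI:</p>

```python
# Hypothetical sketch of a capability exposed to an agent instead of a UI.
# Names (get_flight_quotes, TOOL_SCHEMA) and data are illustrative only;
# no real MCP SDK is used here.

# The contract an agent discovers: name, description, typed inputs.
TOOL_SCHEMA = {
    "name": "get_flight_quotes",
    "description": "Return flight quotes for a route on a given date.",
    "input_schema": {
        "type": "object",
        "properties": {
            "origin": {"type": "string"},
            "destination": {"type": "string"},
            "date": {"type": "string", "format": "date"},
        },
        "required": ["origin", "destination", "date"],
    },
}

def get_flight_quotes(origin: str, destination: str, date: str) -> list[dict]:
    """The capability itself: structured data, no presentation layer.
    A real provider would query live inventory; this returns canned quotes."""
    return [
        {"carrier": "LH", "price_eur": 129, "origin": origin,
         "destination": destination, "date": date},
        {"carrier": "AF", "price_eur": 142, "origin": origin,
         "destination": destination, "date": date},
    ]

# What an invocation looks like from the provider's side: the agent sends
# arguments matching the schema at the moment of commercial intent.
call_args = {"origin": "HAM", "destination": "LIS", "date": "2026-04-02"}
quotes = get_flight_quotes(**call_args)
```

<p>Note what’s missing: routing, session handling, rendering, onboarding. The surface area is one function and one schema — which is the point. The agent supplies the interface; the provider supplies the capability.</p>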

<h2 id="the-solopreneur-wave">The Solopreneur Wave</h2>

<p>AI coding tools are enabling a wave of solopreneurs building products that previously required funded teams. This looks like a SaaS renaissance. I think it’s actually two overlapping waves that will diverge.</p>

<ul>
  <li><strong>The last SaaS hurrah</strong> → Solopreneurs building traditional SaaS products — comparison tools, dashboards, aggregators — faster and cheaper than ever. But if <em>everyone</em> can build a SaaS product in a weekend, competition explodes, niches fill instantly, and the moat — UI polish — is exactly the thing agents commoditise.</li>
  <li><strong>The first Web 4.0 ecosystem</strong> → Solopreneurs building MCP skills, niche domain tools, agent infrastructure — things that <em>serve</em> the new platform rather than compete with it. Building an MCP skill is closer to building an API than building an app. The economics favour small teams.</li>
</ul>

<p>The challenge is that most solopreneurs can’t easily tell which wave they’re riding. But there are litmus tests that I think are useful, even if they’re not binary:</p>

<ul>
  <li><strong>Does your product’s value survive if the user never sees your UI?</strong> If yes, you’re building capability. If no, you’re building something the agent eventually bypasses.</li>
  <li><strong>Would an agent invoke your product, or replace it?</strong> An MCP skill that provides flight pricing data serves the agent. A comparison website that displays flight pricing competes with it.</li>
  <li><strong>Does your moat strengthen or weaken as agents improve?</strong> Proprietary data and physical-world infrastructure get more valuable as more agents need access. UI polish and curation get less valuable as agents bypass them.</li>
</ul>

<p>Even “last hurrah” products may generate real revenue for years — the transition won’t be instant. At solopreneur cost structures, a three-to-five-year revenue window is a perfectly good business. It’s just not a venture-scale one.</p>

<h2 id="the-developer-transition">The Developer Transition</h2>

<p>Here’s where I think the picture gets uncomfortable, especially for those of us who build software for a living.</p>

<p>The transition itself is a massive employment programme for developers. Building MCP servers, designing tool schemas, constructing trust infrastructure, migrating services from UI-first to API-first — agents aren’t close to doing this autonomously. The scope of work is very large. In the near term, developer demand may actually <em>increase</em>.</p>

<p>But there’s something different about this transition: the productivity multiplier arrives simultaneously. When the web arrived, humans had to build all the web software. When mobile arrived, humans had to build all the mobile apps. This time, agentic coding tools dramatically amplify what each developer can produce. The new platform gets built by fewer people than any prior transition required.</p>

<p>I think of it as three reinforcing dynamics:</p>

<ol>
  <li><strong>The barrier to building drops.</strong> Solopreneurs build what previously required funded teams. More software gets built, but by fewer professional developers.</li>
  <li><strong>The longevity of what gets built drops.</strong> If anyone can ship a product in a weekend, competition explodes, niches fill instantly, and most products have short shelf lives.</li>
  <li><strong>Infrastructure gets built by fewer, more productive engineers.</strong> The Web 4.0 buildout doesn’t require the same headcount as prior platform buildouts.</li>
</ol>

<p>This suggests a three-phase timeline. A <strong>transition boom</strong> — where we are now, possibly more demand, not less. A <strong>plateau</strong> — infrastructure matures, patterns stabilise, new SaaS products stop being built at the old rate. And a possible <strong>contraction</strong> — as much software to build, or more, but dramatically fewer people needed to build it, and shorter commercial lifespans for what they ship.</p>

<p>Now, I should be honest about what I’m not sure of here. Historically, every productivity improvement in software development — higher-level languages, frameworks, cloud infrastructure, open source — has been accompanied by a massive expansion in what gets built. The productivity gains got absorbed by growing demand. This is the <a href="https://en.wikipedia.org/wiki/Jevons_paradox">Jevons paradox</a> applied to software: make it cheaper to build, and people build more of it, and the net effect on employment is neutral or positive. That pattern has held for decades.</p>

<p>I think this time <em>might</em> be different, for two reasons. First, the productivity leap appears to be qualitatively larger than prior transitions — though the magnitude is genuinely uncertain and varies enormously by category of work. Second, the demand expansion depends on new markets opening up, and if the agent layer handles much of what those new markets would need, the demand for <em>custom software</em> may not expand the way it did before.</p>

<p>But I want to be clear: I’m arguing against a pattern that has held through every prior transition. I might be wrong. The contraction phase might never come. New categories of software we can’t yet imagine might absorb the productivity gains entirely, the way mobile created an entire economy that didn’t exist before.</p>

<p>What I’m more confident about: even in the optimistic scenario, the <em>type</em> of developer work shifts substantially. The reframing matters — it’s not necessarily “fewer developers” but “different work, different skills, different relationship to the tools.”</p>

<p>I think developers are safe <em>because of</em> the transition, not <em>despite</em> it — and that safety is time-bounded by the transition’s duration. The same pattern as travel agents training customers to use Expedia. But the duration is genuinely uncertain, and the historical pattern of induced demand is a real counterargument I can’t dismiss.</p>

<h2 id="the-capital-picture">The Capital Picture</h2>

<p>VC capital will flow into Web 4.0 infrastructure because capital needs deployment. That’s a mechanism, not validation. When capital searches for a narrative, it tends to create the conditions for that narrative to become real — the same thing happened with mobile, and also with Web 3. Capital flowing doesn’t prove the thesis is correct.</p>

<p>What’s investable: vertical agents (the “Uber for LLM-mediated travel booking”), infrastructure plays (payments, trust, identity for the agent ecosystem), picks-and-shovels (developer tools for building MCP apps), and content aggregators (bundled access to data sources for LLM consumption).</p>

<p>But there’s a deeper tension: the transition that VC money accelerates may ultimately reshape the VC model itself. The traditional software VC playbook — fund a team to build a SaaS product, scale on recurring revenue, exit at a revenue multiple — depends on software being expensive to build, defensible through UX, and monetised through subscriptions. If all three legs weaken simultaneously, the model’s foundations erode.</p>

<p>I think the industry bifurcates. <strong>Infrastructure-scale capital still works</strong> — foundation models, compute, trust and payments systems require billions and offer venture-scale returns. <strong>Small and niche thrives</strong> — micro-funds backing solopreneurs building MCP skills and vertical tools, lower check sizes, faster cycles. <strong>The middle hollows out</strong> — the classic Series A-B SaaS fund is most at risk, because its deal flow is the category being disrupted.</p>

<p>What partially fills the gap: incumbent defensive capital. Google, Salesforce, enterprise SaaS companies — the ones fighting hardest against irrelevance fund ecosystem development to control the transition or stay relevant through it. Different incentives than growth-seeking VC, but it keeps the ecosystem capitalised.</p>

<h2 id="platform-economics-and-provider-incentives">Platform Economics and Provider Incentives</h2>

<p>The platform’s leverage is discovery and attention, not protocol lock-in. That distinction is less comforting than it sounds.</p>

<p>MCP is open. Providers can technically be accessed directly. There’s no distribution lock-in to justify App Store-level commissions. But as I noted in the <a href="/openclaw-end-of-saas-era-web4/">first post</a>, MCP today is like HTTP without DNS and without search engines — the protocol works, but the discovery infrastructure doesn’t exist yet.</p>

<p>Whoever builds that discovery layer — the thing that maps user intent to relevant services — captures the economics. Discovery has winner-take-most dynamics. HTTP being open didn’t prevent Google from monopolising search. MCP being open doesn’t prevent a discovery monopoly. It just moves the bottleneck up the stack.</p>

<p>The value proposition to providers is bundled services (identity, payments, trust infrastructure), the threat of invisibility as consumer attention migrates to chat-based interfaces, and legal liability infrastructure (biometric confirmation, audit trails, dispute resolution) that makes delegated transactions legally sound. The <em>protocol</em> is open; the <em>platform services</em> built on top of it probably won’t be.</p>

<p>Monetisation sits on a spectrum of plausibility. <strong>Discovery and promoted placement</strong> is most likely — this is Google’s proven model, and when the agent selects between competing services for a given intent, whoever controls that selection controls the business model. <strong>Subscription bundling</strong> is plausible — “Claude Pro: $50/month, includes access to 200+ premium data sources” is the Spotify model applied to services. <strong>Transaction commissions</strong> are more speculative — the open protocol limits platform leverage compared to the App Store’s walled garden.</p>

<p><strong>From the provider’s chair, the question is: what are the levers I can pull to be discovered?</strong> The answer looks more familiar than you might expect. Quality-of-service rankings (TrustPilot, G2, or platform-native ratings), paid placement that’s clearly labelled as such, direct user preference (“always use Lufthansa”), and brand strength that makes users name you explicitly. These are the same dynamics as the web today.</p>

<p>Sponsored results would need to be transparent — and there’s a plausible freemium model where lower-tier users see promoted results while paid users don’t. None of this is novel.</p>

<p>The <a href="/openclaw-end-of-saas-era-web4/">convenience/conflict trade-off</a> I introduced in the first post applies here too, but from the builder’s side: the platform’s incentive to monetise discovery is real, and smaller providers who can’t pay for placement may struggle for visibility. How that tension resolves — through regulation, competition between agent platforms, or user demand for unbiased results — is an open question I’ll return to in a later post.</p>

<p><strong>Inference costs</strong> are the near-term constraint worth naming. Agent interactions are currently more expensive per transaction than serving web pages — multiple tool calls, reasoning, context management. But inference costs are falling rapidly, driven by model efficiency improvements, open-source commoditisation pressure (DeepSeek and others), and the shift from capability arms races to efficiency optimisation.</p>

<p>The trend lines are steep — costs per token have dropped by orders of magnitude in two years and show no sign of levelling off. I think this is a near-term constraint, not a structural barrier, but I’m making a bet on continued cost reduction that could prove wrong.</p>
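<p>A toy calculation makes the “more expensive per transaction” point tangible. Every number below is a placeholder I have invented for illustration — not any provider’s actual pricing, which varies by model and keeps falling.</p>

```python
# Back-of-the-envelope cost of one multi-step agent task, with
# loudly hypothetical numbers. Compare against serving a web page,
# which costs a small fraction of a cent.
PRICE_PER_1M_INPUT_TOKENS_USD = 3.00    # hypothetical model pricing
PRICE_PER_1M_OUTPUT_TOKENS_USD = 15.00  # hypothetical model pricing

def agent_task_cost(tool_calls,
                    input_tokens_per_call=4_000,
                    output_tokens_per_call=500):
    """USD cost of a task that makes several tool calls, each re-sending context."""
    input_cost = tool_calls * input_tokens_per_call * PRICE_PER_1M_INPUT_TOKENS_USD / 1e6
    output_cost = tool_calls * output_tokens_per_call * PRICE_PER_1M_OUTPUT_TOKENS_USD / 1e6
    return input_cost + output_cost

# A five-step booking task under these assumptions costs a few cents.
print(f"~${agent_task_cost(tool_calls=5):.4f} per agent task")
```

<p>The structure of the formula matters more than the placeholders: cost scales with tool calls times context size, which is why multi-step delegation is pricier than a page view today — and why falling token prices attack the constraint directly.</p>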

<h2 id="what-comes-next">What Comes Next</h2>

<p>If any of this is directionally right, the builders who thrive are the ones who provide things the agent can’t do on its own. Proprietary data. Physical-world execution. Certified trust. Specialised agents with deep domain logic. Capabilities, not chrome.</p>

<p>The SaaS era was the golden age of building interfaces. Founders built companies around the insight that a better UI could make people pay a subscription. If the pressures I’ve described in this post play out, that model gets squeezed from multiple directions simultaneously. What comes next rewards capability over interface.</p>

<p>But I want to close with honest uncertainty. I’ve spent this post reasoning through a scenario, not predicting a future. The consumer delegation habit might not form. The general agent might stay a power-user tool, not a mass-market shift. Incumbents might successfully embed AI in ways that strengthen rather than erode their positions. The whole thing might be a pattern I’m seeing because I’m looking for one.</p>

<p>I’ve tried to flag the conditional assumptions throughout — the “ifs” matter as much as the conclusions. The cost of thinking through this and being wrong is low. The cost of not thinking through it and being right is higher. That asymmetry is what motivates the series, not confidence in the outcome.</p>

<p><em><a href="/openclaw-end-of-saas-era-web4/">Previously</a>: the architecture and the platform thesis. <a href="/web4-succeeds-where-web3-failed/">And</a>: the consumer demand story. Next: who controls Web 4.0 — and does it matter?</em></p>]]></content><author><name>Björn Rochel</name><email>bjoern@bjro.de</email></author><summary type="html"><![CDATA[The SaaS era was the golden age of building interfaces. What comes next rewards capability, not chrome.]]></summary></entry><entry><title type="html">Why Web 4.0 Might Succeed Where Web 3 Failed</title><link href="https://bjro.de/web4-succeeds-where-web3-failed/" rel="alternate" type="text/html" title="Why Web 4.0 Might Succeed Where Web 3 Failed" /><published>2026-02-25T00:00:00+00:00</published><updated>2026-02-25T00:00:00+00:00</updated><id>https://bjro.de/web4-succeeds-where-web3-failed</id><content type="html" xml:base="https://bjro.de/web4-succeeds-where-web3-failed/"><![CDATA[<p>In my <a href="/openclaw-end-of-saas-era-web4/">previous post</a>, I argued that the architecture of the next platform shift is being assembled. Protocol, agent layer, marketplace, multi-client adoption — the pieces are falling into place.</p>

<p>But architecture doesn’t guarantee adoption. Web 3 had architecture too.</p>

<p>The question for Web 4.0 isn’t whether the technology works. It’s whether anyone actually wants what it offers. I think the answer is yes — but not for the reasons most AI coverage suggests.</p>

<h2 id="what-web-3-got-wrong">What Web 3 Got Wrong</h2>

<p>I think Web 3 failed because it solved a problem consumers didn’t have.</p>

<p>The pitch was compelling if you were a technologist: decentralised ownership, self-sovereign identity, trustless transactions, no intermediaries. You’d own your data, your digital assets, your identity — free from platform control. I get the appeal. I really do.</p>

<p>But consumers heard something different: manage a wallet, pay gas fees, understand blockchain mechanics, lose everything if you misplace a seed phrase. More friction, more complexity, more cognitive load — for a benefit most people never asked for. Consumers don’t lie awake thinking about data sovereignty. They lie awake thinking about whether they’re getting ripped off on their car insurance.</p>

<p>Web 3 asked people to care about the plumbing. If I’m right about Web 4.0, it makes the plumbing invisible.</p>

<h2 id="the-problem-everyone-already-has">The Problem Everyone Already Has</h2>

<p>Web 4.0 starts from the other end — a problem so pervasive people have stopped noticing it.</p>

<p>You want to book a trip. So you open Skyscanner for flights, Booking.com for hotels, Google Maps for restaurants, TripAdvisor for reviews. You cross-reference across tabs, apply filters, convert currencies, check cancellation policies, compare loyalty programmes. Two hours later you’ve made a booking — and you’re not confident it was the best option.</p>

<p>You need insurance. You visit comparison sites that have their own promoted placements, wade through fine print, try to understand the difference between policies that look identical but aren’t. You either spend hours becoming a temporary expert or you pick something and hope for the best.</p>

<p>You’re moving to a new city. You need to register your address, find an internet provider, set up utilities, locate a GP, understand the local public transport options, figure out waste collection rules. Each of these is a different website, a different account, a different set of forms — and none of them talk to each other.</p>

<p>You’re looking for a job. You know what you want — but every job board requires you to mentally translate that into their rigid implementation. Location dropdown, seniority checkbox, keyword search that doesn’t understand context. You don’t think in filters. You think “senior engineering role, remote-friendly, interesting product, not adtech.” Good luck expressing that in a search form.</p>

<p>You want to dispute a charge on your bank statement. You navigate a phone tree, wait on hold, explain the situation to someone who transfers you, explain it again, get told to fill out a form online, log in, can’t find the form, call back. An hour of your life, gone.</p>

<p>These aren’t edge cases. This is just… Tuesday. This is what digital life actually looks like for most people, most of the time. The web made everything <em>available</em>. It also made everything <em>your job</em>. And the expertise gap between a savvy digital native and everyone else is enormous — and it’s not closing.</p>

<p>Here’s where I think Web 4.0 comes in: describe what you want, and an agent does the comparison, synthesis, and decision-prep for you. You review the options and confirm. I think of this as the labour inversion — the self-service era offloaded work from paid intermediaries onto you, the unpaid user. Web 4.0 inverts it back. The interaction model swings to conversational and personalised again, like having a travel agent or insurance broker, but at software cost instead of human cost.</p>

<h2 id="voice-vision-and-who-gets-to-participate">Voice, Vision, and Who Gets to Participate</h2>

<p>Here’s what I think is the most underappreciated part of this shift. It’s not just text. It’s multimodal interaction — and it changes who technology is <em>for</em>.</p>

<p>“Plan me a trip to Portugal, under two thousand euros, first week of April” is natural to say out loud. It’s awkward to type into a search box. It’s impossible to express through Skyscanner’s 14 filter dropdowns. Voice is the most natural interface humans have — we’ve been using it for a few hundred thousand years. Every other interface is a workaround for the fact that computers couldn’t understand us.</p>

<p>But voice alone isn’t the full picture. Think about this: you’re abroad, staring at a restaurant menu you can’t read. Today, you open Google Translate, point your camera, squint at the overlay, then open a separate app to check allergens, then maybe ask a waiter anyway. With a multimodal agent, you take a photo and say “what’s safe for me to eat here?” One interaction. It knows your dietary restrictions because you’ve told it before. That’s not science fiction — the individual capabilities exist today. What’s missing is the integration.</p>

<p>Or you get a letter from your insurance company full of jargon. Today, you either puzzle through it or call someone during business hours. With an agent, you snap a photo and ask “what does this actually mean for me?” The agent reads the letter, cross-references your policy, and explains it in plain language. I find myself imagining these scenarios constantly, and each time I think: this is clearly better. Maybe I’m wrong. But it’s hard to see how “navigate five apps and become a temporary expert” beats “have a conversation.”</p>

<p>What strikes me is how naturally the modalities shift with context. You’re cooking with your hands full — voice in, voice out. You’re commuting — you ask about weekend plans, and a curated list gets pushed to your phone for later. You’re at a desk comparing mortgage offers — the agent talks you through the trade-offs while showing a comparison table. The interface adapts to the situation instead of forcing you to adapt to it.</p>

<p>And this is where it gets bigger than convenience. If this plays out, it doesn’t just help existing users. It expands who gets to participate in the digital economy entirely. Your grandmother who can’t navigate Booking.com. A visually impaired user who struggles with complex web interfaces. Someone whose first language isn’t the one the app was designed in. A first-generation immigrant trying to navigate a new country’s bureaucracy. People who are digitally literate enough to have a conversation but not to operate a comparison tool.</p>

<p>That’s not a UX improvement. That’s a fundamentally different answer to the question of who technology is for. The web democratised access to information. I think Web 4.0 could democratise the expertise needed to act on it. I’m deliberately leaving pricing and access tiers out of this discussion — the capability argument stands on its own, and the economics deserve their own treatment in a later post.</p>

<h2 id="why-this-time-the-experience-is-different">Why This Time the Experience Is Different</h2>

<p>I covered the <a href="/openclaw-end-of-saas-era-web4/">2016 bot parallel</a> in the previous post — same vision, different timing. But it’s worth dwelling on what the consumer experience actually felt like.</p>

<p>Ordering flowers through a Messenger bot in 2016 went something like this: you typed “I want to order flowers.” The bot replied with a menu. You picked an option. It asked for a delivery date. You typed a date. It didn’t understand the format. You tried again. It offered three bouquets with tiny images. You picked one. It asked for an address. The whole interaction took longer and felt worse than just using the website.</p>

<p>The threshold isn’t whether delegation is <em>possible</em>. It’s whether it’s <em>meaningfully better</em> than doing it yourself — for enough use cases, often enough, that the habit forms.</p>

<p>I think that threshold is starting to be crossed — not across the board, but for a specific category of tasks. The friction-heavy transactional ones: comparing, researching, coordinating across providers. For those, an agent that actually understands your intent and presents a curated shortlist is genuinely faster and less effortful than doing it yourself. It’s not perfect. It’s not there for everything. But when natural language understanding actually works, the experience flips from frustrating to convenient — and that flip, I think, is what changes habits.</p>

<h2 id="the-trade-offs-are-real">The Trade-offs Are Real</h2>

<p>I don’t want to oversell this.</p>

<p>The conversational format genuinely kills a category of dark patterns — the ones that are structural to the UI. Hidden cancellation flows, pre-checked boxes, buried opt-outs, confusing checkout sequences designed to prevent you from doing what you want. When an agent acts on your behalf, there’s no button to hide. “Cancel my subscription” just happens.</p>

<p>But not all manipulation is structural. Urgency messaging (“only 2 left!”), scarcity pressure, and information shaping don’t depend on UI layout — they depend on business incentives, and those survive the format change. An agent that says “I found a great rate but it expires in two hours” is the same trick in a more trusted voice.</p>

<p>And new vectors take their place on top of that. When your agent recommends three flight options, are those the best three — or the three whose providers paid for placement? The agent feels like <em>your</em> assistant. That intimacy is what makes it useful, and it’s also what makes the new manipulation vectors harder to detect. A promoted result on Google is visually marked and feels like what it is — advertising. A promoted recommendation from your personal agent feels like advice.</p>

<p>The question isn’t whether Web 4.0 is perfect. It’s whether the net effect is better than the status quo — and I think it is. Today’s digital experience is already adversarial, already full of dark patterns. But the new vectors are subtler, and “your agent might not be fully on your side” is a tension that doesn’t go away. I’ll come back to this in a later post.</p>

<h2 id="where-it-doesnt-reach">Where It Doesn’t Reach</h2>

<p>Now, I could write a breathless piece about how this changes everything. But I don’t think it does, and I’d rather be honest about where I think Web 4.0 changes very little.</p>

<p><strong>Content consumption stays native.</strong> You don’t delegate scrolling Instagram, watching YouTube, or browsing Spotify. The experience <em>is</em> the product. An agent can play something for you (“play something for cooking”), but that’s voice control, not disintermediation — Alexa already does this and Spotify isn’t commoditised.</p>

<p><strong>Simple habitual tasks stay where they are.</strong> If you order the same coffee every morning through the same app, delegation adds nothing. The current interface is already one tap.</p>

<p><strong>Privacy-sensitive users opt out.</strong> Some people will never be comfortable with an agent that sees their financial decisions, medical research, and personal deliberations. That’s rational, not technophobic. The concentration of personal data in a single platform is unprecedented, and not everyone will accept the trade-off.</p>

<p><strong>Power users who want control.</strong> Some people <em>want</em> to compare 40 flights themselves. They enjoy the process. The interface is the value. Web 4.0 doesn’t serve them and doesn’t need to.</p>

<p>I think knowing where a thesis <em>doesn’t</em> apply is at least as important as knowing where it does.</p>

<h2 id="the-demand-that-doesnt-know-it-exists">The Demand That Doesn’t Know It Exists</h2>

<p>Here’s the paradox that makes this hard to evaluate: the demand is latent.</p>

<p>Developers are excited — OpenClaw’s growth shows that. But consumers aren’t asking for Web 4.0. They’re not campaigning against app fatigue. There’s no protest movement against having to compare insurance policies yourself. People have normalised the friction because it’s all they’ve known since the self-service era began.</p>

<p>Now, “latent demand” is a convenient argument — you can use it to justify almost anything. But the pattern is real. Nobody asked for a smartphone before the iPhone. Nobody asked for a web browser before Netscape. The demand became obvious <em>after</em> the product existed — not before. That doesn’t mean every “latent demand” claim is valid. It means you can’t use the absence of demand as proof that the demand doesn’t exist.</p>

<p>The test isn’t whether people are asking for this today. It’s whether, once they experience it, they go back. If your grandmother uses an agent to book a flight and it works — does she ever open Skyscanner again?</p>

<p>I genuinely don’t know the answer. I’m speculating here — looking at the future through a crystal ball like everyone else. But the direction feels clear to me, even if the timing and shape are uncertain.</p>

<p><em>The web democratised access. Web 4.0 democratises expertise — if we build it right.</em></p>

<p><em><a href="/openclaw-end-of-saas-era-web4/">Previously</a>: the architecture and the platform thesis. More soon.</em></p>]]></content><author><name>Björn Rochel</name><email>bjoern@bjro.de</email></author><summary type="html"><![CDATA[Web 3 had architecture too. The difference is that Web 4.0 solves a problem people actually have.]]></summary></entry><entry><title type="html">OpenClaw, the End of the SaaS Playbook, and the Arrival of Web 4.0</title><link href="https://bjro.de/openclaw-end-of-saas-era-web4/" rel="alternate" type="text/html" title="OpenClaw, the End of the SaaS Playbook, and the Arrival of Web 4.0" /><published>2026-02-24T00:00:00+00:00</published><updated>2026-02-24T00:00:00+00:00</updated><id>https://bjro.de/openclaw-end-of-saas-era-web4</id><content type="html" xml:base="https://bjro.de/openclaw-end-of-saas-era-web4/"><![CDATA[<p>After twenty years in software engineering, I took a sabbatical. The first real break of my career. I told myself I’d rest. What actually happened is that I spent months going deep on agentic engineering — building with Claude Code, MCP servers, agent orchestration — and trying to make sense of where this industry is heading. Not for a client or an employer. For myself. I’d already been <a href="/ai-and-existential-fears/">writing about how AI changes our day-to-day work</a>, but the more I built, the more I saw a pattern forming that was bigger than tooling. I needed to understand what I was looking at before deciding what the next chapter of my career looks like.</p>

<p>I’m still figuring that out. But I have a thesis now, and it’s not a small one.</p>

<p>Then OpenAI <a href="https://techcrunch.com/2026/02/15/openclaw-creator-peter-steinberger-joins-openai/">brought in Peter Steinberger</a>, the creator of OpenClaw — the open-source AI agent that gained <a href="https://nathanowen.substack.com/p/openclaws-clawdbot150k-github-stars">150,000+ GitHub stars in weeks</a>, the fastest-growing project in GitHub’s history — and the pattern clicked.</p>

<p>Most coverage treated the acquisition as a talent hire. A prominent open-source developer joins a well-funded AI lab. Story over. But zoom out: OpenAI is now assembling a proprietary model (GPT), an open-source agent layer (OpenClaw), a skill marketplace (ClawHub), access to users through WhatsApp, Telegram, and Signal, and hardware ambitions through the <a href="https://openai.com/sam-and-jony/">Jony Ive acquisition</a>. That’s not a talent acquisition. That’s model + agent + app store + distribution + device. To me, it looks like a platform stack being assembled in plain sight. And the structural echo — open-source engine absorbed into a consumer platform with proprietary services on top — is familiar. It’s the Google/Android pattern. Not a prediction that the outcome will be the same, but a recognition that the <em>type of move</em> is the same.</p>

<h2 id="the-web-40-frame">The Web 4.0 Frame</h2>

<p>Here’s the pattern I see:</p>

<table>
  <thead>
    <tr>
      <th>Era</th>
      <th>Model</th>
      <th>Platform gatekeepers</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Web 1.0</strong></td>
      <td>Read</td>
      <td>Yahoo, AOL, early Google</td>
    </tr>
    <tr>
      <td><strong>Web 2.0</strong></td>
      <td>Read / Write</td>
      <td>Google, Facebook, Apple, Amazon</td>
    </tr>
    <tr>
      <td><strong>Web 3.0</strong></td>
      <td>Read / Write / Own</td>
      <td>Failed — no durable consumer platform</td>
    </tr>
    <tr>
      <td><strong>Web 4.0</strong></td>
      <td>Read / Write / Delegate</td>
      <td>LLM platforms — being assembled now</td>
    </tr>
  </tbody>
</table>

<p>The numbering is a convention for magnitude of shift, not a claim about logical progression. What matters is the consistent pattern: each prior shift created new distribution gatekeepers. Microsoft dominated the desktop. Google dominated the web. Apple and Google dominated mobile. Whoever controls the primary user interface captures the platform economics — and distribution is the moat.</p>

<p>Web 3.0 was supposed to be the next shift, but it was a solution looking for a problem. Decentralised ownership wasn’t something consumers needed. It failed to produce a single durable consumer platform.</p>

<p>Web 4.0 — LLM-mediated delegation — starts from a problem everyone actually has: digital life is fragmented, adversarial, and labour-intensive. You manage thirty apps with thirty accounts, do your own research, navigate dark patterns, and perform the labour of synthesis and decision-making yourself. The delegation model inverts this.</p>

<p>Now, the individual pieces of this thesis are all over the news — OpenClaw’s viral growth, MCP’s release, AI coding tools, SaaS companies bolting on AI features — each covered as a standalone story. The synthesis hasn’t been made. That either means it’s too early to see, or it’s not there to see. Judge the argument on its merits.</p>

<h2 id="weve-been-here-before--and-it-failed">We’ve Been Here Before — And It Failed</h2>

<p>If this sounds familiar, it should.</p>

<p>The 2016-2017 bot era had the exact same vision: conversational interface as application platform. Facebook launched Messenger bots in April 2016. Slack integrations proliferated. WeChat mini-programmes launched in January 2017.</p>

<p>The Western versions failed because NLU (natural language understanding) was too primitive. The bots <em>worked</em> — you could order flowers through Messenger — but the experience was consistently worse than just using the app. When the conversational interface can’t actually understand your intent reliably, it’s friction, not convenience.</p>

<p>WeChat succeeded in China under specific conditions: a mobile-first population, near-universal adoption, and no Western competition for the attention graph. It proves the model is viable under the right conditions. It doesn’t prove it’s inevitable.</p>

<p>The question that matters: what’s different now?</p>

<h2 id="mcp-http-for-the-agent-era">MCP: HTTP for the Agent Era</h2>

<p>What’s different is that the protocol layer exists.</p>

<p><a href="https://modelcontextprotocol.io/">MCP</a> — the Model Context Protocol — was released by Anthropic in November 2024 as an open standard for LLM-to-service interaction. It was created by a commercial competitor, which gives rivals rational reasons to resist. And yet: ChatGPT adopted it. VS Code adopted it. Goose adopted it. OpenClaw consumes it through mcporter. When competitors adopt your competitor’s protocol, the protocol is winning on merits, not on politics. Google is the notable holdout — and their reasons have more to do with the innovator’s dilemma than protocol preferences, but that’s a story for a later post.</p>

<p>The architectural parallel is striking:</p>

<table>
  <thead>
    <tr>
      <th>Web</th>
      <th>LLM Platform</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>HTTP</td>
      <td>MCP (interaction protocol)</td>
    </tr>
    <tr>
      <td>HTML / APIs</td>
      <td>Tool schemas (interface contract)</td>
    </tr>
    <tr>
      <td>Browser</td>
      <td>LLM client app (rendering layer)</td>
    </tr>
    <tr>
      <td>Websites</td>
      <td>Service providers</td>
    </tr>
  </tbody>
</table>
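<p>To make the "tool schemas as interface contract" row concrete, here's a minimal sketch in Python. The <code>name</code>/<code>description</code>/<code>inputSchema</code> shape follows the MCP tools convention; the flight-search tool itself, and the well-formedness check, are invented for illustration.</p>

```python
# A minimal MCP-style tool definition: the schema IS the interface contract.
# The name/description/inputSchema shape follows the MCP tools convention;
# the flight-search tool itself is a made-up example.
flight_search_tool = {
    "name": "search_flights",
    "description": "Find flights matching the traveller's intent.",
    "inputSchema": {  # plain JSON Schema, just like any API contract
        "type": "object",
        "properties": {
            "origin": {"type": "string", "description": "IATA code, e.g. HAM"},
            "destination": {"type": "string"},
            "date": {"type": "string", "format": "date"},
        },
        "required": ["origin", "destination", "date"],
    },
}

def is_well_formed(tool: dict) -> bool:
    """The contract check a client could run before exposing a tool to the model."""
    return (
        isinstance(tool.get("name"), str)
        and isinstance(tool.get("description"), str)
        and tool.get("inputSchema", {}).get("type") == "object"
    )

print(is_well_formed(flight_search_tool))  # True
```

<p>Note what's absent: no UI, no routing, no session handling. The schema is the entire surface a client needs to wire the tool into a conversation.</p>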

<p>In January 2026, <a href="https://blog.modelcontextprotocol.io/posts/2026-01-26-mcp-apps/">MCP Apps</a> shipped — tools that return rich interactive UIs (dashboards, forms, workflows) rendered in sandboxed iframes inside Claude, ChatGPT, VS Code, and Goose. This isn’t text completion. It’s a browser inside the conversation. Build once, accessible everywhere. That’s what made the web win over AOL and CompuServe.</p>

<p>What’s changed since 2016:</p>
<ul>
  <li><strong>The NLU gap is closed.</strong> The core failure point in 2016 — the agent couldn’t reliably understand you — is resolved.</li>
  <li><strong>The protocol layer exists.</strong> In 2016, every bot platform had proprietary APIs. Now there’s a standard.</li>
  <li><strong>Rich UI ships.</strong> In 2016, you got text and basic button cards. Now you get full interactive interfaces.</li>
  <li><strong>Multi-client from day one.</strong> In 2016, you were locked to Messenger or Slack. Now the same tool works everywhere.</li>
</ul>

<p><strong>But here’s what’s still missing — and it matters.</strong> There’s no discovery layer. Right now, MCP is like HTTP without DNS and without search engines. The protocol works; finding the right service for a given intent doesn’t. The missing piece is on-demand schema loading: user expresses intent, a discovery layer identifies relevant services, only those schemas get loaded. This is DNS for MCP. And critically, this discovery layer <em>is</em> the platform economics. Whoever controls which tools get loaded for a given intent controls the business model.</p>
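<p>What that missing discovery layer would do is easy to sketch, even though nothing like it is standardised yet. Everything below — the registry, the keyword matching, the ranking — is hypothetical, which is exactly the point:</p>

```python
# Toy sketch of the missing discovery layer: intent in, a small set of
# relevant tool schemas out. The registry and keyword matching are
# hypothetical -- no such standard exists yet.
REGISTRY = {
    "search_flights": {"keywords": {"flight", "fly", "airport"}},
    "book_hotel":     {"keywords": {"hotel", "stay", "night"}},
    "check_weather":  {"keywords": {"weather", "rain", "forecast"}},
}

def discover(intent: str, limit: int = 2) -> list[str]:
    """Load only the schemas relevant to this intent, not all of them.

    Whoever controls this ranking controls the platform economics:
    reordering the list below is the new 'page one of Google'.
    """
    words = set(intent.lower().split())
    scored = [
        (len(words & entry["keywords"]), name)
        for name, entry in REGISTRY.items()
    ]
    scored.sort(reverse=True)  # highest keyword overlap first
    return [name for score, name in scored[:limit] if score > 0]

print(discover("find me a flight to a hotel in Lisbon"))
# ['search_flights', 'book_hotel']
```

<p>A real implementation would rank by embedding similarity rather than keyword overlap, but the economics are visible even in the toy: the function decides which services exist for a given intent.</p>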

<p>The HTTP analogy carries a warning, too. HTTP being open didn’t prevent Google from monopolising discovery. An open protocol moves power concentration from the protocol layer to the discovery layer. It doesn’t eliminate it. “Open protocol” sounds like it guarantees a competitive market. It guarantees a contestable <em>protocol</em>. The platform built on top can still be a monopoly.</p>

<h2 id="delegation-not-point-and-click">Delegation, Not Point-and-Click</h2>

<p>The interaction model is what actually changes things for users — and for what we build.</p>

<p>Web 4.0 isn’t point-and-click. The paradigm is delegation + review + confirmation: describe intent in natural language, the agent executes, a UI surfaces options for review, you confirm. I think this completes a historical full circle — and the trade-off that swings back with it is worth understanding:</p>

<ul>
  <li><strong>Pre-web:</strong> Human intermediaries did the work for you. Your travel agent booked the flights, showed you options, you committed. Conversational, personalised, expensive — but <em>conflicted</em>. They had commissions and preferred partnerships.</li>
  <li><strong>Web 2.0 / SaaS:</strong> Self-service replaced the intermediary. Skyscanner with 14 filters, 40 results, 6 tabs open — you do the labour. Cheaper, always available — and <em>less conflicted</em>. You saw the raw options. Nobody was steering you. The labour shifted to you, but you trusted the results more.</li>
  <li><strong>Web 4.0:</strong> AI intermediaries do the work again — conversational and personalised, at software cost. But the conflict-of-interest returns in a subtler form: promoted placements, filtered information. The convenience swings back. So does the trust problem — and the agent <em>feels</em> neutral in a way a commissioned travel agent never did.</li>
</ul>

<p>Latency gets reframed in this model. Twenty seconds for an agent to research and curate three options is <em>expected</em>. You’re delegating, not clicking a button. Total time-to-outcome drops even though individual operations are slower.</p>

<p><strong>For engineers, the implication is direct.</strong> Building for Web 4.0 is closer to building an API than building an app. Expose well-typed tools and display templates. The LLM client handles the rendering. The barrier drops from “build a desktop application” (pre-web) to “build a mobile app” (App Store era) to “expose an API with a display template” (Web 4.0). If you’re building MCP servers today, you’re already building for this.</p>
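<p>The "API plus display template" shape can be sketched in a few lines. The result format and template name are invented; the structure — typed input, structured output, a rendering hint, zero UI code — is the claim:</p>

```python
# Sketch of the Web 4.0 'app': no UI code, just a typed tool plus a hint
# about how the client should render the result. The template id and
# result shape are invented for illustration.
from dataclasses import dataclass, asdict

@dataclass
class FlightOption:
    carrier: str
    depart: str
    price_eur: int

def search_flights(origin: str, destination: str, date: str) -> dict:
    """The whole 'frontend': return data plus a display template reference.

    The LLM client owns the rendering; we only promise the contract.
    """
    options = [  # in reality: call your inventory service here
        FlightOption("LH", "07:15", 129),
        FlightOption("FR", "13:40", 89),
    ]
    return {
        "template": "flight-comparison-card",  # hypothetical template id
        "options": [asdict(o) for o in options],
    }

result = search_flights("HAM", "LIS", "2026-04-01")
print(len(result["options"]))  # 2
```
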

<h2 id="the-internet-splits--and-the-saas-playbook-ends">The Internet Splits — And the SaaS Playbook Ends</h2>

<p>I don’t think Web 4.0 replaces the entire internet. If the thesis holds, it reshapes the transactional half.</p>

<p><strong>Web 4.0 territory:</strong> Booking, purchasing, comparing, researching, scheduling, finances, insurance, government services. Delegation + review + confirmation is genuinely superior for these tasks. Cross-provider coordination is the killer advantage — trip planning isn’t flights <em>or</em> hotels <em>or</em> restaurants. It’s all of them together, and the agent handles the coordination.</p>

<p><strong>Content territory:</strong> Instagram, TikTok, YouTube, Spotify, Netflix, gaming, social media. The consumption experience <em>is</em> the product. You don’t delegate scrolling. This stays native.</p>

<p>Television didn’t kill radio. Mobile didn’t kill the web. New platforms capture the highest-value interactions and coexist with what came before.</p>

<p>Now, the title of this post makes a claim: the SaaS playbook ends. Let me be precise about what I mean. The <em>SaaS playbook</em> ends as the default model — raise money, build a UI, sell subscriptions, grow at all costs. Three independent forces are driving this, and they don’t depend on whether a centralised Web 4.0 platform emerges:</p>

<ol>
  <li><strong>Post-ZIRP economics</strong> compressing multiples and raising the bar for profitability.</li>
  <li><strong>LLM commoditisation</strong> — general-purpose models doing what SaaS products charged subscriptions for.</li>
  <li><strong>Agentic coding</strong> enabling solopreneurs to build in a weekend what required funded teams, exploding competition and eroding moats.</li>
</ol>

<p>Here’s how I think the impact shakes out:</p>

<ul>
  <li><strong>Dies.</strong> Thin wrappers, comparison tools, “nice UI on top of a database” where users endure the interface to reach an outcome. An agent bypasses these entirely.</li>
  <li><strong>Gets commoditised.</strong> Products with real underlying capability but whose interface premium erodes. Salesforce still exists, still holds the data, still runs the workflows — but users increasingly interact through the agent layer. It becomes infrastructure. Revenue may compress.</li>
  <li><strong>Untouched.</strong> Control-as-value workflows where the interface IS the product. Portfolio management, creative tools, dev environments — the things where the process of doing is the point.</li>
</ul>

<p><strong>The sharpest trap</strong> is for SaaS products bolting on shallow LLM features — “ask your data in natural language.” This is building a better search bar in 2003 while Google indexes the entire web. An agent that asks <em>everyone’s</em> data and synthesises across sources will always outperform a chatbot trapped inside one product. And the irony cuts deep: companies adding these features are training their users to prefer delegation — the exact behaviour that makes those features unnecessary.</p>

<h2 id="the-economics--briefly">The Economics — Briefly</h2>

<p>I’ll go deeper on the economics in a later post, but the architecture supports multiple revenue streams: per-request micropayments (Twilio model), platform commissions (Apple’s 30% model), promoted placements (Google Ads model), transaction processing (Visa model), and subscription bundling (Spotify model). Combined, that’s more monetisation surface than any current platform captures.</p>

<p>Let me be honest about what this means. Promoted placements will happen. Commissions will happen. These patterns replicate across every platform era — they’re expected, not a betrayal. The conversational format removes old manipulation vectors (hidden fees, buried buttons, urgency messaging) but introduces new ones: information filtering in a trusted context, promoted placements that feel like personal recommendations. The net effect is likely better than today’s adversarial UIs. But the new vectors are subtler, and the agent <em>feeling</em> like your personal assistant makes them harder to detect.</p>

<p>The reliability question matters here too. Confirmation gates (“confirm this EUR 340 booking?”) catch wrong actions — wrong date, wrong flight. They don’t catch filtered information: the agent showing three options when ten exist. For the highest-value use cases — financial decisions, insurance, medical research — this invisible filtering is the more consequential failure mode. Building trust for these domains requires transparency or institutional guarantees that don’t exist yet.</p>

<h2 id="what-i-think-were-watching">What I Think We’re Watching</h2>

<p>The OpenClaw acquisition looks small. I think the signal is much larger than the event.</p>

<p>The early infrastructure of Web 4.0 is being assembled: an open protocol (MCP), rich interactive UI (MCP Apps), an agent layer (OpenClaw), a marketplace (ClawHub), multi-client adoption across competitors. I don’t think the question is whether delegation-based interaction is better for complex tasks — OpenClaw’s developer traction suggests it is, even if GitHub stars don’t prove consumer demand. The questions I keep coming back to: who builds the trust, payments, and discovery layer on top? And whether one company becomes the new Google, or the open protocol enables something more distributed.</p>

<p>Three companies are making three different bets. Apple is betting the device controls the agent. OpenAI is betting on both a purpose-built device (the Jony Ive acquisition) and meeting users where they already are through messaging apps (OpenClaw). Anthropic is betting the protocol is the platform, and devices are interchangeable. Google’s notable absence from MCP — embedding Gemini into existing products instead of building a cross-cutting platform — looks to me like the innovator’s dilemma playing out in real time: the company with the most to lose betting against the platform thesis entirely.</p>

<p>I said at the top that I’m still figuring out where I want to be in this transition. That’s true. But I’ve seen what happens when paradigm shifts catch people off guard — personally — and I’d rather think about this one clearly than be surprised by it. I hope these thoughts help others doing the same.</p>

<p><em>This is the first in a series. More soon.</em></p>]]></content><author><name>Björn Rochel</name><email>bjoern@bjro.de</email></author><summary type="html"><![CDATA[OpenAI's OpenClaw hire was framed as a talent acquisition. Zoom out and you see a platform stack being assembled in plain sight — model, agent, app store, distribution, device. The opening move of Web 4.0.]]></summary></entry><entry><title type="html">On Boilerplate and the Death of Old Principles</title><link href="https://bjro.de/on-boilerplate-and-the-death-of-old-principles/" rel="alternate" type="text/html" title="On Boilerplate and the Death of Old Principles" /><published>2026-02-23T00:00:00+00:00</published><updated>2026-02-23T00:00:00+00:00</updated><id>https://bjro.de/on-boilerplate-and-the-death-of-old-principles</id><content type="html" xml:base="https://bjro.de/on-boilerplate-and-the-death-of-old-principles/"><![CDATA[<p>A few days ago, I published <a href="/ai-and-existential-fears/">a piece on AI and existential fears</a> — my attempt to navigate the space between hype and dismissal in the current AI-assisted coding landscape. It struck a nerve, and the responses were thoughtful. One in particular made me stop and think.</p>

<p>Krisztina Hirth asked a deceptively simple question:</p>

<p><img src="/assets/images/krisztina-question.png" alt="Krisztina's question on LinkedIn" /></p>

<p>This is a better question than it might appear. It’s not really about boilerplate. It’s about whether an entire generation of hard-won engineering principles — minimalism, DRY, “every line should earn its place” — needs to be re-examined when the economics of writing and rewriting code fundamentally change.</p>

<p>I think the answer is nuanced. And I think it reveals something important about how our relationship with code is shifting.</p>

<h2 id="the-cheap-rework-argument">The Cheap Rework Argument</h2>

<p>Let me start with the part that feels almost heretical to say out loud.</p>

<p><strong>Rework is becoming very cheap.</strong></p>

<p>When a machine can rewrite a module in seconds, the cost calculus of code changes dramatically. In the old world, boilerplate was expensive because humans had to read it, understand it, maintain it, and modify it. Every unnecessary line was a tax on future developers. We built entire philosophies around this — YAGNI, DRY, the relentless pursuit of minimal code — because the human cost of complexity was real and compounding.</p>

<p>But if an AI agent can regenerate a component from scratch in the time it takes you to read the first function? The maintenance cost argument weakens considerably.</p>

<p>Here’s the key insight though: this only holds if you protect what matters. The outward behaviour of your system — your APIs, your data contracts, your persistence formats, the observable behaviours that other systems and users depend on — these are your guardrails. As long as those interfaces are stable and well-defined, the internals become genuinely more disposable than they’ve ever been.</p>

<p>This is contract-based thinking taken to its logical extreme. Program to interfaces, and let the machine worry about the implementation behind them. From this perspective, boilerplate behind a stable interface really doesn’t matter as much as it used to.</p>
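<p>Here's the "protect the contract, dispose of the internals" idea as a minimal Python sketch. The rate limiter is a made-up example; what matters is that the caller depends only on the <code>Protocol</code>, so the implementation behind it could be regenerated from scratch without anyone noticing:</p>

```python
# Callers depend only on the Protocol; the implementation behind it is
# disposable. The rate limiter is an invented example.
from typing import Protocol

class RateLimiter(Protocol):
    def allow(self, key: str) -> bool: ...

class NaiveLimiter:
    """First version -- hand-written, maybe clumsy, but contract-correct."""
    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.seen: dict[str, int] = {}

    def allow(self, key: str) -> bool:
        self.seen[key] = self.seen.get(key, 0) + 1
        return self.seen[key] <= self.max_calls

def handle_request(limiter: RateLimiter, user: str) -> str:
    # This caller is the 'outward behaviour' worth protecting. It never
    # learns which implementation sits behind the Protocol.
    return "ok" if limiter.allow(user) else "throttled"

limiter = NaiveLimiter(max_calls=2)
print([handle_request(limiter, "bjoern") for _ in range(3)])
# ['ok', 'ok', 'throttled']
```

<p>An agent can throw away <code>NaiveLimiter</code> and regenerate it as a sliding-window or Redis-backed version; as long as <code>allow</code> keeps its contract, <code>handle_request</code> and every test against it survive untouched.</p>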

<h2 id="but-heres-where-it-gets-interesting">But Here’s Where It Gets Interesting</h2>

<p>Now, you might expect me to stop there. “Boilerplate doesn’t matter, embrace the machine, move on.” But I think there’s a second argument that actually pushes in the opposite direction — and it’s one that most people haven’t considered yet.</p>

<p><strong>Removing boilerplate is in the AI agent’s best interest too.</strong></p>

<p>This is the part that surprised me when I started thinking about it carefully. Current AI coding agents operate within a context window — a finite amount of information they can hold and reason about at any given time. This context is precious. It’s the agent’s working memory. And like all working memory, it degrades as you fill it up.</p>

<p>When your codebase is full of noise — boilerplate, repetitive patterns, verbose scaffolding — the agent has to wade through all of it to orient itself. Every unnecessary line of code is a line that competes for attention in a limited context. Less noise means the agent can ground itself on your existing code faster and more accurately. It means you’re getting more out of every token of that context window.</p>

<p>And here’s the practical reality: context quality degrades as it fills up. The more you stuff into it, the less reliably the agent reasons about any of it. A lean, well-structured codebase isn’t just aesthetically pleasing — it’s <em>operationally superior</em> for AI-assisted development.</p>

<p>So the old principle survives, but for an entirely new reason. “No more code than strictly necessary” used to be about respecting the humans who’d maintain it. Now it’s <em>also</em> about respecting the agent that navigates it. The principle didn’t die. Its justification evolved.</p>

<h2 id="and-what-about-krisztinas-real-question">And What About Krisztina’s Real Question?</h2>

<p>Let me come back to the sharpest part of her question: <em>how much of the code can become boilerplate?</em></p>

<p>The implication is provocative — maybe we should lean <em>into</em> boilerplate because it’s predictable, standard, and easy for machines to generate and rewrite. If the machine handles it, why fight it?</p>

<p>I think the honest answer has two parts, and they pull in different directions.</p>

<p><strong>Some boilerplate is genuinely eliminable.</strong> If a pattern is so standard and mechanical that we’d call it boilerplate, it’s often a sign that a better abstraction exists — or should exist. Think about the route handler that needs request validation, error mapping, serialisation logic, and client-side types, all derivable from an API contract. That’s not code you need to accept. That’s code you can replace with a cleaner abstraction, a smarter framework choice, or a build step that generates it from the contract. <em>That</em> boilerplate should disappear. It’s noise, it bloats the context window, and eliminating it is a genuine win for both you and the agent.</p>
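<p>The "derivable from an API contract" claim, shown minimally: one field spec, and the validation boilerplate is generated rather than hand-written. The contract format here is invented for illustration:</p>

```python
# One contract, generated validation: the boilerplate no longer lives in
# your codebase as hand-written code. The contract format is invented.
CONTRACT = {
    "email": {"type": str, "required": True},
    "age":   {"type": int, "required": False},
}

def make_validator(contract: dict):
    """Derive a request validator from the contract."""
    def validate(payload: dict) -> list[str]:
        errors = []
        for field, spec in contract.items():
            if field not in payload:
                if spec["required"]:
                    errors.append(f"missing required field: {field}")
            elif not isinstance(payload[field], spec["type"]):
                errors.append(f"wrong type for {field}")
        return errors
    return validate

validate = make_validator(CONTRACT)
print(validate({"email": "a@b.de", "age": "old"}))  # ['wrong type for age']
```

<p>Real projects do this with OpenAPI generators or schema libraries rather than twenty lines of closure, but the shape is the same: the contract is the source of truth and the repetitive code falls out of it.</p>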

<p><strong>Some boilerplate is unavoidable.</strong> There’s mechanical wiring code that simply has to exist somewhere for the system to work. No amount of abstraction elegance removes the fact that services need to be configured, dependencies need to be wired, and infrastructure needs scaffolding. In the old world, this was a constant source of friction and maintenance cost. In an AI-assisted world? This is where the cheap rework argument genuinely applies. If the agent can regenerate this code reliably behind stable interfaces, stop agonising over it. It’s not worth the engineering energy to gold-plate code that a machine can rewrite in seconds.</p>

<p>The practical upshot: eliminate the boilerplate you can — it makes your codebase leaner and your agent more effective. And for the boilerplate you can’t eliminate, stop treating it as precious. Protect the interfaces, let the machine handle the wiring, and save your engineering judgment for the code that actually carries decisions.</p>

<h2 id="trust-is-good-double-check-is-better">Trust Is Good, Double Check Is Better</h2>

<p>Now, all of this raises an obvious concern. If machines are writing and rewriting code with increasing autonomy, how do you prevent the whole thing from drowning in technical debt?</p>

<p>This is where I think the conversation needs to move from principles to practice. I still construct my delivery pipeline to enforce good engineering practices — but I enforce them at multiple checkpoints rather than relying on any single gate.</p>

<p><strong>In ideation</strong>, before a single line of code is written, the first question is: <em>does this need to exist at all?</em> This is the cheapest place to prevent unnecessary complexity. An AI agent will happily build whatever you ask for. The engineering judgment about <em>whether</em> to build it is still yours.</p>

<p><strong>In planning</strong>, every implementation plan gets challenged against good engineering practices. I use specialised review personas — think of them as Staff-level engineering perspectives tuned to specific domains like React, Go, or LLM integrations. The plan gets scrutinised before implementation begins. Does this architecture make sense? Are we introducing unnecessary coupling? Is there a simpler approach?</p>

<p><strong>In review</strong>, after the implementation agent finishes its work, the code goes through a thorough pull-request review by specialised personas. Critical, high, and warning issues get kicked back to the implementation agent for fixing. Trust the machine to write the code, but verify the result before it ships.</p>
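<p>The review checkpoint's core mechanic fits in a few lines: issues below a severity threshold ship, the rest go back to the implementation agent. The severity names mirror the ones above; the issue format is invented:</p>

```python
# Sketch of the review gate: block-and-kick-back by severity. The
# severity names mirror the text; the issue format is invented.
SEVERITY_RANK = {"critical": 0, "high": 1, "warning": 2, "info": 3}

def kick_back(issues: list[dict], threshold: str = "warning") -> list[dict]:
    """Return the issues that block the merge and go back for fixing."""
    cutoff = SEVERITY_RANK[threshold]
    return [i for i in issues if SEVERITY_RANK[i["severity"]] <= cutoff]

review = [
    {"severity": "critical", "msg": "secret committed in config"},
    {"severity": "info",     "msg": "consider renaming variable"},
]
print(len(kick_back(review)))  # 1
```
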

<p><strong>In scheduled maintenance</strong>, I still do periodic reviews of the whole codebase. This catches the holistic problems that individual feature-driven plans and reviews miss — the creeping inconsistencies, the patterns that made sense in isolation but create confusion at scale, the technical debt that accumulates between the cracks.</p>

<p>This isn’t paranoia. It’s the same principle that makes aviation safe: multiple independent checks, because no single check catches everything.</p>

<h2 id="the-principle-evolves">The Principle Evolves</h2>

<p>So where does this leave Krisztina’s question?</p>

<p>The old principle — “no more code than strictly necessary” — isn’t dead. But its justification has fundamentally shifted. We used to minimise code because humans paid the maintenance tax. Now we minimise code because it makes AI agents more effective <em>and</em> because it keeps the codebase comprehensible for the humans who still need to operate, debug, and make judgment calls about the system.</p>

<p>The cost of boilerplate has changed. Rework is cheap. But clarity is still expensive, and context is still finite. A lean codebase serves both the human operator and the AI agent. The principle survives not out of nostalgia, but because it turns out to be correct for reasons its original authors never imagined.</p>

<p>And the guardrails? They’ve moved from “write less code” to “build better checkpoints.” The engineering discipline isn’t in every line anymore — it’s in the pipeline that ensures the machine’s output meets the bar.</p>

<p>Welcome to the evolved principle. Same destination, different map.</p>]]></content><author><name>Björn Rochel</name><email>bjoern@bjro.de</email></author><summary type="html"><![CDATA[When a machine can rewrite a module in seconds, do YAGNI and DRY still earn their place? The principles survive — but the reasons we hold them shouldn't.]]></summary></entry><entry><title type="html">On AI and Existential Fears</title><link href="https://bjro.de/ai-and-existential-fears/" rel="alternate" type="text/html" title="On AI and Existential Fears" /><published>2026-02-20T00:00:00+00:00</published><updated>2026-02-20T00:00:00+00:00</updated><id>https://bjro.de/ai-and-existential-fears</id><content type="html" xml:base="https://bjro.de/ai-and-existential-fears/"><![CDATA[<p>Let me tell you about the dumbest smart thing I’ve recently done.</p>

<p>I needed to update my CV. A normal person would spend an afternoon reformatting bullet points and adding their latest role. Instead, I spent multiple weeks deep-diving into Claude Code to build a full application that ingests reference letters, extracts testimonials using LLMs, and produces a credible CV complete with quotes and skill validations.</p>

<p><img src="/assets/images/credfolio-cv.png" alt="The CV that came out of weeks of engineering instead of an afternoon of formatting" /></p>

<p>An afternoon task turned into weeks of engineering. And I regret nothing — because the CV was never the point. The point was understanding what this technology can actually do when you push it, and what it can’t.</p>

<p>That experience, combined with 2+ years of working with AI coding tools professionally, has left me with a perspective on where software engineering is headed that I think is worth sharing. Not because I have all the answers — none of us do — but because I’ve been through enough of the hype cycle to have some battle scars.</p>

<h2 id="drinking-from-the-firehose">Drinking from the Firehose</h2>

<p>A bit of context. I’ve spent the last four years working remotely for a US-based startup. American tech culture has a particular relationship with competitive advantage: if a tool exists that might make you faster, you use it. No hand-wringing about purity. No waiting for consensus.</p>

<p>As a result, my organisation went through the full progression: Copilot, then Cursor, then Devin, then Claude and Codex. I watched this evolution happen in real-time — and I watched my own role evolve with it. As an Engineering Manager, I found myself becoming dramatically more hands-on, shipping code on top of my normal leadership responsibilities. My GitHub commit graph tells a story that would have been impossible five years ago.</p>

<p><img src="/assets/images/github-timeline.png" alt="My GitHub contribution graph" /></p>

<p>I’ve been through every stage of the hype cycle with this technology. The initial wonder. The inflated expectations. The disillusionment when it generates confident nonsense. The quiet realisation that it’s genuinely useful when you know how to wield it. I’ve even had the existential crisis — the 3 AM thought of <em>“is my career about to become irrelevant?”</em></p>

<p>So when Dario Amodei predicted that 90-100% of code would be written by AI by the end of 2025, my initial reaction was unambiguous: <em>full of shit.</em></p>

<p><img src="/assets/images/amodei.jpg" alt="Dario Amodei, CEO of Anthropic" /></p>

<h2 id="the-amodei-prediction-revisited">The Amodei Prediction, Revisited</h2>

<p>We’re past that deadline now. So was he right?</p>

<p>The answer depends entirely on who’s talking and what incentives they have.</p>

<p>Amodei and the creators of Claude Code claim that this is exactly what’s happened inside Anthropic. On social media, you’ll find enthusiastic early adopters sharing similar stories. And I’ll be honest — my own commit graph lends some credibility to a version of that claim. When I’m working with Claude Code, the AI is generating a substantial portion of the raw code output. In that narrow, measurable sense, the prediction isn’t as absurd as it sounded.</p>

<p>But let’s zoom out.</p>

<p>These are still early adopters, not the majority. Most companies are still figuring out what AI-assisted development even means for their workflows, their security posture, their hiring. The enthusiastic voices on social media are a self-selecting sample. And crucially: <strong>the people making the boldest claims about AI’s capabilities are the same people selling AI.</strong> Amodei is the CEO of a company whose valuation depends on you believing that their technology is transformative. It’s literally his job to say this convincingly.</p>

<p>That doesn’t make him wrong. But it should make you calibrate.</p>

<p>The reality, as I see it: AI-assisted coding is directionally real, genuinely powerful, and already transforming how early adopters work. It is not yet mainstream, not yet well-understood by most organisations, and the narrative is heavily amplified by people with financial stakes in the outcome. Both of these things are true simultaneously.</p>

<h2 id="how-this-technology-actually-works--and-why-that-matters">How This Technology Actually Works — And Why That Matters</h2>

<p>Now, let me push back on a common dismissal that I hear from sceptical engineers — one that I’ve caught myself making too.</p>

<p><em>“It’s just token completion. It’s probabilistic. It doesn’t understand anything.”</em></p>

<p>This is technically accurate and practically misleading. By that logic, human cognition is “just neurons firing.” Dismissing a system’s capabilities based on its mechanism is a category error. What matters is what it can observably do, not how it does it.</p>

<p>And what current AI models can observably do is impressive. They can reason through architectural tradeoffs. They can refactor across multiple files while maintaining consistency. They can write tests, identify edge cases, and explain complex codebases. These aren’t parlour tricks.</p>

<p>But — and this is the crucial “but” — they also have real, observable limitations that matter enormously for professional software engineering.</p>

<p>The models are trained on publicly available code and documentation. Let me ask you an uncomfortable question: <strong>is most publicly available code good?</strong> Is most open-source software well-documented? The honest answer is: some of it is excellent, and a vast amount of it isn’t. Models don’t simply replicate the most common patterns — modern training techniques like RLHF help them discriminate quality from noise. But they are fundamentally bounded by what they’ve seen.</p>

<p>And here’s what they haven’t seen much of: <em>your specific system.</em> Your org’s production traffic patterns. Your particular failure modes. The reason that weird workaround exists in the payment service. The implicit assumptions your team makes that have never been written down. AI doesn’t lack intelligence — it lacks <em>situated knowledge</em>.</p>

<h2 id="the-dark-side-where-expertise-actually-lives">The Dark Side: Where Expertise Actually Lives</h2>

<p><img src="/assets/images/dark-side-ops.png" alt="The 2 AM incident — where real engineering expertise lives" /></p>

<p>This brings me to what I think is the most underappreciated argument in this entire debate.</p>

<p><strong>The hard part of software engineering has never been writing code on the happy path.</strong></p>

<p>Think about what actually keeps senior engineers up at night. It’s the <a href="https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing">Fallacies of Distributed Computing</a>. It’s the failure mode that only manifests under specific load patterns at 2 AM on a Saturday. It’s the security vulnerability that exists not in any single line of code, but in the interaction between three services. It’s the observability gap that means you can’t even <em>diagnose</em> the problem when it happens.</p>

<p>There is remarkably little in the public training data that captures how experienced engineers <em>operate</em> software:</p>

<ul>
  <li><strong>Observability and forensics</strong> — How do you instrument a system so you can understand what happened after a 3 AM incident? How do you build dashboards that surface actionable signals instead of noise?</li>
  <li><strong>Scaling</strong> — Not the textbook “add more instances” kind. The kind where your database’s query planner makes different decisions at 10x traffic and everything subtly degrades.</li>
  <li><strong>Security</strong> — Not the OWASP top 10 that AI can recite. The subtle authentication edge cases, the supply chain risks, the threat models specific to your architecture.</li>
</ul>

<p>This is the stuff they don’t teach you in tutorials. It’s barely present in documentation. And it’s vanishingly rare in public codebases. This knowledge lives in the heads of experienced engineers, in post-mortem documents buried in private Confluence spaces, in the institutional memory of teams.</p>

<p>Now, I want to be honest here: AI <em>is</em> starting to enter these domains. There are AI-powered observability tools, security scanners, and incident response systems. The gap is real but it’s closing. My argument isn’t that AI will <em>never</em> handle these things — it’s that right now, and for the foreseeable future, this is where human expertise provides the most irreplaceable value.</p>

<p><strong>AI engineering itself isn’t trivial either.</strong> It’s telling that the most sought-after course in the US product management world right now is about evaluations — how to measure whether your AI system is actually doing what you think it’s doing. Building reliable AI-powered features requires understanding failure modes, edge cases, and probabilistic behaviour. In other words: it requires engineering expertise.</p>
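<p>What an evaluation minimally is can be stated in a dozen lines: a labelled set of cases, a check per case, and a pass rate you track over time. The toy summariser and cases are invented; real eval suites use graded rubrics and far more cases:</p>

```python
# A minimal eval harness: labelled cases, a check, a pass rate.
# The toy summariser and the cases are invented for illustration.
def run_eval(cases, system):
    """Score a (probabilistic) system against graded expectations."""
    passed = sum(1 for prompt, check in cases if check(system(prompt)))
    return passed / len(cases)

# A toy 'system' standing in for an LLM-powered feature:
def toy_summariser(text: str) -> str:
    return text.split(".")[0]  # keep only the first sentence

cases = [
    ("Short. Ignore the rest.", lambda out: out == "Short"),
    ("No trailing period here", lambda out: "trailing" in out),
]
print(run_eval(cases, toy_summariser))  # 1.0
```

<p>The engineering judgment lives in the cases and checks, not the harness — which is precisely why evals are the skill everyone is suddenly trying to learn.</p>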

<p>AI needs an operator.</p>

<h2 id="so-is-it-just-a-fad-can-we-lean-back">So Is It Just a Fad? Can We Lean Back?</h2>

<p><img src="/assets/images/hype-cycle.png" alt="The hype cycle roller coaster" style="max-width: 55%; display: block; margin: 0 auto;" /></p>

<p>Absolutely not.</p>

<p>I want to be direct about this, because I think complacency is a bigger risk than panic.</p>

<p>We are seeing the largest shift in software engineering that I’ve witnessed in over 20 years in this industry. This isn’t hyperbole. When developers like DHH and Linus Torvalds — people who have historically been sceptical of every trend — start acknowledging that AI is changing how they work, you should pay attention.</p>

<p>But recognising the magnitude of the shift doesn’t mean accepting every breathless prediction at face value. This is where the hype cycle model is genuinely useful. We’ve been through the peak of inflated expectations. We’re somewhere in the trough of disillusionment for the first generation of tools. But here’s the thing people miss about the hype cycle: <strong>the end of one cycle is often just the beginning of the next step up the abstraction ladder.</strong> Each generation of tools enables a new set of possibilities that triggers its own cycle of hype, disillusionment, and productive integration.</p>

<p>The question isn’t whether AI-assisted development is real. It is. The question is what the stable, productive equilibrium looks like — and we’re not there yet.</p>

<h2 id="will-it-make-us-obsolete">Will It Make Us Obsolete?</h2>

<p>Not in the current generation. Here’s why.</p>

<p>Current AI models operate fundamentally within the boundaries of what they’ve been trained on. They can recombine, synthesise, and apply existing patterns in impressive ways. But they cannot generate genuinely novel architectural insights that aren’t latent in their training data. They cannot reason from first principles about failure modes they’ve never encountered. They cannot make the judgment calls that come from years of watching systems break in production.</p>

<p>The next generation of AI — whatever form it takes — is still years away, and its success is far from guaranteed. There’s no clear consensus in the research community about what the path to meaningfully more capable systems looks like. And the financial picture adds uncertainty: it’s not yet clear whether the financial markets will continue to tolerate the enormous capital expenditure and modest returns that characterise the current AI buildout. The investment thesis depends on future capabilities that haven’t been demonstrated yet.</p>

<p>There are also geopolitical considerations that the European engineering community in particular should be thinking about. Will US-built AI tools be weaponised for competitive advantage? Is it in our strategic interest to build critical development infrastructure on top of systems controlled by a small number of American companies? The rise of capable open-source models from organisations like Mistral and others suggests that the capability itself is becoming a commodity — which is actually good news for those of us who worry about concentration of power.</p>

<p><strong>So no — not obsolete. But here’s the uncomfortable middle ground.</strong></p>

<p>Even if AI doesn’t replace engineers, companies will adjust to the <em>prospect</em> of it. They’ll expect more output per engineer. They’ll restructure teams around AI-augmented workflows. The environment will become more volatile and less cosy. The engineers who refuse to engage with AI tooling won’t be fired by robots — they’ll be outperformed by colleagues who embraced the tools.</p>

<p>And honestly? There’s a decent chance that the current state of AI coding tools will be treated as “good enough” by many organisations. Not because it’s optimal, but because it clears a threshold of utility — the way VHS wasn’t the best format but was good enough to become the standard. The gap between “good enough for most use cases” and “genuinely excellent” may not matter to the majority of the market.</p>

<h2 id="the-third-golden-age">The Third Golden Age</h2>

<p>I want to close with a framing that I think cuts through both the hype and the doom.</p>

<p>Grady Booch — one of the foundational thinkers in software engineering, co-creator of UML, and someone who has been thinking about this field longer than most of us have been alive — has described where we are as the <strong>third golden age of software engineering.</strong></p>

<p><img src="/assets/images/grady-booch.jpg" alt="Grady Booch on the third golden age of software engineering — Image from The Pragmatic Engineer" />
<em>Image credit: <a href="https://newsletter.pragmaticengineer.com/p/the-third-golden-age-of-software">The Pragmatic Engineer</a></em></p>

<p>Gergely Orosz did an excellent deep-dive into Booch’s thinking in <a href="https://newsletter.pragmaticengineer.com/p/the-third-golden-age-of-software">The Pragmatic Engineer</a> — it’s well worth your time if you want the full picture. Booch’s framework goes like this:</p>

<ul>
  <li><strong>The first golden age</strong> (1940s–1970s) was about algorithms and computational theory. We figured out <em>what</em> computers could do.</li>
  <li><strong>The second golden age</strong> (1970s–2000s) was about managing complexity — object-oriented programming, design patterns, software architecture. We figured out how to build <em>large</em> things.</li>
  <li><strong>The third golden age</strong> (now) is about systems thinking — moving from individual components to entire platforms, with AI as an enabler of higher-level abstraction.</li>
</ul>

<p>This framing is powerful because it recontextualises the current anxiety. Engineers have faced “existential crises” before. When compilers replaced hand-written assembly, people feared displacement. When higher-level languages emerged, the same fears surfaced. Each time, the profession didn’t shrink — it expanded and moved up the abstraction ladder. The engineers who embraced the new tools didn’t become obsolete. They became more capable.</p>

<p>That’s the pattern I see repeating. AI isn’t replacing software engineering. It’s moving us up an abstraction level — from writing code to directing systems that write code, from managing complexity to managing the AI that manages complexity.</p>

<h2 id="what-this-means-for-you">What This Means for You</h2>

<p>So, are we dinosaurs? I don’t think so. But we’re also not in a position to be complacent.</p>

<p>My commit graph tells one version of this story. An Engineering Manager who, armed with AI tools, became dramatically more productive as an individual contributor — while still doing the leadership work. The absurd CV project tells another version: that going deep with these tools requires genuine engineering skill, taste, and judgment. You don’t just prompt your way to a well-architected application. You bring decades of experience about what good software looks like, and the AI helps you get there faster.</p>

<p>The engineers who will thrive in this era are the ones who:</p>

<ul>
  <li><strong>Engage seriously with AI tooling</strong> — not dismissing it, not worshipping it, but developing real fluency</li>
  <li><strong>Double down on the expertise that AI can’t replicate</strong> — production operations, security thinking, system design, understanding failure modes</li>
  <li><strong>Embrace the role of operator</strong> — AI is a powerful tool, but powerful tools need skilled operators</li>
  <li><strong>Move up the abstraction ladder</strong> — spend less time on boilerplate, more time on architecture, design, and the problems that actually matter</li>
</ul>

<p>Booch says the opportunity is to redirect our attention “from friction to imagination.” I think that’s exactly right. The friction — the boilerplate, the repetitive patterns, the tedious plumbing — is increasingly handled by AI. What remains is the imagination: the ability to envision systems that don’t exist yet, to anticipate failure modes that haven’t happened yet, to make the judgment calls that no amount of training data can teach.</p>

<p>That’s not a diminished role. That’s a <em>more interesting</em> one.</p>

<p>Welcome to the third golden age.</p>]]></content><author><name>Björn Rochel</name><email>bjoern@bjro.de</email></author><summary type="html"><![CDATA[Two years through Copilot, Cursor, Devin, Claude, Codex — and every stage of the hype cycle that came with them. Where AI-assisted coding actually helps, and where the wonder runs out.]]></summary></entry><entry><title type="html">The game has changed</title><link href="https://bjro.de/the-game-has-changed/" rel="alternate" type="text/html" title="The game has changed" /><published>2019-04-05T00:00:00+00:00</published><updated>2019-04-05T00:00:00+00:00</updated><id>https://bjro.de/the-game-has-changed</id><content type="html" xml:base="https://bjro.de/the-game-has-changed/"><![CDATA[<p>If you compare modern football teams to their predecessors of 20-30 years ago, I think you can spot some interesting differences. Looking at the typical players, for instance, they seem to me to have been much more stereotypical than today’s.</p>

<ul>
  <li>
    <p>Goalies prevented goals, but everyone got nervous when they had the ball at their feet</p>
  </li>
  <li>
    <p>Defenders were mostly about destroying the opponent’s game; participating in the game with the ball came second. Think of Jürgen Kohler, for example</p>
  </li>
  <li>
    <p>Creative mid-fielders and strikers were often completely freed of any defensive responsibilities. And we often had single players who dominated games. (We still have Messi and Ronaldo, but they’re probably the exceptions.) Some famous mid-fielders or strikers were even smokers (Mario Basler, anyone?)</p>
  </li>
</ul>

<p>In a nutshell, teams were composed of highly specialized players focusing on their respective duties. If memory serves me right, there were also far more independent actions in the past, and the systems that squads played back in the day were pretty static compared to today.</p>

<p>One thing does seem to have changed in the last decades: the game has become <a href="https://90soccer.com/how-the-sport-of-soccer-has-changed-over-the-years/">much faster</a> and <a href="http://sciencenordic.com/scientists-football-has-changed-dramatically">more intense</a>, requiring <a href="https://www.grassrootscoaching.com/how-the-game-of-football-has-evolved-over-the-last-20-years-part-one/">much more fitness</a> and <a href="https://www.grassrootscoaching.com/how-the-game-of-football-has-evolved-over-the-last-20-years-part-two/">more collaboration and faster reaction times</a>.</p>

<p>Just to name a few stats I dug up via Google:</p>

<ul>
  <li>Players cover around 50% more distance in a game than 20 years ago</li>
  <li>The amount of sprinting and other high-intensity activity has doubled since 2002</li>
  <li>The ball is in play, and live, for almost 15 minutes (17%) longer than it was in 1990</li>
</ul>

<p>As a consequence, the squads we see today play the game radically differently from what I was used to seeing in my youth. Gone is the high specialization of the past.</p>

<ul>
  <li>Goalies need to be good with the ball</li>
  <li>Defenders are often the ones who open up a counter attack either over the wings or through the center</li>
  <li>The offensive player who is freed from all defensive duties is pretty much gone. Defending now starts at the front line, and mid-fielders and strikers have to play their part in it</li>
</ul>

<p>All of them need to be highly trained athletes in order to compete at the highest level. Gone are the smokers of the past. And another fascinating thing seems to have changed: the dynamics between the coach at the sidelines, the system and the players on the field.</p>

<p>With the game being this fast, decision making now happens solely on the field. Players need to thoroughly understand their role in the larger context, their expected behavior in certain situations and their options for adapting when situations arise that weren’t on the preparation sheet. They have to, so that they can make the split-second decisions that make or break the game without a lot of external sign-off.</p>

<p>If you’ve ever watched a top squad from the upper ranks of a stadium, you’ve probably noticed the magnificent coordination that seems to happen intuitively. Gone are the static positions of the past. You see coordinated changes of the system in offense and defense depending on the game situation. Obviously it’s not really intuitive; it has been drilled into the squad through endless training. But sometimes it looks so easy and natural that it feels intuitive.</p>

<h1 id="so-why-am-i-writing-about-this">So why am I writing about this?</h1>

<p>I find that fascinating and beautiful. When I was looking for inspiration on how I wanted to lead an engineering team, I peeked into all sorts of disciplines and professions. Besides the typical management literature, two areas you often end up at are competitive, squad-based sports and the military. I usually prefer the former.</p>

<p>I see a lot of similarities between the IT world’s increased need to deliver to customers faster and iterate on products more quickly, while maintaining or even increasing the quality of what is delivered, and the development in football I described earlier.</p>

<p>I also think you can derive some broader principles and a deeper understanding of teams, and of how to lead in such environments, from that. At least it had a dramatic impact on me and has shaped how I’ve run my team over the last two years.</p>

<blockquote>
  <p>A lot of the insights I’m going to touch on next are probably worth their own detailed look. I’m leaving that for other posts, though, to see if there’s enough interest and the effort of jotting them down is justified.</p>

  <p>Please also note that these are my conclusions. There’s a chance that they’re wrong in the broader context and not applicable for everyone. I’m just a flawed human being like everyone else. But I’ve seen these things working and producing tangible results.</p>

  <p>So let’s dive right into it.</p>
</blockquote>

<h2 id="there-is-no-such-thing-as-too-much-transparency">There is no such thing as too much transparency</h2>
<p>A lot of what is perceived as performance often (but not always) boils down to speed of decision making. In order to make the right decisions, you need proper context. You need a thorough understanding of the “Why?” in order to derive the “What?” and the “How?”. And usually there are plenty of “What?”s and “How?”s.</p>

<p>Some managers treat information as leverage, but wherever I can I try to make as much information available to my team as possible, the good and the bad, so that the right decisions have a chance to be formed within the team.</p>

<p>I also learned that there’s quite a bit of loyalty to be gained when you’re open about the negative, not-yet-working things from the start. Some applicants thought I was trying to talk them out of an open position, when in fact I was simply being transparent about what working in my team would be like compared to what they were used to. But the ones that took the job are still with me, even if it has been stressful at times.</p>

<h2 id="explicit-more-often-than-not-wins-over-implicit">Explicit more often than not wins over implicit</h2>
<p>Some people get pleasure from being the sole point of authority for their area and from all important decisions having to be acknowledged by them. I’m not one of those people. On the contrary, I think it hinders day-to-day team performance when an engineering lead or manager is asked for permission on all sorts of things.</p>

<p>As the enemies of Admiral Nelson found out in the <a href="https://en.wikipedia.org/wiki/Battle_of_Trafalgar">Battle of Trafalgar</a>, things get messy when you rely on a central authority that is not available and the rest of the gang is suddenly without direction.</p>

<p>Similarly, on the pitch you can’t wait as a player for someone else to make decisions for you; you’re forced to make them yourself under time pressure.</p>

<p>The point here is that the more information you have available, and the clearer your picture of your responsibility and authority is, the faster decisions will likely happen. You simply have more mental capacity available for the job at hand.</p>

<p>I’ve used <a href="https://management30.com/practice/delegation-poker/">Delegation Poker and Delegation Boards</a> to sort out and clarify decision making over the last two years, with success, and it feels right and natural to me.</p>

<h2 id="generalists-over-specialists">Generalists over specialists</h2>
<p>The engineers in my team are jotting down and discussing requirements, implementing features in multiple programming languages (we’ve used <code class="language-plaintext highlighter-rouge">Scala</code>, <code class="language-plaintext highlighter-rouge">Ruby</code> and a bit of <code class="language-plaintext highlighter-rouge">Go</code> in the last two years), doing QA, giving workshops and internal trainings, organizing all major agile-ish meetings (like groomings and retros), talking to customer teams (we’re building an internal, developer-oriented product) and many other things.</p>

<p>There’s no <code class="language-plaintext highlighter-rouge">Product Owner</code> in my team, no <code class="language-plaintext highlighter-rouge">Project Manager</code>, no <code class="language-plaintext highlighter-rouge">Agile Coach</code> or <code class="language-plaintext highlighter-rouge">Scrum Master</code>. They simply do everything themselves.</p>

<p>A funny side effect of this is that they seem to support each other better than any specialist team I’ve worked with in the past. Does that mean everybody is awesome at everything? No, but we’re all improving in multiple dimensions, and it’s a pleasure to see how this group adapts to new situations. Because there’s nobody in my team who pushes back on things that need to be done with “That’s not my job”.</p>

<p>And in the face of a faster-paced world around us, I believe we need exactly that: teams that are able to adapt and change gears quickly, that are not afraid of losing their past investments when going into an area where they’re no longer the expert. For me that’s true agility.</p>

<h2 id="constant-inspection-feedback-and-tinkering-creates-highly-performing-teams">Constant inspection, feedback and tinkering creates highly performing teams</h2>
<p>The intuitive-looking collaboration you see from successful squads on the pitch is actually the result of many, many months and years of training. Successful teams train different game situations over and over again, stopping them, discussing them, correcting them, both as a group and at an individual level. It’s tinkering in its purest form. But because of this inspect-and-repeat loop, when the situation requires it, most of the action happens on autopilot and the mental capacity can be used for other things, such as creatively resolving whatever situation is in front of you.</p>

<p>What you can take from this is that there’s value in standardizing, codifying and/or documenting how you’re doing things and iterating on that: capturing tacit knowledge and making it visible and available to everyone.</p>

<p>In my team we went to the extreme and defined everything as an MVP (Minimum Viable Product, in the lean sense), so that the way we work and the software we produce are constantly inspected and refined based on hypotheses. It has served us well and is, I believe, one of the reasons why we were successful in the last two years.</p>

<h2 id="you-can-sprint-but-not-forever">You can sprint, but not forever</h2>
<p>Football as a game has become much more intense, as described earlier. An interesting side effect of this is that we don’t see a constant level of intensity, but alternating phases: rushes, and slower phases where the game is usually less pleasurable to watch.</p>

<p>Similarly, in software engineering or IT teams, no matter what the Scrum enthusiasts out there say, you can’t always sprint. It’s just not sustainable. If you sprint, you need these less productive phases to recover.</p>

<p>Even more controversial to some, probably: the more sustainable and at the same time more predictable way is actually not to sprint at all, but to work at a constant throughput and at slightly below 70% utilization.</p>
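<p>The ~70% figure is not plucked from thin air; it is roughly where basic queueing theory says waiting times start to degrade sharply. As a sketch, treating a team’s backlog like an M/M/1 queue (a deliberate oversimplification, not a model of any real team):</p>

```python
# Toy M/M/1 queueing illustration: the average wait, measured in
# multiples of the service time, is rho / (1 - rho) at utilization rho.
# Delay stays moderate up to ~70% and explodes as you approach 100%,
# which is the intuition behind not running a team fully "utilized".
def relative_wait(rho):
    if not 0 <= rho < 1:
        raise ValueError("utilization must be in [0, 1)")
    return rho / (1 - rho)

for rho in (0.5, 0.7, 0.9, 0.95):
    print(f"{rho:.0%} utilized -> wait around {relative_wait(rho):.1f}x service time")
```

<p>At 70% utilization the wait is a bit over 2x the service time; at 95% it is nearly 20x. The exact numbers depend on the model, but the nonlinearity is the point.</p>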

<p>Totally boring, but effective. And a topic entire books have been written about, so I’ll leave it at that here.</p>

<h2 id="youre-measured-by-results-only">You’re measured by results only</h2>
<p>There’s an ugly truth to competitive sports: no matter how ugly you play, nobody will really bother as long as you get the job done and deliver the expected results.</p>

<p>On the other hand, how many coaches do you know of who are still in their seat after their teams played beautifully but weren’t really successful?</p>

<p>For IT professionals who like to argue about technical excellence, code quality, testability and so on, this is sometimes hard to swallow, but a product that has 100% test coverage yet fails to deliver value to the end user will go down as a failure in the history of the organization, no matter how good the technical bits were.</p>

<p>Similarly, and that’s even harder for someone like me to swallow: it doesn’t matter what good, productive atmosphere you created, how adaptable your team is or how fast it makes informed decisions. If you’re not delivering results, you’re not likely to be remembered positively. Even worse, you might be replaced by an old-school, top-down guy who ensures in his own ways that the job somehow gets done.</p>

<h2 id="your-expertise-from-yesterday-is-slowly-but-silently-depleting">Your expertise from yesterday is slowly but silently depleting</h2>
<p>Remember when José Mourinho still called himself “the special one”? That was before he was fired from Real Madrid, Chelsea and Manchester United.</p>

<p>To me he seems a pretty good example of someone who used to be successful but wasn’t able to adjust his skills while football changed around him. By now most opponents have figured out the defensive style he plays. My guess is that without re-inventing himself, he won’t get back to his glory days.</p>

<p>In a similar vein, as a technical person you urgently need to keep learning. Either that, or things might become very frustrating on the job market. Engineering managers also tend to be an endangered species, at least the ones who want to be involved in technical solutions to a certain degree. Very often their day-to-day duties don’t involve coding, and knowingly or unknowingly their former knowledge slowly but continuously gets outdated, to the point where they may be making decisions about things they’ve never used or touched. Usually a recipe for disaster.</p>

<p>I’m personally trying to spend at least 20-25% of my time per week on operational topics (such as coding, documentation, technical support, etc.) with my team so that I don’t lose touch with them.</p>

<h2 id="embrace-that-were-all-on-a-way-to-somewhere">Embrace that we’re all on a way to somewhere</h2>
<p>Like players on the pitch, really great IT professionals are in high demand on the market. You can try to fight that, or you can choose to embrace it. Allow your team to be in the spotlight wherever possible. Give the thing you’re working on actual faces that people can relate to. Increase their market value. Aim for having a good and successful time together. Cherish it, because eventually, like all good times in life, it will come to an end. One or more of you are going to leave. That’s normal and fine. But the relationships stick, and your team culture and spirit might be preserved even when you’re not part of it anymore.</p>

<h1 id="closing-thoughts">Closing thoughts</h1>

<p>Things in our profession have become noticeably faster and ever-changing compared to 10 or 20 years ago. Decide for yourself what this means for knowledge work and collaboration. I’ve given you a brief overview of the conclusions I derived from this by looking outside our profession, namely at professional football.</p>

<p>In the end, of course, all metaphors and analogies break down, but that doesn’t mean they’re completely useless. I use them as guiding principles that drive the way I make decisions and approach work together with my team. Do I follow all of them perfectly? Nope, I struggle every now and then, because things are usually neither black nor white, but gray. But I try to follow them wherever I can.</p>

<p>If you’ve got questions about this, feel free to connect with me on Twitter. My handle is <a href="https://twitter.com/bjoernrochel">@bjoernrochel</a>. I hope this was somewhat useful to read. At the very least it was useful for me in structuring my thoughts.</p>

<p>See you next time around!</p>

<p>:wq</p>]]></content><author><name>Björn Rochel</name><email>bjoern@bjro.de</email></author><summary type="html"><![CDATA[Modern football killed the specialist. The same shift has been hitting engineering teams for years — and most leadership models haven't caught up.]]></summary></entry><entry><title type="html">Our GraphQL learnings and observations</title><link href="https://bjro.de/graphql-retro/" rel="alternate" type="text/html" title="Our GraphQL learnings and observations" /><published>2019-01-11T00:00:00+00:00</published><updated>2019-01-11T00:00:00+00:00</updated><id>https://bjro.de/graphql-retro</id><content type="html" xml:base="https://bjro.de/graphql-retro/"><![CDATA[<p>A belated happy new year, everyone! I hope all of you had a good start to the new year. Today’s post is going to be of a different kind compared to the <a href="/per-aspera-ad-astra/">rest of the series so far</a>. In the previous posts I talked in depth about the “why?”, the initial project decisions and the preparations we took to ensure that we’d be able to deliver on our promise. But I haven’t yet talked about our actual experience of introducing <code class="language-plaintext highlighter-rouge">GraphQL</code> into our organization.</p>

<p>At the end of the first year (2017) our initiative was already seen as a success, and it continued to thrive in 2018 (though slower than in 2017, mostly for reasons that were outside of our control). Most of the benefits we expected from <code class="language-plaintext highlighter-rouge">GraphQL</code> turned out to be true. It’s quite eye-opening to see colleagues play with <code class="language-plaintext highlighter-rouge">GraphiQL</code> (or later <code class="language-plaintext highlighter-rouge">GraphQL Playground</code>) for the first time. Pretty much all of them instantly realize the (positive) impact this could have on their work. Some also instantly get that <code class="language-plaintext highlighter-rouge">GraphQL</code> has an impact on collaboration between the various disciplines.</p>

<p>But it’s not all roses. We definitely struggled in some areas. In 2018 you could find an ocean full of resources about why <code class="language-plaintext highlighter-rouge">GraphQL</code> “is the new sexy”, “superior to REST”, “to REST what REST was to SOAP” and in general “the next best thing since sliced bread”. In order to have a meaningful discussion about a technology though, I believe we also need to openly talk about unexpected outcomes, the mistakes we make when introducing new technology like <code class="language-plaintext highlighter-rouge">GraphQL</code>, and areas where there may still be room for improvement in the technology itself. Only then can we have a more nuanced view of it.</p>

<p>Some of the things that I’m going to reflect on today are likely the result of a combination of multiple factors. Our approach, our culture, our setup and other influences all played into it. To give you some context about the company I’m working in (which is worth understanding before going into the individual points):</p>

<ul>
  <li>My employer has roughly <strong>1.3k employees</strong>, of which <strong>~300+ are developers</strong></li>
  <li>Of these to my knowledge roughly <strong>70+ are native mobile developers</strong></li>
  <li>For the rest I don’t have concrete numbers, but my feeling is that it’s roughly <strong>2/3 backend developers</strong> and <strong>1/3 web frontend developers</strong>.</li>
  <li>Most of the engineering teams are set up to build <strong>all representations of their product</strong> (backend, ios, android &amp; web)</li>
  <li>In comparison the teams that are dealing with platform topics are fairly small.
    <ul>
      <li>The system architecture team (which primarily is concerned with running our <code class="language-plaintext highlighter-rouge">PAAS</code> platform based on <code class="language-plaintext highlighter-rouge">Kubernetes</code> these days) is <strong>~10 people</strong>.</li>
      <li>My team that was pushing the <code class="language-plaintext highlighter-rouge">GraphQL</code> initiative counts <strong>4 people</strong> (including myself and we also had to maintain our OAuth gateway on the sidelines).</li>
    </ul>
  </li>
  <li>In general our engineering organization tries to give a lot of freedom and autonomy to individual product teams, while shielding them from platform-related work. As a consequence there’s always a lingering pressure for platform teams to <a href="https://www.youtube.com/watch?v=cIpBpGQ0XTI">find ways to do their work without interrupting or even blocking product teams</a>, and a lot of the larger technology initiatives by their nature need to be incremental and non-blocking in how they’re rolled out.</li>
</ul>

<p>With that out of the way, let’s dive into our learnings and observations:</p>

<h1 id="1-expectations-on-graphql-vary-dramatically-between-web-and-native-mobile-developers">1. Expectations on GraphQL vary dramatically between web and native mobile developers</h1>
<p>Have you ever called into a <code class="language-plaintext highlighter-rouge">GraphQL</code> service by hand, without any <code class="language-plaintext highlighter-rouge">GraphQL</code>-specific tooling? Like with <code class="language-plaintext highlighter-rouge">curl</code> or <code class="language-plaintext highlighter-rouge">fetch</code>? If you’ve tried that and have understood the format a <code class="language-plaintext highlighter-rouge">GraphQL</code> server expects, then you realize that everyone who is able to do a <code class="language-plaintext highlighter-rouge">REST</code> call is also able to interact with a <code class="language-plaintext highlighter-rouge">GraphQL</code> server. They are not too far apart. Sure, the error handling differs significantly, but the general mechanics of obtaining data are very close.</p>
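<p>To make that concrete: a hand-rolled <code class="language-plaintext highlighter-rouge">GraphQL</code> call is nothing but an HTTP POST with a small JSON body. A minimal sketch using only Python’s standard library; the query and the <code class="language-plaintext highlighter-rouge">user</code> field are invented for illustration:</p>

```python
import json

# A GraphQL request is a plain HTTP POST whose JSON body carries a
# "query" string and an optional "variables" object -- nothing more.
def build_graphql_request(query, variables=None):
    payload = {"query": query}
    if variables is not None:
        payload["variables"] = variables
    return json.dumps(payload).encode("utf-8")

# Hypothetical query against a made-up schema, for illustration only.
body = build_graphql_request(
    "query User($id: ID!) { user(id: $id) { name email } }",
    {"id": "42"},
)

# Sending it is ordinary HTTP, e.g. via urllib.request, or:
#   curl -X POST -H 'Content-Type: application/json' -d "$BODY" <your /graphql endpoint>
print(body.decode("utf-8"))
```

<p>Everything a <code class="language-plaintext highlighter-rouge">REST</code> client already does (headers, auth, retries) carries over unchanged; only the body format and the error handling differ.</p>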

<p>This enabled a more or less straightforward, iterative rollout plan for us:</p>

<ol>
  <li>Get rid of the <code class="language-plaintext highlighter-rouge">BFFs</code> and use <code class="language-plaintext highlighter-rouge">GraphQL</code> to talk to the platform instead.</li>
  <li>If the frontend teams see value in it, move higher into the client stack.</li>
</ol>

<p>We always assumed we needed to manage the risk of <code class="language-plaintext highlighter-rouge">GraphQL</code> not turning out to be such a great fit for our landscape. We tried to be lean and adaptive. So much for the theory; now comes the practice part.</p>

<p>Our mobile developers pretty much did exactly that. It kinda seems logical now, looking back at the last two years, since a <a href="what-made-us-interested-in-graphql">lot of our motivation came from mobile challenges</a>. That group treated <code class="language-plaintext highlighter-rouge">GraphQL</code> as a drop-in replacement for <code class="language-plaintext highlighter-rouge">REST</code> that freed them from a lot of the request coordination logic. They had some gripes with error handling (more on that later) and the available native libraries (both mobile teams opted against the available <code class="language-plaintext highlighter-rouge">Apollo</code> libraries, because they deemed them not ready yet), but all in all I think they were pretty happy with our proposition.</p>

<p>On the web side, things got a bit more interesting, though. <a href="https://reactjs.org/">React</a> had already been in place for quite a while as the default client-side library, together with <a href="https://redux.js.org/">Redux</a> as the client-side state management solution. <a href="https://samnewman.io/patterns/architectural/bff/">BFFs</a> were the norm and supplied the frontend with data. Our <code class="language-plaintext highlighter-rouge">GraphQL</code> gateway would make these <code class="language-plaintext highlighter-rouge">BFFs</code> obsolete. Not so different from the mobile side, or so we assumed.</p>

<p>In retrospect this assumption was one of the larger mistakes that I personally made.</p>

<p>Let’s put it this way: this idea created quite a bit of friction between the API team pushing the <code class="language-plaintext highlighter-rouge">GraphQL</code> topic and the team responsible for the frontend architecture. It turned out that for them, <code class="language-plaintext highlighter-rouge">GraphQL</code> wasn’t just an API. It was a whole new way to build applications end-to-end (basically API + client-side state management + self-contained <code class="language-plaintext highlighter-rouge">React</code> components that specify their data requirements via co-located <code class="language-plaintext highlighter-rouge">GraphQL</code> queries and fragments). And they wanted all of that experience immediately. All of the great <code class="language-plaintext highlighter-rouge">Apollo</code> talks and demos in that area certainly didn’t make it easier for us.</p>

<p>The problem was this: we lacked the client-side experience, and the frontend colleagues lacked the resources and time to properly accompany this. As a result, <code class="language-plaintext highlighter-rouge">Apollo</code>, for example, was introduced at a later point, at a time when we significantly lacked guidance on how to bring all the pieces together in the frontend. This led to some surprises, for instance in the client-side caching behavior, which we initially weren’t aware of.</p>

<p>The point to take away from this is that you should plan and set up your <code class="language-plaintext highlighter-rouge">GraphQL</code> introduction end to end, with all necessary client resources available. While mobile developers were fine with an incremental adoption, trying to follow the same approach on the web created friction for us. The <code class="language-plaintext highlighter-rouge">GraphQL</code> buzz today primarily happens on the web side and in the <code class="language-plaintext highlighter-rouge">Javascript</code> world. With that, expectations of what <code class="language-plaintext highlighter-rouge">GraphQL</code> is able to deliver differ dramatically from those of their mobile counterparts.</p>

<blockquote>
  <p>It’s a bit ironic when you know a bit of the history of <code class="language-plaintext highlighter-rouge">GraphQL</code>. If you trace the <code class="language-plaintext highlighter-rouge">GraphQL</code> story back to its origins within <code class="language-plaintext highlighter-rouge">Facebook</code>, you realize that it came from a mobile background and then expanded further into the company. Basically, some pretty clever folks built the first version of the concept for the new, native mobile newsfeed. That was when <code class="language-plaintext highlighter-rouge">Facebook</code> was still primarily following their <code class="language-plaintext highlighter-rouge">HTML5</code> approach in their mobile applications. When the decision was made to switch strategy and go all-in on native development instead of web technology, it was a combination of “having a great, working concept” and “being at the right spot at the right time”. They pretty much created the blueprint for things to come. <code class="language-plaintext highlighter-rouge">GraphQL</code> didn’t start out on the web or even on the <code class="language-plaintext highlighter-rouge">Javascript</code> side. Only when they came to the conclusion that “this could really be a thing” did they create the spec and the reference implementation in <code class="language-plaintext highlighter-rouge">Javascript</code>. But that happened at a later point.</p>

  <p>In case you’re wondering: In 2017 I sat next to <a href="https://twitter.com/dlschafer">Dan Schafer</a> in the cab on the way to the speakers’ dinner for the <code class="language-plaintext highlighter-rouge">GraphQL EU</code> conference and had the chance to pester him with all sorts of questions. Dan is one of the original <code class="language-plaintext highlighter-rouge">GraphQL</code> creators (besides <a href="https://twitter.com/schrockn">Nick Schrock</a> and <a href="https://twitter.com/leeb">Lee Byron</a>).</p>
</blockquote>

<h1 id="2-client-libraries-werent-on-par-when-we-started">2. Client libraries weren’t on par when we started</h1>
<p>I hinted at this in the first point: if you’re going to build not only for the web but also for the native platforms, you have to be aware that there is quite a difference in the breadth and depth of available documentation for those platforms. When we started in early 2017, documentation for the native client libraries was basically non-existent. The <code class="language-plaintext highlighter-rouge">Android</code> library was in a very early alpha state; the <code class="language-plaintext highlighter-rouge">iOS</code> version was a bit (but not much) ahead.</p>

<p>Both of our mobile teams looked at them in <code class="language-plaintext highlighter-rouge">2017</code> and eventually decided to roll their own tooling. This made the integration into our <code class="language-plaintext highlighter-rouge">OAuth 2.0</code> flow and the rest of the existing application easier, but it obviously came with an associated cost: many of the cooler later additions to the client-side libraries can’t be found in our native mobile applications.</p>

<p>I guess it’s one of the prices you often need to pay as an early adopter. Just looking at <code class="language-plaintext highlighter-rouge">Github</code> today, the libraries seem to have developed nicely in the meantime. Faced with a similar choice today, the result would probably be different.</p>

<h1 id="3-all-client-developers-struggled-with-error-handling-and-partial-results">3. All client developers struggled with error-handling and partial results</h1>
<p>In November <code class="language-plaintext highlighter-rouge">2017</code> I gave a talk about our <code class="language-plaintext highlighter-rouge">GraphQL</code> experience in Munich while visiting <code class="language-plaintext highlighter-rouge">Autoscout 24</code>. One thing that stuck with me from the following Q&amp;A is that one of their Lead Engineers was noticeably taken aback by my assessment that client engineers struggle with error handling and partial responses more than expected. From his point of view, partial responses were one of the reasons to use <code class="language-plaintext highlighter-rouge">GraphQL</code>, and he was heavily advocating them on their side. And now I came along and talked about how our frontend colleagues struggled to make that shift.</p>

<p>The interesting thing here is that I don’t even have a very different personal opinion. I think partial errors are great. That the server is able to give you something back in case it can’t resolve all of the data is actually a great premise to me. Especially if your <code class="language-plaintext highlighter-rouge">GraphQL</code> server (like ours) is a gateway that remotely talks to other services and the <a href="https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing">fallacies of distributed computing</a> come into play. Partial results reflect the reality of a distributed platform much better than the usual binary “worked” or “didn’t work” approach (when in the latter case maybe just a tiny portion of the data couldn’t be resolved).</p>
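<p>Here is a sketch of what a partial response can look like and how a client can degrade gracefully instead of failing the whole screen (the field names and error payload are invented):</p>

```python
# "data" carries everything that could be resolved; "errors" lists the
# paths that failed. This shape follows the GraphQL response format.
response = {
    "data": {
        "viewer": {
            "name": "Ada",
            "recommendations": None,  # backing service timed out
        }
    },
    "errors": [
        {
            "message": "upstream timeout",
            "path": ["viewer", "recommendations"],
        }
    ],
}

# The client can still render the profile and only degrade the
# recommendations module.
failed_paths = {tuple(e["path"]) for e in response.get("errors", [])}
can_render_profile = response["data"]["viewer"]["name"] is not None
```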

<p>But at least in our company I observed that frontend developers aren’t so fond of this changed behavior initially, and for some of them it takes quite a while to accept it and wrap their head around it. Just to give you one example: I’ve seen mobile developers who had already been working with <code class="language-plaintext highlighter-rouge">GraphQL</code> for more than a year try to talk their backend colleagues into making the schema super strict, marking fields that might fail as mandatory everywhere. The intent was that constraint violations would bubble up and they would get their beloved binary behavior back. In the end we talked them out of that. Two years in, my team is still consulting product teams on how to model their schema so that it remains usable in the presence of partial errors.</p>

<p>Things take a while to stick. Especially if you’re coming from a <code class="language-plaintext highlighter-rouge">REST</code> setup and your work affects many people, I would try to clarify this aspect as one of the earlier topics.</p>

<h1 id="4-even-though-graphql-documentation-is-great-onboard-carefully">4. Even though GraphQL documentation is great, onboard carefully</h1>
<p>At the start of 2017, if you wanted to understand how <code class="language-plaintext highlighter-rouge">GraphQL</code> worked and what kind of assumptions it made, there were already great resources out there, especially <a href="https://graphql.org/">graphql.org</a>. We pointed the first team we worked with towards these resources. Needless to say, it turned out that this wasn’t enough. Otherwise I wouldn’t be writing about this now :)</p>

<p>Great documentation is awesome, provided people actually bother to read it. In my experience, about half of the people you typically meet do. The other half immediately goes into exploration mode and tries to build something without consulting the documentation. Of that group, one half reads the documentation when they encounter problems. The other half doesn’t even do that; they usually approached us in frustration about why “this” and “that” wasn’t working as they expected.</p>

<p>To give you again some examples:</p>

<ul>
  <li>I already talked about the mobile engineers and their desired behavior for partial responses</li>
  <li>We were approached by one backend engineer who worked under the assumption that you could send multiple queries simultaneously to the <code class="language-plaintext highlighter-rouge">GraphQL</code> server and the server would magically optimize them into a perfect execution scheme. Needless to say, our server only accepts one <code class="language-plaintext highlighter-rouge">GraphQL</code> query at a time, and the only reason we accept an <code class="language-plaintext highlighter-rouge">Array</code> as input is that we speculated that at some point <a href="https://blog.apollographql.com/new-features-in-graphql-batch-defer-stream-live-and-subscribe-7585d0c28b07">batch operations</a> would land in the spec (which is a completely different thing and also hasn’t happened yet)</li>
  <li>Another frontend colleague, who took query co-location to the maximum with every <code class="language-plaintext highlighter-rouge">React</code> component having its own <code class="language-plaintext highlighter-rouge">GraphQL</code> query, apparently wasn’t aware of the solution using fragments and vividly argued that our <code class="language-plaintext highlighter-rouge">GraphQL</code> server was frontend-unfriendly and unusable if we didn’t support <a href="https://blog.apollographql.com/batching-client-graphql-queries-a685f5bcd41b">Apollo’s batch handling</a>.</li>
</ul>
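<p>The fragment-based approach mentioned in the last example looks roughly like this (a hypothetical sketch; the types and fields are invented): each component co-locates a fragment, and the page composes them into a single query.</p>

```graphql
# Each component declares only the data it needs...
fragment AvatarFields on Member {
  name
  profileImageUrl
}

fragment HeadlineFields on Member {
  headline
}

# ...and the page issues one combined query instead of one per component.
query MemberCard($id: ID!) {
  member(id: $id) {
    ...AvatarFields
    ...HeadlineFields
  }
}
```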

<p>From this experience we changed our rollout scheme dramatically. We developed a full two-day onboarding program that alternates between theoretical content and lots and lots of practical exercises, close to what engineers have to build in the final product. And we switched our initiative into a mode that resembles the typical invite-only private beta you’d normally encounter on <a href="https://www.producthunt.com/">producthunt</a>. The assumption here is that you have to limit onboarding so that you can learn with a small surface for errors and adapt for the following teams.</p>

<p>In one of the earlier parts of the series I talked about how we treated everything as an <code class="language-plaintext highlighter-rouge">MVP</code>: the implemented server, the process and the documentation. Onboarding was a logical addition to this, and we continue to do these workshops even in 2019.</p>

<h1 id="5-graphql-type-system-feels-sometimes-too-basic">5. GraphQL type system feels (sometimes) too basic</h1>
<p>When working with a large group of people on a schema, I think <code class="language-plaintext highlighter-rouge">GraphQL</code> is lacking some basic things in its type system. I think this is independent of whether you work with a single schema or use some form of schema-stitching.</p>

<p>The first one that comes to mind is the simple concept of <code class="language-plaintext highlighter-rouge">namespaces</code>. A “profile” in one domain might be something completely different to a “profile” in another domain. You might want to call both “profile”. As of now, if you want to have both of them in a schema, you need to make their names unique, usually by prefixing their name somehow. For instance having a <code class="language-plaintext highlighter-rouge">CompanyProfile</code> and a <code class="language-plaintext highlighter-rouge">MemberProfile</code>. That’s also what we do currently.</p>

<p>I know the argument that existing tools already do a good job of identifying conflicts, but I would prefer an explicit concept of <code class="language-plaintext highlighter-rouge">namespaces</code> that works without some form of extra tool or linter, so that we could have <code class="language-plaintext highlighter-rouge">Members::Profile</code> and <code class="language-plaintext highlighter-rouge">Companies::Profile</code>.</p>
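<p>To make the wish concrete, here is a sketch contrasting today’s prefix workaround with the hypothetical namespace syntax (the latter is not valid GraphQL today):</p>

```graphql
# Today: uniqueness via name prefixes.
type CompanyProfile {
  name: String
}

type MemberProfile {
  name: String
}

# Wished for: explicit namespaces, without extra tooling.
# namespace Companies { type Profile { name: String } }
# namespace Members { type Profile { name: String } }
```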

<p>Another aspect that I find lacking in the type system of <code class="language-plaintext highlighter-rouge">GraphQL</code> is a way to express the behavior of fields. For example: if you have multiple fields coming from different sources, there’s likely a different “cost” associated with them, and I would like to express this for tools and inside the documentation as well. <a href="https://sangria-graphql.org/learn/">Sangria</a> already has the nice concept of <code class="language-plaintext highlighter-rouge">FieldTags</code> that you can attach to a <code class="language-plaintext highlighter-rouge">GraphQL</code> field. You can think of them similarly to <code class="language-plaintext highlighter-rouge">.NET</code> attributes or <code class="language-plaintext highlighter-rouge">Java</code> annotations. For our system we’ve figured out a way to signify which <code class="language-plaintext highlighter-rouge">FieldTags</code> are documentation-relevant and automatically show them inside <code class="language-plaintext highlighter-rouge">GraphQL</code> playground. The goal is higher-value documentation of a field’s behavior.</p>

<p>Here’s one example for this:</p>

<figure>
  <img src="/assets/images/field-tags.png" alt="" />
  
</figure>

<p>This shows the documentation of a field that is backed by a <code class="language-plaintext highlighter-rouge">REST</code> API, where multiple occurrences of this field are batched together (<code class="language-plaintext highlighter-rouge">@rest(batched)</code>), and where this particular field is not requested from the backend when the parent is resolved (<code class="language-plaintext highlighter-rouge">@virtual</code>). We “faked” this by prefixing the normal documentation and patching <code class="language-plaintext highlighter-rouge">GraphQL Playground</code> to show the tags in a nicer way, using colors and explanatory tooltips.</p>

<p>But this is a duct-tape solution, and again I would prefer a real answer similar to this in <code class="language-plaintext highlighter-rouge">GraphQL</code> itself. I think the logical solution would be to make server-side directives part of the exposed schema, but I’m not aware of any plans to add this.</p>

<h1 id="6-graphql-spec-lacks-a-pagination-abstraction">6. GraphQL spec lacks a pagination abstraction</h1>
<p>Sooner or later as an application developer you will encounter pagination in your schema. The <code class="language-plaintext highlighter-rouge">GraphQL</code> spec is completely agnostic when it comes to this topic.</p>

<p>It’s not like there isn’t a fitting spec available. The <a href="https://facebook.github.io/relay/graphql/connections.htm">Relay Cursor Connection Specification</a> has existed since the early days of <code class="language-plaintext highlighter-rouge">GraphQL</code>, but separately from the <code class="language-plaintext highlighter-rouge">GraphQL</code> spec. Prominent <code class="language-plaintext highlighter-rouge">GraphQL</code> APIs like <a href="https://developer.github.com/v4/guides/intro-to-graphql/">Github</a> and <a href="https://help.shopify.com/en/api/custom-storefronts/storefront-api/graphql">Shopify</a> make use of it. We also use it more and more (both for real cursors and as an adapter around <code class="language-plaintext highlighter-rouge">limit</code> and <code class="language-plaintext highlighter-rouge">offset</code> based APIs).</p>
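<p>The adapter idea can be sketched in a few lines. This is an illustration under assumed names, not our actual implementation: Relay-style opaque cursors on top of a backend that only understands <code class="language-plaintext highlighter-rouge">limit</code> and <code class="language-plaintext highlighter-rouge">offset</code>:</p>

```python
import base64

def encode_cursor(offset: int) -> str:
    # Opaque to clients; here just a base64-encoded offset.
    return base64.b64encode(f"offset:{offset}".encode()).decode()

def decode_cursor(cursor: str) -> int:
    return int(base64.b64decode(cursor).decode().split(":")[1])

def to_connection(items, offset, total):
    # Wrap a limit/offset page into the Relay connection shape.
    edges = [
        {"node": item, "cursor": encode_cursor(offset + i)}
        for i, item in enumerate(items)
    ]
    return {
        "edges": edges,
        "pageInfo": {
            "hasNextPage": offset + len(items) < total,
            "endCursor": edges[-1]["cursor"] if edges else None,
        },
    }
```

<p>Because the cursor stays opaque to clients, the backing pagination mechanism can later be swapped for real cursors without a schema change.</p>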

<p>Not having a default pagination abstraction obviously gives some flexibility and adaptability to <code class="language-plaintext highlighter-rouge">GraphQL</code> itself, but I can’t help thinking that <code class="language-plaintext highlighter-rouge">GraphQL</code> could benefit from having a default abstraction for this out of the box. If there was an explicit concept, tooling could make better use of it, and frontend components would have a more standardized notion of pages of data.</p>

<h1 id="7-dont-be-naive-with-file-uploads">7. Don’t be naive with file uploads</h1>
<p>Somewhere in the middle of 2017 the topic of file uploads came up. We did the simplest possible thing: tunneling the binary file as a <code class="language-plaintext highlighter-rouge">Base64</code>-encoded blob through our <code class="language-plaintext highlighter-rouge">GraphQL</code> mutation. That works, but is in general a very bad idea. You could even call it naive.</p>
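<p>A quick back-of-the-envelope sketch shows one of the problems: <code class="language-plaintext highlighter-rouge">Base64</code> inflates the payload by roughly a third, and the whole blob travels through the JSON body of the mutation:</p>

```python
import base64

binary = bytes(range(256)) * 4096  # ~1 MiB of fake file data
encoded = base64.b64encode(binary)

# Base64 encodes every 3 bytes as 4 characters, so the payload grows
# by about 33% before it even reaches the gateway.
inflation = len(encoded) / len(binary)  # ~1.33
```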

<p>Take a look at the following dashboard. Guess where most of the <code class="language-plaintext highlighter-rouge">P999</code> and <code class="language-plaintext highlighter-rouge">MAX</code> numbers are coming from :-(</p>

<figure>
  <img src="/assets/images/file-uploads.png" alt="" />
  
</figure>

<p>Needless to say, this clogs throughput on the gateway side and decreases the capability to properly monitor the system. Even worse, you run into situations where the frontend times out, but the upload has reached the servers and is actually being processed (only very slowly).</p>

<p>We’re in the process of migrating away from this quick fix to a separate upload service based on <a href="https://tus.io/">tus</a>. In the new setup the <code class="language-plaintext highlighter-rouge">GraphQL</code> schema only provides the means to secure an upload slot, and the upload lives completely outside of <code class="language-plaintext highlighter-rouge">GraphQL</code>.</p>

<p>I can only advise you not to repeat this mistake: keep uploads separate where possible.</p>

<h1 id="8-we-struggled-with-our-mutation-design">8. We struggled with our mutation design</h1>
<p>Oh, mutations, another unexpected headache. You see, <code class="language-plaintext highlighter-rouge">GraphQL</code> servers execute hierarchically, and sub-selections only apply when the parent resolver was successful.</p>

<p>The first version of our mutations was pretty simple in the sense that for successful <code class="language-plaintext highlighter-rouge">REST</code> calls underneath, we expanded the selection, and where they failed, we added an entry for them as error extensions in the global <code class="language-plaintext highlighter-rouge">errors</code> collection. Straightforward, or so we thought.</p>

<p>It turned out that the first wave of mobile developers hated the extra effort they needed to invest to access this error information. So we changed the implementation to please them.</p>

<p>Our final version of mutations basically contains an explicit model of the error response in the schema. Nowadays mutations pretty much look like this:</p>

<div class="language-graphql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">type</span><span class="w"> </span><span class="n">ContentArticleMutationResult</span><span class="w"> </span><span class="err">@</span><span class="n">mutationResult</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">success</span><span class="p">:</span><span class="w"> </span><span class="n">ContentArticleInterface</span><span class="w">
  </span><span class="n">error</span><span class="p">:</span><span class="w"> </span><span class="n">ContentError</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The extra <code class="language-plaintext highlighter-rouge">@mutationResult</code> directive is necessary to enable an exclusive-or behavior: exactly one of the fields is filled, never both. There’s a lot to like about this design, since it enables you to model your errors explicitly and you can now see errors in your schema.</p>
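<p>On the client, consuming this success-xor-error result type looks roughly like the following sketch (the mutation name and error payload are invented for illustration):</p>

```python
# A hypothetical response for a mutation returning
# ContentArticleMutationResult: exactly one of success/error is set.
result = {
    "data": {
        "updateContentArticle": {
            "success": None,
            "error": {"code": "ARTICLE_LOCKED", "message": "locked"},
        }
    }
}

payload = result["data"]["updateContentArticle"]
if payload["error"] is not None:
    # The error is ordinary schema data, not an entry in the global
    # "errors" collection.
    outcome = ("error", payload["error"]["code"])
else:
    outcome = ("ok", payload["success"])
```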

<p>On the other hand, we broke two basic things that are worth mentioning:</p>

<ol>
  <li>It doesn’t follow the normal hierarchical execution model of <code class="language-plaintext highlighter-rouge">GraphQL</code></li>
  <li>It also is not symmetric to queries anymore</li>
</ol>

<p>There are a few noteworthy occurrences where, for queries, the same team that heavily objected to our first mutation design is now more or less perfectly fine with reading from the <code class="language-plaintext highlighter-rouge">errors</code> collection in case a query fails. You only realize this in retrospect, but to me this indicates that we as an <code class="language-plaintext highlighter-rouge">API</code> team maybe gave in too early in this area of <code class="language-plaintext highlighter-rouge">GraphQL</code> to please our customers.</p>

<h1 id="09-retrofitting-into-an-existing-apm-solution-is-easier-than-you-think">9. Retrofitting into an existing APM solution is easier than you think</h1>
<p>When the question comes up today how you would monitor the performance and health of your <code class="language-plaintext highlighter-rouge">GraphQL</code> application, you’ll likely end up with <a href="https://www.apollographql.com/platform/">Apollo’s toolsuite</a>. The company has done a good job capturing the momentum with their former <code class="language-plaintext highlighter-rouge">Apollo Engine</code> product, which was recently superseded by the <code class="language-plaintext highlighter-rouge">Apollo Platform</code>.</p>

<p>At the start of 2017, when we began, <code class="language-plaintext highlighter-rouge">Apollo</code> wasn’t there yet, and a bit later we also didn’t feel fully confident putting such a young product in front of our platform. So we rolled our own integration into the existing monitoring solutions used at my employer:</p>

<ul>
  <li><a href="https://github.com/skaes/logjam_app">Logjam</a> for application performance monitoring and request tracing</li>
  <li><a href="https://grafana.com/">Grafana</a>, <a href="https://prometheus.io/">Prometheus</a> and <a href="https://prometheus.io/docs/alerting/alertmanager/">Prometheus alertmanager</a> for filling in the blanks where we either wanted or needed more information that couldn’t easily be expressed in <code class="language-plaintext highlighter-rouge">Logjam</code></li>
</ul>

<p>The effort in making this happen was manageable. People who came into contact with our <code class="language-plaintext highlighter-rouge">GraphQL</code> gateway could continue to use the tools they were already familiar with. And of course, if we ever wanted to do something special, we could bend our tooling to our will. To give you some idea of how it looks:</p>

<p>Here’s a dashboard that shows the slowest <code class="language-plaintext highlighter-rouge">GraphQL</code> queries in <code class="language-plaintext highlighter-rouge">Logjam</code></p>

<figure>
  <img src="/assets/images/logjam.png" alt="" />
  
</figure>

<p>Here’s our golden dashboard that we use for system wide overviews</p>

<figure>
  <img src="/assets/images/golden-dashboard.png" alt="" />
  
</figure>

<p>Here’s another view into the dashboard we use for field usage tracking</p>

<figure>
  <img src="/assets/images/field-usage.png" alt="" />
  
</figure>

<p>And we’ve got many more, for example for the internal <code class="language-plaintext highlighter-rouge">REST</code> client and also the <code class="language-plaintext highlighter-rouge">JVM</code> system metrics.</p>

<p>There’s a lot of heated discussion about monitoring tools in our company at the moment, and <code class="language-plaintext highlighter-rouge">Apollo</code> has taken the <code class="language-plaintext highlighter-rouge">GraphQL</code> APM space by storm in the last two years, but we found this way both feasible and practical. If there’s ever a killer feature in <code class="language-plaintext highlighter-rouge">Apollo</code> that we direly want and aren’t able to replicate, then we’d probably take a second look at the <code class="language-plaintext highlighter-rouge">Apollo Platform</code>, but based on the experience of the last two years, we don’t feel like we need to.</p>

<h1 id="10-schema-first-is-only-half-of-the-story">10. Schema-first is only half of the story</h1>
<p>When I saw the first bits of <a href="https://alligator.io/graphql/graphql-sdl/">SDL</a> I immediately liked the idea. We had the challenge at our company that we operate in a polyglot environment, and we specifically didn’t want to teach the programming language the <code class="language-plaintext highlighter-rouge">GraphQL</code> server was written in to everyone contributing to it. On top of that we were searching for a way to have a meaningful discussion about the schema without going too much into its mechanics, a good abstraction so to speak. SDL looked like the perfect fit.</p>

<p>And to a certain degree it is. Almost everyone we worked with, be it backend or frontend engineer, native or web focused, was able to pick up SDL in a very short time. However, in all of the available <code class="language-plaintext highlighter-rouge">GraphQL</code> servers I know, the declarative part ends once you hit the resolvers, and you’re forced to learn the programming language the server is implemented in again.</p>

<p>If you want a fully declarative approach, you need to fill in the resolver portion. And this doesn’t come cheap. Most of the engineering work we’ve done in the last year was basically about making end-to-end SDL a reality that works on real product problems.</p>
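<p>A tiny, hand-rolled illustration of that boundary (not a real GraphQL server, and all names are invented): the schema travels as declarative SDL, but the resolvers behind it are plain imperative code again.</p>

```python
# The declarative part: the schema can be shared and discussed as SDL.
sdl = """
type Query {
  article(id: ID!): Article
}

type Article {
  title: String
}
"""

# The imperative part: resolvers are ordinary functions in whatever
# language the server is written in. This map is a toy illustration.
resolvers = {
    "Query.article": lambda args: {"id": args["id"], "title": "Hello"},
    "Article.title": lambda parent: parent["title"],
}

article = resolvers["Query.article"]({"id": "42"})
title = resolvers["Article.title"](article)
```

<p>Closing this gap declaratively, so that the resolver side can also be expressed without writing server-language code, is where most of the real engineering effort hides.</p>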

<p>To be clear, we’re pretty happy with our results. With our system, we’re able to push an executable schema into the server within seconds, making it super smooth to work with. But we’ve also diverged from the typical <code class="language-plaintext highlighter-rouge">GraphQL</code> approach quite a bit. The end of this is not foreseeable, and the future will tell how well we’ll fare with our approach (compared to the alternatives).</p>

<blockquote>
  <p>In case you’re curious: my co-worker David gave an excellent talk about our SDL approach at <a href="https://www.youtube.com/watch?v=kMOq3nf8vKY">last year’s GraphQL EU conference</a>.</p>
</blockquote>

<h1 id="conclusion">Conclusion</h1>
<p>This post started out as a summary of what we learned in the first year with <code class="language-plaintext highlighter-rouge">GraphQL</code>. Somewhere in the first half, it reshaped itself into our overall learnings so far. Things usually overlap a bit time-wise, and I thought the post would be more valuable if I shared all of our learnings.</p>

<p>I hope I could give you a bit of “food for thought” when it comes to <code class="language-plaintext highlighter-rouge">GraphQL</code>. Like I said in the introduction, my feeling is that we already have enough success stories out there and need a more rounded picture.</p>

<p>Don’t get me wrong. <code class="language-plaintext highlighter-rouge">GraphQL</code> is an amazing technology and we enjoy working with it every day. But there’s no free lunch, especially when you introduce <code class="language-plaintext highlighter-rouge">GraphQL</code> into an existing product organization without a large rewrite.</p>

<p>See you next time around!</p>

<p>:wq</p>]]></content><author><name>Björn Rochel</name><email>bjoern@bjro.de</email></author><summary type="html"><![CDATA[Two years in with GraphQL at scale. What we underestimated, where the hype breaks down, and the parts of the technology that still need to grow up.]]></summary></entry><entry><title type="html">On Agile Codebases</title><link href="https://bjro.de/on-agile-codebases/" rel="alternate" type="text/html" title="On Agile Codebases" /><published>2018-11-23T00:00:00+00:00</published><updated>2018-11-23T00:00:00+00:00</updated><id>https://bjro.de/on-agile-codebases</id><content type="html" xml:base="https://bjro.de/on-agile-codebases/"><![CDATA[<p>I’ve participated in efforts to deliver software for more than 15 years now. The first five using the good old waterfall approach with infrequent, big releases; the last 10 years in teams that followed one of the flavors of the agile method.</p>

<p>Some of the teams I was working on were using <code class="language-plaintext highlighter-rouge">Scrum</code>. Others used <code class="language-plaintext highlighter-rouge">Kanban</code>. Some used a mixture of both. The worst application of the agile method I encountered in the last 10 years (and sadly the most prominent one) is what I would generally describe as <code class="language-plaintext highlighter-rouge">Agile without a soul</code>, or as Martin Fowler calls it <a href="https://martinfowler.com/articles/agile-aus-2018.html">faux-agile</a>. You know, the version that heavily focuses on the prescriptive process portions. Which sometimes has teams whose members don’t meet on eye level because the Product Owner actually turns out to be a manager in disguise, other times has developers that don’t care at all about business outcomes but celebrate technical discussions about <code class="language-plaintext highlighter-rouge">KISS</code>, <code class="language-plaintext highlighter-rouge">DRY</code> and friends to the death, often in places where it doesn’t really matter.</p>

<blockquote>
  <p>To be fair, I’ve been guilty of those discussions in the past as well :)</p>
</blockquote>

<p>Maybe you know the kind of teams I’m talking about. Teams where removal of impediments and/or real continuous improvement is seen about as often as unicorns. The only thing that does turn out to be continuous is the nagging sentiment in higher management levels that “This ‘agile’ thing is fraud / doesn’t deliver what it promises / hurts us more than it actually helps”.</p>

<p>But guess what, I’ve seen those unicorns. And I’ve seen them more than once. Highly productive, self-organized software engineering teams that constantly inspect and adapt how they deliver software. Teams that are conscious of the value they provide to others, that collaborate closely with their stakeholders to deliver high-quality results in a given time frame. I still consider myself a believer in what is slowly becoming something like Voldemort (he who must not be named, for you non-Harry-Potter folks).</p>

<p>If you now have the impression that this is going to be an epic post about how to do agile “the right way”, I have to disappoint you. That’s not what I’d like to talk about today. I doubt there is “the right way”; it all depends on context. And even if we accept that, describing how to be agile in all its intricacies is still a big hairy audacious goal.</p>

<p>So, having bailed out on that, I’d like to talk about something much simpler today. Something that is often overlooked when trying to deliver software in an agile way: the technical and semi-technical stuff that might help you be agile. And I don’t want to approach this in an abstract way, but rather reflect a bit on what we actually did on our <code class="language-plaintext highlighter-rouge">XING One</code> project. Especially the “yeah, that was a good idea”, “saved our backs more than once” or “would do that again” stuff.</p>

<p>So let’s have a look at those.</p>

<h1 id="1-coding-with-the-right-mindset">1. Coding with the right mindset</h1>
<p>You can go into a new software adventure with the “I’m going to rock this” attitude. Especially if you’re coming from other successful stints or projects and are highly confident in your capabilities.</p>

<p>Or you can choose to be a bit more humble.</p>

<p>You can accept that you’re not perfect, that you’re going to make mistakes and some sub-optimal decisions along the way, that requirements likely will change in an unforeseen way and that new opportunities may arise that you would like to follow-up on. If you accept that as a starting point, I believe that you look at your software and your setup differently. Consequently this was pretty much our stance with <code class="language-plaintext highlighter-rouge">XING One</code>.</p>

<p>Just to give you some idea what we had to correct during our first year:</p>

<ul>
  <li>We completely ripped out our first <code class="language-plaintext highlighter-rouge">REST</code> layer (the HTTP layer talking to the internal APIs, based on <a href="https://doc.akka.io/docs/akka-http/current/">akka-http</a>) and replaced it with <a href="https://twitter.github.io/finagle/">Twitter’s Finagle</a>, because our application was dying in overload scenarios and profiling pointed to <code class="language-plaintext highlighter-rouge">akka-http</code></li>
  <li>The HTTP layer itself was refactored and revised numerous times to adapt to new requirements that were introduced with new capabilities (primarily caused by new custom <code class="language-plaintext highlighter-rouge">SDL</code> directives)</li>
  <li>We proactively replaced the <code class="language-plaintext highlighter-rouge">JSON</code> library we initially used (<a href="http://json4s.org/">json4s</a>) after we had been live for several months, because while profiling large <code class="language-plaintext highlighter-rouge">GraphQL</code> queries we learned that the library implemented key lookup on top of a list (i.e. the larger the <code class="language-plaintext highlighter-rouge">JSON</code> document being selected from, the longer a key lookup would take)</li>
  <li>The metadata layer was refactored numerous times (we convert a lot of the <code class="language-plaintext highlighter-rouge">SDL</code> directives into <code class="language-plaintext highlighter-rouge">Sangria</code> <code class="language-plaintext highlighter-rouge">FieldTags</code> which then control the execution in the execution layer)</li>
  <li>We switched how you would extend <code class="language-plaintext highlighter-rouge">XING One</code> from using a <code class="language-plaintext highlighter-rouge">Scala DSL</code> to <code class="language-plaintext highlighter-rouge">SDL</code> after 10 months, once we were sure that this would work</li>
  <li>We changed the way you’d call from <code class="language-plaintext highlighter-rouge">SDL</code> into native <code class="language-plaintext highlighter-rouge">Scala</code> functions at least twice</li>
</ul>

<p>and probably some more changes that I don’t remember off the top of my head right now.</p>
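<p>The <code class="language-plaintext highlighter-rouge">json4s</code> item above deserves a small illustration. Here is a simplified sketch (my own toy model, not json4s’ actual internals) of why a list-backed <code class="language-plaintext highlighter-rouge">JSON</code> object makes key lookup scale with document size, while a map-backed one doesn’t:</p>

```scala
// Toy model of the lookup issue described above -- not json4s' real
// data structures, just the shape of the performance problem.
object JsonLookupSketch {
  // List-backed object: every lookup scans the field list from the start,
  // so lookup cost grows linearly with the size of the JSON document.
  final case class ListBackedObj(fields: List[(String, String)]) {
    def get(key: String): Option[String] =
      fields.find(_._1 == key).map(_._2) // O(n) linear scan
  }

  // Map-backed alternative: one hash lookup, independent of document size.
  final case class MapBackedObj(fields: Map[String, String]) {
    def get(key: String): Option[String] = fields.get(key) // ~O(1)
  }
}
```

<p>On a handful of fields the difference is invisible; on the large <code class="language-plaintext highlighter-rouge">GraphQL</code> responses we were profiling, the linear scans added up.</p>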

<p>What I want to say is this: even though our outside <code class="language-plaintext highlighter-rouge">GraphQL</code> server interface didn’t change much over the course of the first year, the internals were constantly revised, rewritten and improved, on a system that after six months already had real production traffic and real users on it.</p>

<h1 id="2-use-an-refactoring-friendly-language-stack">2. Use a refactoring-friendly language stack</h1>
<p>I’m repeating myself here, but I want to emphasize the point: if you can’t change your software quickly and fairly cheaply, you will have a hard time with agile, iterative product development. The ability to pull off refactorings with confidence, low ceremony and fairly low effort is key to a successful agile implementation. “Move fast and break things” sounds super sexy until you actually start breaking things and turn your users against your product. Your customers actually like it when your application behaves predictably. Your boss probably does too… and, I guess, so do you. So make sure that you can refactor with ease.</p>

<p>There are different opinions out there on what makes a codebase easy to refactor. Dynamic typing advocates usually mention that you have less code and fewer abstractions to deal with, and that the feedback loop is faster because there is no (usually slow) compiler in the way. People from the static typing camp usually bring up the better tool support and the work the compiler does for you (whose absence in dynamic languages you have to compensate for, for instance by writing more tests).</p>

<p>I’ve built production systems in <code class="language-plaintext highlighter-rouge">C#</code>, <code class="language-plaintext highlighter-rouge">Ruby</code>, <code class="language-plaintext highlighter-rouge">Elixir</code> and <code class="language-plaintext highlighter-rouge">Scala</code> now, with a bit of patching and bug-fixing in <code class="language-plaintext highlighter-rouge">Objective-C</code>, <code class="language-plaintext highlighter-rouge">Perl</code> and <code class="language-plaintext highlighter-rouge">Erlang</code> sprinkled in. So compared to a lot of people out there with strong opinions on this, I can at least say that I’ve spent significant time with each technology and felt its benefits and downsides first hand.</p>

<p>I always liked the feeling of speed you have when working on a new <code class="language-plaintext highlighter-rouge">Ruby</code> or <code class="language-plaintext highlighter-rouge">Rails</code> codebase. But I also have to admit that working on large <code class="language-plaintext highlighter-rouge">Ruby</code> codebases, at least for me, was almost never fun. And refactoring <code class="language-plaintext highlighter-rouge">Ruby</code> applications even today ends up being a glorified ‘search and replace’ plus ‘fingers crossed that you found all references and/or everything is well covered by tests’ (yes, I’ve also tried <code class="language-plaintext highlighter-rouge">RubyMine</code> and didn’t see much improvement there).</p>

<p>Once a codebase has reached a certain size, I can definitely see a lot of value in static typing. Having <code class="language-plaintext highlighter-rouge">ReSharper</code> for <code class="language-plaintext highlighter-rouge">C#</code> or <code class="language-plaintext highlighter-rouge">IntelliJ</code> for <code class="language-plaintext highlighter-rouge">Scala</code> at your disposal makes one hell of a difference for me. And I also think a good refactoring tool changes the way you work: you rename more often, you move stuff around more often, you can more easily check where code is actually used in the codebase. And much, much more.</p>

<p><code class="language-plaintext highlighter-rouge">Scala</code> with <code class="language-plaintext highlighter-rouge">IntelliJ</code> fits well into this category for me, but it’s not without its caveats.</p>

<h1 id="3-throw-out-refactoring-unfriendly-language-features">3. Throw out refactoring-unfriendly language features</h1>
<p>There is <code class="language-plaintext highlighter-rouge">"A better Java" - Scala</code> and there’s <code class="language-plaintext highlighter-rouge">"I'd rather do Haskell but I'm stuck with this" - Scala</code>. The latter comes with one of the most refactoring-unfriendly features I have ever seen in a static language: <code class="language-plaintext highlighter-rouge">implicits</code>. Not only do they slow down your IDE quite dramatically, they also cause puzzled looks on developers’ faces when somebody removes an import by accident and the application doesn’t compile anymore.</p>

<p>You can certainly do cool things with them (typeclasses, anyone?), but I don’t think <code class="language-plaintext highlighter-rouge">implicits</code> are worth the cost. Especially when you know that you want to rely heavily on refactoring capabilities.</p>
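<p>To make the failure mode concrete, here is a minimal, hypothetical typeclass example (the names are mine, not from our codebase): the call site compiles only while the implicit instance is in scope via an import.</p>

```scala
// Hypothetical typeclass illustrating import-dependent implicit resolution.
trait Show[A] { def show(a: A): String }

object ShowInstances {
  // The instance lives in a separate object, so call sites must import it.
  implicit val intShow: Show[Int] = new Show[Int] {
    def show(a: Int): String = s"Int($a)"
  }
}

object Printer {
  // Resolved at compile time from whatever implicits are in scope.
  def render[A](a: A)(implicit s: Show[A]): String = s.show(a)
}

object Demo {
  // Remove this import and `render(42)` no longer compiles
  // ("could not find implicit value for parameter s: Show[Int]") --
  // the puzzling breakage described above.
  import ShowInstances._

  val rendered: String = Printer.render(42)
}
```

<p>Nothing at the call site hints that the import matters, which is exactly why an accidental import cleanup can break the build.</p>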

<h1 id="4-remove-unnecessary-discussions-about-code-style-and-conventions">4. Remove unnecessary discussions about code style and conventions</h1>
<p>Debates about code style are lovely time-sinks, and I’m super grateful for the development of recent years that brought more and more tools into the various tech stacks that do automated code formatting based on agreed-upon conventions. I’m not sure where it originated, but I think <a href="https://golang.org/cmd/gofmt/">gofmt</a> started this. I definitely appreciate the idea. Consequently we relied on <a href="https://scalameta.org/scalafmt/">scalafmt</a> from day one, and since that day it has automatically ensured that all our contributions are formatted consistently.</p>
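<p>For illustration, a minimal <code class="language-plaintext highlighter-rouge">.scalafmt.conf</code> could look like the following (the values are hypothetical, not the configuration we actually used):</p>

```
# Checked into the repository root; CI and editor plugins pick it up.
version = "3.8.1"
runner.dialect = scala213
maxColumn = 100
align.preset = more
```

<p>The point is less which values you pick and more that the file is versioned next to the code, so the formatter, not a human, settles style debates.</p>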

<p>We also had fairly conservative <a href="http://www.scalastyle.org/">Scalastyle</a> checks in place early on. <code class="language-plaintext highlighter-rouge">Scala</code> gives you many options for how to write your code (ranging from <code class="language-plaintext highlighter-rouge">obj.method()</code> to <code class="language-plaintext highlighter-rouge">obj function</code>). While I like the idea of the “scalable programming language”, having several of those styles mixed together in one application makes it more confusing to work with the code in a team and creates unnecessary distractions. Here, too, we favored the boring solution of <code class="language-plaintext highlighter-rouge">obj.method()</code> over the more functional-looking syntax.</p>

<h1 id="5-in-memory-integration-tests-yield-the-most-bang-for-the-buck">5. In-memory integration tests yield the most bang for the buck</h1>
<p>One thing I’ve seen many teams that embraced automated testing struggle with is finding a testing approach that brings not only safety, but also speed and confidence. As with agile, I think you can see quite a bit of cargo-culting in the automated testing area. Don’t get me wrong, automated tests have been an essential part of everything I’ve done in the last years, but more often than not, when I look into different teams’ codebases, the setup falls into one of two extremes:</p>

<ol>
  <li>Heavy reliance on end-to-end integration tests, which are usually not very insightful, very slow, run infrequently and, even worse, sometimes flaky (especially when you’re dealing with distributed applications)</li>
  <li>Heavy reliance on unit-level tests and mocks, which are usually very fast but very closely coupled to the implementation (resulting in parallel rewrites of the tests whenever the application changes)</li>
</ol>

<p>I think there’s a sweet spot in between: a big enough functional unit that can be decoupled from outside dependencies (<code class="language-plaintext highlighter-rouge">HTTP</code>, <code class="language-plaintext highlighter-rouge">DBs</code>, <code class="language-plaintext highlighter-rouge">AMQP</code>, etc.) and at the same time be used effectively to describe the behavior of your application. The decoupling from outside dependencies brings the speed necessary to run the tests often. The ‘big unit’ usually brings the ability to describe your application’s behavior without describing the internal implementation, which often means little rework when the implementation changes.</p>

<p>In my experience, the result is that tests become more central to the development workflow of a team, and that test code is treated as ‘part of the package’ rather than an ‘appendix to the actual thing’. There’s value in treating test code like production code. Remember, it’s not about the amount of code written, it’s about the right amount. An imbalance between the effort for test automation and the effort for writing the actual feature can have a severe impact on flow in your team.</p>

<blockquote>
  <p>Out of curiosity I took a look into the <code class="language-plaintext highlighter-rouge">XING One</code> repository while writing this.</p>

  <p>At the time of writing the test codebase consists of <strong>1229 tests</strong> that take <strong>~55 seconds</strong> to run on my machine (a 2018 MacBook Pro). The test code is of similar size to the actual application code (<strong>22628</strong> vs. <strong>21018</strong> lines of code)</p>
</blockquote>

<p>The ‘big unit’ in our case is the <code class="language-plaintext highlighter-rouge">GraphQLEngine</code>, the piece of code that takes over immediately once the outside HTTP interface has received a request. It can be configured in different ways (authenticator, schema, middlewares and query providers). A spec for a particular feature usually showcases the feature or capability with the minimum necessary configuration.</p>

<p>Tests typically follow an <a href="http://wiki.c2.com/?ArrangeActAssert">AAA format</a>:</p>

<ol>
  <li>Given a <code class="language-plaintext highlighter-rouge">GraphQLEngine</code> with the following middleware, schema and query providers configured</li>
  <li>Stubbing the following HTTP requests (Optional)</li>
  <li>When I run the following query</li>
  <li>It should have performed the following requests</li>
  <li>It should produce the following result</li>
</ol>

<p>To give you a better idea of how that might look in code, here’s an actual (simplified) example from the codebase:</p>

<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">ConstResolverSpec</span> <span class="k">extends</span> <span class="nc">EngineSpecification</span> <span class="o">{</span>

  <span class="k">val</span> <span class="nv">schema</span> <span class="k">=</span> <span class="s">"""
    |extend type Query {
    |
    |  # Builtin scalars
    |
    |  aString: String! @const(value: "MUCH STRING")
    |  anInt: Int! @const(value: 1234)
    |
    |  # ... shortened
    |}
  """</span><span class="o">.</span><span class="py">stripMargin</span>

  <span class="k">val</span> <span class="nv">engine</span> <span class="k">=</span> <span class="nf">createSchemaEngine</span><span class="o">(</span><span class="n">schema</span><span class="o">)</span>

  <span class="s">"successful conversions"</span> <span class="o">&gt;&gt;</span> <span class="o">{</span>
    <span class="s">"String!"</span> <span class="o">&gt;&gt;</span> <span class="o">{</span>
      <span class="k">val</span> <span class="nv">query</span>    <span class="k">=</span> <span class="s">"query ConstTest { aString }"</span>
      <span class="k">val</span> <span class="nv">response</span> <span class="k">=</span> <span class="nf">runQuery</span><span class="o">(</span><span class="n">engine</span><span class="o">,</span> <span class="n">query</span><span class="o">)</span>

      <span class="nv">response</span><span class="o">.</span><span class="py">statusCode</span> <span class="n">must</span> <span class="nf">beEqualTo</span><span class="o">(</span><span class="nv">StatusCodes</span><span class="o">.</span><span class="py">OK</span><span class="o">)</span>
      <span class="nv">response</span><span class="o">.</span><span class="py">result</span> <span class="n">must</span> <span class="nf">beJson</span><span class="o">(</span><span class="s">"""
          | {
          |   "data" : {
          |     "aString" : "MUCH STRING"
          |   }
          | }
        """</span><span class="o">)</span>
    <span class="o">}</span>

    <span class="s">"Int!"</span> <span class="o">&gt;&gt;</span> <span class="o">{</span>
      <span class="k">val</span> <span class="nv">query</span>    <span class="k">=</span> <span class="s">"query ConstTest { anInt }"</span>
      <span class="k">val</span> <span class="nv">response</span> <span class="k">=</span> <span class="nf">runQuery</span><span class="o">(</span><span class="n">engine</span><span class="o">,</span> <span class="n">query</span><span class="o">)</span>

      <span class="nv">response</span><span class="o">.</span><span class="py">statusCode</span> <span class="n">must</span> <span class="nf">beEqualTo</span><span class="o">(</span><span class="nv">StatusCodes</span><span class="o">.</span><span class="py">OK</span><span class="o">)</span>
      <span class="nv">response</span><span class="o">.</span><span class="py">result</span> <span class="n">must</span> <span class="nf">beJson</span><span class="o">(</span><span class="s">"""
          | {
          |   "data" : {
          |     "anInt" : 1234
          |   }
          | }
        """</span><span class="o">)</span>
    <span class="o">}</span>

    <span class="c1">// Shortened</span>
  <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<h1 id="6-get-rid-of-cross-repository-coordination">6. Get rid of cross repository coordination</h1>
<p>At my current company we favor very small repositories. Libraries have their own repositories. Individual services have their own repositories. Integration tests often live in yet another repository. That definitely has some advantages: you isolate the work of different people better, including a clear, focused git commit history. You also don’t spend a lot of time cloning any particular repository. You can pick only the few repositories you need and you’re good (compared to cloning the world and then using just a small portion of it).</p>

<p>But it also has a very clear disadvantage: coordination. Say your documentation is in one repository, your <code class="language-plaintext highlighter-rouge">GraphQL</code> server in another, the integration tests in a third, and maybe the client libraries in a fourth. Now things become more complicated, because you can’t change everything in a single pull request. You can’t review all the changes that belong together as a whole. QA also becomes interesting, since you now have to figure out how to temporarily bring things together for validation (usually involving lots of branches). And last but not least, you now probably have an integration order.</p>

<p>We made the conscious decision to put everything related to <code class="language-plaintext highlighter-rouge">XING One</code> into a single repository: the code, benchmarks, load tests, documentation, integration tests, utilities, you name it. This allowed us to get rid of a lot of coordination ceremony. The documentation in particular benefited quite a lot, because we required it to be written or updated together with the actual code. We also changed the substructure of the source code quite a bit during the first year (when we pulled the <code class="language-plaintext highlighter-rouge">XING One</code> project apart into multiple <code class="language-plaintext highlighter-rouge">sbt</code> projects located in the same repository).</p>
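<p>A rough sketch of what such a multi-project <code class="language-plaintext highlighter-rouge">sbt</code> build can look like (project names and settings are hypothetical, not the actual <code class="language-plaintext highlighter-rouge">XING One</code> layout):</p>

```scala
// Hypothetical build.sbt: several sbt projects in one repository, so a
// single pull request can touch engine code, integration tests and docs.
lazy val commonSettings = Seq(
  scalaVersion := "2.12.18",
  organization := "com.example"
)

lazy val engine = (project in file("engine"))
  .settings(commonSettings)

lazy val integrationTests = (project in file("integration-tests"))
  .dependsOn(engine) // compiled against the in-repo engine, nothing published
  .settings(commonSettings)

lazy val root = (project in file("."))
  .aggregate(engine, integrationTests) // `sbt test` runs everything at once
```

<p>Because the sub-projects depend on each other directly, there is no publish-then-bump dance between repositories, and one CI run validates the whole change.</p>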

<p>I found it tremendously helpful to look at all aspects of a change as a whole (code, unit tests, integration level tests, documentation, deployment, …) and I certainly can see why companies like <code class="language-plaintext highlighter-rouge">Google</code> take the idea to the max, by having everything inside a single repository.</p>

<p>At our company it wasn’t always a free lunch, though. The small-repository approach and assumptions about the root repository structure were hardcoded into various custom <code class="language-plaintext highlighter-rouge">Docker</code> and <code class="language-plaintext highlighter-rouge">Kubernetes</code> tools we use at the company, which we needed to patch to get everything working the way we wanted. We were willing to do that and pull through with it, but it definitely took some extra work here and there to make our approach fly.</p>

<p>As I outlined earlier, I would do it again. Overall the benefits of the mono repository outweigh the costs for me.</p>

<h1 id="conclusion">Conclusion</h1>

<p>Today I ranted a bit about agile and then talked in more depth about some more technical decisions and tweaks we did that allowed us to iterate fast and with confidence.</p>

<p>To recap:</p>

<ol>
  <li>It helps to accept that change is inevitable and part of the game and prepare for that</li>
  <li>Choosing a refactoring-friendly language stack and staying away from language features that make refactorings more difficult is one of those possible preparations; at least for us it paid off big time</li>
  <li>Using automated tests and decoupling them well enough from the actual implementation that they’re fast, stable and a reliable safety net is an often overlooked part of project setup. Don’t cargo-cult. Find the level of tests that actually drives your project. In our case these are the in-memory, integration-like tests of the <code class="language-plaintext highlighter-rouge">GraphQLEngine</code>.</li>
  <li>Don’t waste time on code format or style discussions; nowadays there are tools that automate these things away. Make good use of them</li>
  <li>If you can, favor fewer, bigger repositories over many smaller ones: it spares you coordination cost and lets you change and review all touchpoints in one go</li>
</ol>

<p>After ~1.5 years in production and a lot of refactorings of different scopes, we haven’t yet introduced regressions through new features or refactorings. For me that’s one of the core elements that allow you to actually be agile as a development team.</p>

<p>Next time I’m going to take an interlude and share our <code class="language-plaintext highlighter-rouge">GraphQL</code> learnings and observations.</p>

<p>See you next time around!!!</p>

<p>:wq</p>]]></content><author><name>Björn Rochel</name><email>bjoern@bjro.de</email></author><summary type="html"><![CDATA[Agility isn't a process. It's the technical decisions you made six months ago. The XING One codebase choices that kept us able to actually inspect and adapt.]]></summary></entry></feed>