The harness, not the model…

[Image: two hands holding glowing, yellow-green reins of light against a dark background] Hands on the reins: the judgement stays with people

Agentic AI promises the greatest productivity leap since the arrival of the computer. In most organisations, that leap fails to materialise. Here is why, and what the companies that do succeed do differently.

Anyone who has watched an AI agent at work knows the mixed feeling. First the amazement. The system reads the assignment, writes a plan, opens files, runs a test and corrects its own mistake, as if a seasoned colleague were at the controls. And then, usually somewhere around the halfway mark, the turn. The agent loses the thread. It repeats itself, invents an intermediate step that makes no sense and grows sloppier the longer it works. The same technology that impressed you a minute earlier starts to flounder.

In software development, where agents have made the deepest inroads, practitioners have found a word for the solution: the agent must be put in a harness. Not out of fear, but because it turns out to work better. Without a tight shell of agreements, checks and boundaries, it simply does not deliver what it promises.

The difference between an impressive demo and a system to which a bank, an insurer or an accounting firm will attach its name almost never lies in the model. It lies in the harness around it.

The promise and the missing returns

That insight explains a number that haunted many a boardroom last year. A widely cited MIT study, The GenAI Divide, calculated that roughly 95 per cent of business AI projects had no measurable effect on the profit-and-loss account. Businesses had poured tens of billions into the technology; for the vast majority, the return remained invisible.

The researchers emphatically blamed not the models but what they called a ‘learning gap’: the inability to tie AI into an organisation’s real processes, data and decisions. The exact percentage deserves some caution, as the authors themselves acknowledge that their definition of success is strict and that not everything rests on hard figures. The pattern is nonetheless too robust to wave away, and other research confirms it. Gartner expects over 40 per cent of agentic AI projects to be cancelled by the end of 2027 and warns of ‘agent washing’: old chatbots sold as agents in new packaging. McKinsey found that most companies are now experimenting with agents, while almost nobody runs them at scale yet.

So the technology is sound. It just rarely works by itself. The value is locked inside the harness, and those who fail to build it are left with an expensive playground.

What a harness really is

What, then, is that harness in concrete terms? It starts with precision. An agent given the assignment ‘analyse this file’ wanders off; an agent told exactly which steps to take, which exceptions matter, what evidence it needs and when it is done, delivers work. The harness feeds it only the context that matters, because too much information produces noise and too little produces guesswork. It puts a fence around every tool: this database it may read but not modify, that system it may consult but not use to queue payments. It forces the agent to check its own work, show its sources and name its doubts. It places a person at the points where money, law or reputation are at stake. It records everything, so that afterwards it can be reconstructed who approved what. And it keeps costs in check, because an agent that burns tokens and tools without restraint soon becomes more expensive than the employee it was meant to relieve.
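To make a few of those elements tangible, here is a minimal Python sketch of the fence, the human checkpoint, the record and the cost limit. It is an illustration under stated assumptions, not a reference implementation: the names `Harness`, `Tool`, `read_customer` and `queue_payment` are hypothetical, the cost units are arbitrary, and the human approver is reduced to a stub function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class HarnessError(Exception):
    """Raised when the harness refuses a tool call."""


@dataclass
class Tool:
    name: str
    run: Callable[[str], str]
    cost: float = 1.0             # hypothetical cost units per call
    needs_approval: bool = False  # set where money, law or reputation is at stake


@dataclass
class Harness:
    tools: Dict[str, Tool]                 # the allow-list: anything else is fenced off
    budget: float                          # upper bound on total cost
    approver: Callable[[str, str], bool]   # stands in for a human decision
    audit_log: List[dict] = field(default_factory=list)
    spent: float = 0.0

    def call(self, tool_name: str, argument: str) -> str:
        tool = self.tools.get(tool_name)
        if tool is None:
            self._refuse(tool_name, argument, "denied: tool not on the allow-list")
        if self.spent + tool.cost > self.budget:
            self._refuse(tool_name, argument, "denied: budget exhausted")
        if tool.needs_approval and not self.approver(tool_name, argument):
            self._refuse(tool_name, argument, "denied: human approval withheld")
        self.spent += tool.cost
        result = tool.run(argument)
        self._record(tool_name, argument, "ok")
        return result

    def _record(self, tool_name: str, argument: str, outcome: str) -> None:
        # Every attempt is logged, including the refused ones.
        self.audit_log.append(
            {"tool": tool_name, "argument": argument, "outcome": outcome}
        )

    def _refuse(self, tool_name: str, argument: str, outcome: str) -> None:
        self._record(tool_name, argument, outcome)
        raise HarnessError(outcome)


# Hypothetical tools: one read-only lookup, one action that moves money.
def read_customer(customer_id: str) -> str:
    return f"record for {customer_id}"


def queue_payment(details: str) -> str:
    return f"payment queued: {details}"


harness = Harness(
    tools={
        "read_customer": Tool("read_customer", read_customer),
        "queue_payment": Tool("queue_payment", queue_payment, needs_approval=True),
    },
    budget=2.0,
    approver=lambda tool_name, argument: False,  # stub: the human says no
)
```

In this sketch the agent never touches a tool directly. It asks `harness.call(...)`, and the harness decides whether the call is allowed, affordable and approved, then writes down what happened either way. Scoped instructions, context selection and self-checking are deliberately left out.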

That this harness weighs more heavily than the model is not a theory. In early 2026 the team behind LangChain showed it could lift its coding agent from thirtieth to fifth place on a common benchmark without changing anything about the underlying model. The entire leap came from the harness around it: better self-verification, sharper tracing, smarter handling of context. The engine was the same; they had built a better car around it.
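What self-verification and tracing can mean in the simplest case is sketched below in Python. This is not LangChain's implementation, which the article does not describe in detail; it is a generic verify-and-retry loop under assumed names. `run_with_verification`, `produce` and `verify` are hypothetical, and the "agent" is a stub that gets the answer right on its second try.

```python
from typing import Callable, List, Optional, Tuple


def run_with_verification(
    produce: Callable[[Optional[str]], str],
    verify: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Ask for a draft, check it, and feed the check's feedback into the next try.

    Returns the first draft that passes (or None) plus a trace of every attempt.
    """
    feedback: Optional[str] = None
    trace: List[str] = []
    for attempt in range(1, max_attempts + 1):
        draft = produce(feedback)
        passed, feedback = verify(draft)
        status = "passed" if passed else "failed (" + feedback + ")"
        trace.append(f"attempt {attempt}: {status}")
        if passed:
            return draft, trace
    return None, trace


# Stub agent: wrong on the first try, corrected once it receives feedback.
def produce(feedback: Optional[str]) -> str:
    return "total=41" if feedback is None else "total=42"


# Stub check: in practice this might be a test suite or a reconciliation rule.
def verify(draft: str) -> Tuple[bool, str]:
    if draft == "total=42":
        return True, ""
    return False, "total does not match the ledger"


result, trace = run_with_verification(produce, verify)
```

The model stays untouched in this loop. All that changes is that its output is checked before it is accepted, and that each attempt leaves a line in the trace.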

And here the story takes a turn that surprises many executives. Such a harness cannot be assembled by just any programmer. Someone has to know which exception makes the difference in practice, which data can be trusted and which cannot, why a rule was once devised, which mistake can be overlooked and which must be escalated immediately. That is not knowledge found in a handbook. It is the deep-rooted routine of the experienced professional, and it is precisely that routine which must first be made explicit before an agent can do anything with it.

This turns the popular framing on its head. The company that sees AI primarily as a way to shed people has it wrong. The first real productivity leap comes not from replacing the experienced professional, but from translating what is in his head into protocols, controls and decision rules. Only then does his judgement become scalable. The agent does not take over that judgement. It carries out the work; the judgement stays with the person.

The research points the same way, and corrects a stubborn misunderstanding in passing. With the previous generation of tools, the smart autocomplete, it was mainly the juniors who benefited: a well-known experiment with GitHub Copilot saw developers finish a well-defined task 55.8 per cent faster, with the biggest gains among the least experienced. With real agents, that reverses. An analysis of data from the Cursor programming environment shows that agents shift the work from executing to supervising, and that it is precisely the experienced professional who benefits: he plans better, delegates better and spots sooner when the outcome is off. Generative AI makes people faster and helps them outside their own field, Harvard Business School found, but it does not turn a layman into an expert. The gap narrows. It does not disappear.

The flip side: who raises the next generation?

Precisely in that success lurks a problem that reaches beyond the next quarter. Where do experienced professionals actually come from? We grow them, over years, from juniors. And a junior typically learns the trade through the work that is dull and repetitive: going through the files, encountering the exceptions, getting to know the systems, slowly understanding why the rules are the way they are. That happens to be exactly the work that automates so well.

The first signals on the labour market are tentative, but they point one way. A much-discussed Stanford study, Canaries in the Coal Mine?, found that American twenty-somethings in occupations heavily affected by AI have lost ground since late 2022, while employment among their older colleagues in the same roles kept growing. Research across 62 million CVs speaks of seniority-biased technological change: the blow lands at the bottom. It is hard to prove, since there were also interest-rate rises and corrections after years of over-hiring, and anyone being honest does not pretend the verdict is already in. But the risk is easy to formulate. Whoever automates away the entry-level work without building a new learning path in its place dries up his own talent pool. Ten years from now there will be no seniors left to build or maintain the harness.

The remedy is not complicated, only uncomfortable, because it goes against the short-term reflex. Have juniors assess the output of agents instead of doing the work themselves. Deliberately build in moments where they learn to recognise what is good and what is not. And reward the senior not only for his speed, but also for the effort of making his knowledge transferable.

When everyone has the same model

That people are the real capital becomes even clearer when you consider what is happening to the models themselves in the meantime. They are turning into a commodity. They grow more powerful, cheaper and more widely available; you rent them by the amount of text processed, and your competitor rents the same one tomorrow. What still feels like an edge today, access to the best model, is a utility the day after tomorrow. And here is the sting: the better the models get, the less they distinguish you. The edge moves to the layer around them.

Because that layer is yours. The data only you have, the processes you fought hard to establish, the craftsmanship built up over the years and the harness in which all of it is captured. That cannot be downloaded. Two organisations renting the same model can diverge as far as two orchestras with the same score: it is not the notes that make the difference, but who plays them.

It raises an uncomfortable question, and not only for companies. If intelligence itself is for rent, what do you still own?

Europe: regulating and building

For Europe, all of this carries an extra charge. The continent has strong institutions and a mature approach to privacy and fundamental rights, which has produced a substantial body of regulation. The Draghi report made painfully clear in 2024 how far Europe lags behind the United States and China in frontier models, in compute, in capital and in scale, and calculated that some €750 to €800 billion extra is needed annually to close the wider gap. Brussels is trying to correct course with the AI Continent Action Plan and with InvestAI, worth some €200 billion, plus plans for its own AI factories and computing power.

The reflex to blame everything on regulation is understandable, but also misleading. The GDPR and the AI Act impose requirements, certainly, but Europe’s deeper weakness lies in fragmentation, slow capital and dependence on foreign infrastructure.

Either way, the European approach of trustworthy AI that respects fundamental rights is a healthy starting point. Whether it goes beyond words will depend on actual follow-through on the Action Plan and InvestAI.

What now?

For your organisation, the era of non-committal pilots is over. Anyone serious about agentic AI starts not with the technology but with a process that genuinely hurts, appoints an owner and a domain expert for it, deliberately limits the agent’s autonomy, builds in verifiability from the start and immediately designs the learning path for the people who will have to take over later. None of these steps is spectacular. Together they make the difference between a marketing story and a result.

Let us return to that agent losing the thread halfway. It is tempting to see in it either the future or the proof that it is all a disappointment. Neither is right. It is a powerful, erratic force that only delivers value within the boundaries a person sets for it. The models get better, the erratic behaviour diminishes, and yet that does not change the core. The difference between an expensive playground and a real transformation is not a technical matter. It is a matter of experience, domain knowledge, judgement and the wisdom to keep investing in the people who supply that judgement.

The harness is not a footnote to the business case. The harness is the business case.

Sources and starting points

MIT NANDA, The GenAI Divide: State of AI in Business 2025
Gartner, Over 40% of Agentic AI Projects Will Be Canceled by End of 2027
McKinsey, The State of AI: Global Survey 2025 and Building the Foundations for Agentic AI at Scale
LangChain, Improving Deep Agents with Harness Engineering; Faros.ai, Harness Engineering
Suproteem K. Sarkar, AI Agents, Productivity, and Higher-Order Thinking (SSRN)
MIT Sloan, research into GitHub Copilot and developer productivity
Harvard Business School / HBR, research into generative AI and expertise
Stanford Digital Economy Lab, Canaries in the Coal Mine?; Generative AI as Seniority-Biased Technological Change (SSRN)
Carnegie Endowment, The EU’s AI Power Play: Between Deregulation and Innovation
European Commission, AI Continent Action Plan and InvestAI; Mario Draghi, The Future of European Competitiveness

Figures and examples are drawn from publicly available sources; where studies contradict each other or have been criticised, this is indicated in the text.

Previously published on roibot.cloud, 29 May 2026.