This is the third of six articles that Chris Brown and I are publishing to offer a framework for planning, scaling and operationalising claims transformation leveraging AI. The first two are linked here:
‘Fools rush in where angels fear to tread.’ This article aims to keep you on the side of the Agentic AI governance angels and innovation disciples.
Everyone in insurance is talking about agentic AI. The problem is, almost nobody agrees on what "agentic" actually means. And in a regulated industry where governance frameworks depend on understanding exactly what your systems are doing, that definitional confusion isn't academic. It's dangerous.
I've built AI systems for insurance claims. I've watched vendors label every AI capability they sell as "agentic." And I've seen the confusion this creates when someone asks the question that matters: who's accountable when the system gets it wrong?
The answer depends entirely on what kind of system you've built. At present, a single term is used to describe fundamentally different architectures. I'll discuss the consequences of stretching one word across all of them in the following sections:
- The Definitional Mess
- Why the Technical Distinction Matters
- What Truly Agentic AI Would Look Like in Claims
- The Honest Assessment
- Why Orchestration Might Be Right for Insurance
- The Governance Risk
1. The Definitional Mess
The term "agentic AI" went from obscurity to ubiquity in roughly twelve months. Google Trends data indicate that interest was minimal until April 2024, after which it spiked and peaked in mid-2025 [1]. Gartner named it the top technology trend for 2025. McKinsey called it "the next frontier." Every major vendor, from Salesforce to Microsoft, rushed to rebrand their AI offerings with the term "agentic."
But here's the problem. The people building these systems don't agree on what the word means.
Andrew Ng, who popularised the term more than anyone, describes agentic AI through four design patterns:
- Reflection (AI critiques its own output)
- Tool use (AI calls external APIs and services)
- Planning (AI breaks complex tasks into steps)
- Multi-agent collaboration (multiple AI systems work together) [2].
Under Ng's framing, any system that iterates on its output or selects tools could be called agentic. It's a broad definition that encompasses a wide spectrum of sophistication.
IBM defines it more sharply as a system "capable of autonomously performing tasks by designing its workflow and using available tools" [3]. The emphasis on designing its own workflow is a significant qualifier that narrows the field considerably.
MIT CSAIL distinguishes agentic AI from generative AI in four ways: it's focused on action rather than content generation, it doesn't rely on prompts, it carries out tasks independently, and it adapts dynamically to its environment [4]. That's a substantially higher bar than Ng's patterns.
But the clearest distinction comes from Anthropic, the company behind Claude. In their influential "Building Effective Agents" essay, they draw an architectural line that the rest of the industry would benefit from adopting. Workflows, they argue, are systems where LLMs and tools are orchestrated through predefined code paths. Agents are systems in which LLMs dynamically direct their own processes and tool use, maintaining control over how they accomplish tasks [5]. In a subsequent piece on context engineering, they distilled it further: agents are "LLMs autonomously using tools in a loop" [6].
That's a crisp, technical distinction. And it matters enormously for insurance.
2. Why the Technical Distinction Matters
In a workflow, a developer designs the sequence. Step one: extract claim data. Step two: match against policy wording. Step three: assess coverage. Step four: generate a recommendation. The LLM does the heavy lifting at each stage, but the path is predetermined. You know what the system will attempt, in what order, with what data.
In an agent, the system decides the sequence. Given the goal "determine whether this claim is covered," it chooses whether to extract the policy first, clarify the claim description, check for fraud indicators, or go straight to coverage analysis. It selects tools based on what it encounters. If it encounters ambiguity, it may invoke a clarification step that is not part of any predefined pipeline. If it detects something suspicious, it might pivot to fraud analysis without being told to.
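To make the contrast concrete, here is a minimal Python sketch of the two architectures. The LLM calls are stubbed with canned answers so the example runs offline; the function names, tool names, and stub behaviour are illustrative assumptions, not any vendor's API.

```python
def llm(prompt: str) -> str:
    """Stub for an LLM completion call (illustrative only)."""
    return f"result<{prompt[:30]}>"

def choose_next_tool(claim: str, trail: list[str]) -> str:
    """Stub for the agent's meta-decision: an LLM picking the next tool.
    Here it deterministically stops after one coverage analysis."""
    return "done" if "coverage_analysis" in trail else "coverage_analysis"

def workflow(claim: str) -> list[str]:
    """Workflow: the developer fixes the sequence in code.
    Every claim follows the same four stages, in the same order."""
    steps = ["extract_claim_data", "match_policy_wording",
             "assess_coverage", "generate_recommendation"]
    return [f"{s}: {llm(s + ' ' + claim)}" for s in steps]

def agent(claim: str, max_steps: int = 5) -> list[str]:
    """Agent: an LLM chooses the next tool in a loop until it stops.
    The path can differ between claims with similar characteristics."""
    trail: list[str] = []
    for _ in range(max_steps):
        tool = choose_next_tool(claim, trail)
        if tool == "done":
            break
        trail.append(tool)  # a real agent feeds the tool result back in
    return trail
```

The governance difference falls out of the code: the workflow's audit question is "what happened at each of four known stages?", while the agent's audit question includes "why did `choose_next_tool` pick that path?"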
The governance implications are fundamentally different.
For a workflow, you can audit each step. You can test each stage independently. You can assign accountability to specific decision points. You can explain to a regulator exactly how a coverage determination was reached, because you designed the path it followed. When the Financial Conduct Authority asks how you arrived at a decision, you can trace it through a known sequence with defined inputs and outputs.
For an agent, the system chose its own path. It might have taken a different route for a different claim with similar characteristics. The decision process isn't just the output of each stage; it's the agent's reasoning about which stages to invoke and in what order. Testing becomes more difficult as the state space increases. Accountability becomes murkier because the "decision" isn't only the coverage determination but also the decision about how to approach it.
In regulated industries, this isn't a subtle difference. It's the difference between a system you can defend to a regulator and one you can't.
3. What Truly Agentic AI Would Look Like in Claims
True agentic AI in claims would mean giving a system a goal, "determine whether this claim should be paid", and letting it figure out how.
It would review the claim and determine whether additional information from the claimant is required before assessment. Is the policy wording ambiguous enough to require deeper analysis? Are there fraud indicators that should change the approach? Is the claim straightforward enough to determine coverage directly, or complex enough to break into sub-problems?
It would select tools dynamically. For a simple gadget claim, it might skip fraud analysis entirely and go straight to coverage matching. For a suspicious motor claim, it might invoke fraud detection before coverage determination. For a complex commercial claim, it might decompose the problem into multiple coverage questions and address each independently.
This is genuinely powerful. And genuinely difficult to govern.
Because now the questions multiply. Why did the system skip fraud analysis on this claim? Why did it choose to clarify with the claimant rather than proceeding with the information available? Why did it decompose this commercial claim into three coverage questions rather than four? Each of these meta-decisions shapes the outcome, and each needs to be explainable.
The Cloud Security Alliance published an autonomy levels framework for agentic AI in January 2026, deliberately echoing the SAE levels for autonomous vehicles [8]. It's an emerging model, but the analogy is instructive. Most insurance AI systems today are at Level 0 or 1: they provide information or recommendations, but humans make all decisions or explicitly approve each action. True agentic systems would be Level 3 or above: operating autonomously within defined boundaries, with humans intervening only for exceptions.
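The levels analogy can be sketched as a simple lookup. The labels below are paraphrases of the SAE-style analogy described above, not the CSA framework's exact wording, and the threshold function encodes this article's argument, not the CSA's.

```python
# Paraphrased autonomy levels, echoing the SAE-for-vehicles analogy.
AUTONOMY_LEVELS = {
    0: "Informational: AI surfaces data; humans make every decision",
    1: "Assisted: AI recommends; a human approves each action",
    2: "Partial: AI handles routine steps under close supervision",
    3: "Conditional: AI operates within boundaries; humans handle exceptions",
    4: "High: AI operates autonomously in a constrained domain",
}

def requires_agent_governance(level: int) -> bool:
    """Per the argument above, meta-decision governance (auditing how the
    system chose its approach) becomes necessary at Level 3 and above."""
    return level >= 3
```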
The Financial Times drew the same comparison, noting that most current AI agent applications sit at Level 2 or 3, with only highly specialised systems achieving Level 4 in constrained environments [9]. The gap between where the industry is and where the marketing suggests it is remains substantial.
4. The Honest Assessment
Here's what I think the insurance industry is doing, and what it should call it.
Most of what's being sold as "agentic AI" in insurance is orchestrated workflows with LLMs at each stage. Document extraction, correspondence drafting, triage suggestions, chatbots, these are tools and augmentations. Valuable, worth deploying, but not agentic by any meaningful definition.
Some more sophisticated implementations are orchestrated pipelines: classification, extraction, validation, and recommendation stages with LLMs doing the work and human handlers making final decisions. Under Ng's broad definition, it's agentic. Under Anthropic's sharper definition, it's a workflow. Either way, it's governable, auditable, and defensible.
Very little in insurance is truly agentic in the strict sense: systems that autonomously determine their own approach to resolving a claim, selecting tools and sequencing actions based on what they encounter, with human oversight only at defined checkpoints. The governance infrastructure for this does not yet exist at most insurers.
And that's not necessarily a problem.
5. Why Orchestration Might Be Right for Insurance
There's a certain hubris in assuming you need full autonomy. Orchestrated workflows with LLMs can capture most of the value that "agentic AI" promises, while remaining auditable, testable, and defensible.
A well-designed pipeline that handles intake, classifies claims, extracts policy terms, identifies coverage issues, and presents recommendations to handlers doesn't need to decide its own approach. The approach is designed by people who understand the claims process, the regulatory requirements, and the edge cases. The LLMs do the heavy lifting on natural language understanding, document processing, and pattern matching. Humans apply judgment where it matters.
This isn't a compromise. It's an architecture appropriate to the context.
The academic literature bears this out. A systematic review of 143 studies on agentic AI systems found that autonomy, adaptability, and goal-driven reasoning are the defining characteristics that distinguish agentic systems from traditional AI and standard automation [11]. Most insurance applications don't require adaptability or goal-driven reasoning. They require sophisticated processing within defined workflows. That's a different problem with a different solution.
One AI researcher put it well: "We don't need agents for everything. Some automations are simple and effective" [10]. The pressure to rebrand everything as agentic because it's the current buzzword leads to architectural decisions driven by marketing rather than requirements.
6. The Governance Risk
The real risk isn't that insurance deploys orchestrated workflows and calls them agentic. The marketing inflation is irritating but manageable.
The real risk is that someone deploys a genuinely agentic system, one that chooses its own approach to claims resolution, without the governance infrastructure to support it. Because the governance requirements for orchestrated workflows and autonomous agents are fundamentally different.
For an orchestrated workflow, you need quality assurance through sampling, monitoring for drift, human oversight at defined decision points, and audit trails that trace through known stages. This is challenging but understandable. It extends naturally from existing claims governance.
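The audit trail a predefined pipeline affords can be sketched in a few lines: every stage records its inputs and outputs, so a determination can be traced end to end. The stage functions and payload fields here are hypothetical placeholders, not a real claims system.

```python
from datetime import datetime, timezone

def run_stage(name, fn, payload, trail):
    """Run one pipeline stage and append an auditable record of it."""
    output = fn(payload)
    trail.append({
        "stage": name,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input": payload,
        "output": output,
    })
    return output

def assess(claim):
    """A toy three-stage pipeline; each stage is a stand-in for LLM work."""
    trail = []
    data = run_stage("extract", lambda c: {"item": "phone"}, claim, trail)
    match = run_stage("match_policy", lambda d: {"covered": True}, data, trail)
    rec = run_stage("recommend",
                    lambda m: "approve" if m["covered"] else "refer",
                    match, trail)
    return rec, trail  # the trail is what you can show a regulator
```

Because the stage sequence is fixed in code, the trail has the same shape for every claim, which is what makes sampling-based QA and per-stage testing tractable.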
For an autonomous agent, you also need to govern the meta-decisions: how the system chose its approach, why it selected certain tools over others, and what reasoning led it to skip or invoke specific steps. You need to test a much larger state space because the system's behaviour isn't constrained to predefined paths. You need accountability models that address not only wrong decisions but also wrong approaches to reaching them.
Anthropic themselves acknowledge this tension. In their framework for responsible agent development, they note that a central challenge is balancing agent autonomy (which makes agents valuable) with human control (which makes them safe). An agent asked to "organise my files" might autonomously delete what it considers duplicates and restructure everything, going far beyond what was intended [12]. In claims, the equivalent would be an agent that autonomously decides a claim doesn't require fraud screening when the claim characteristics warrant it.
The FCA's current position is principles-based. They've explicitly stated that they won't introduce AI-specific rules, applying existing frameworks such as Consumer Duty and SM&CR instead [13]. This means insurers are responsible for determining the appropriate governance for their AI systems. If you've deployed an orchestrated workflow and can trace every decision through a designed pipeline, you're on solid ground. If you've deployed an autonomous agent and can't explain why it took a particular approach to a particular claim, you have a problem.
Avoid Risk: Get Precise About What You're Building
If I could give one piece of advice to insurers evaluating AI strategies, it would be this: before you buy, build, or deploy anything labelled "agentic," ask which kind of system you're getting.
Is the vendor selling you a workflow in which LLMs perform sophisticated tasks within a predefined pipeline? That's valuable. The governance is manageable. Deploy it where it makes sense.
Are they selling you an agent, a system that determines its own approach to achieving a goal? That's potentially more powerful but fundamentally harder to govern. Ensure your governance infrastructure aligns with the architecture.
Or are they selling you augmentation tools, document extraction, chatbots, correspondence drafting, rebadged as "agentic" because it's the current buzzword? That's fine too. Just don't build governance for autonomous agents when you're deploying a chatbot.
The terminology matters because the governance follows the architecture. Get the architecture wrong and you'll either over-invest in governance you don't need or under-invest in governance you desperately do.
In a regulated industry where decisions are challenged, audited, litigated, and reviewed by ombudsmen, knowing exactly what your system is doing and being able to explain it isn't optional. It's the foundation on which everything else is built.
Next in series: The Production Gap — why impressive demos don't translate to production deployment in regulated industries.
Chris Brown is a fractional CTO and enterprise architect with 28 years of experience across insurance, emergency services, and enterprise software. He works with insurers, SaaS companies, and investors through The Build Paradox.
To discuss how he can help your organisation 👉 www.buildparadox.com
References
- Bandi et al. (2025) report Google Trends search interest for "agentic AI", noting it was minimal until April 2024 and peaked in July 2025 (see their Figure 2: "Google Trends shows the popularity of agentic AI").
- Andrew Ng, keynote at Sequoia AI Ascent 2024, describing four agentic design patterns: reflection, tool use, planning, and multi-agent collaboration. Summarised in multiple sources including Octet Consulting analysis (June 2024).
- IBM, "Agentic AI: 4 reasons why it's the next big thing in AI research" (November 2025).
- MIT CSAIL Alliances, "Agentic AI: What you need to know about AI agents" (2025).
- Anthropic, "Building Effective Agents" (December 2024).
- Anthropic Applied AI team, "Effective context engineering for AI agents" (July 2025).
- Innovate UK project reference 10093983, "Utilising AI models to optimise the FNOL claims journey and improve customer experience." UK Research and Innovation Gateway to Research.
- Cloud Security Alliance, "Autonomy Levels for Agentic AI" (January 2026).
- Lucy Colback (Financial Times), "AI agents: from co-pilot to autopilot" (2025).
- Jeff Kramer (Deloitte), quoted in InformationWeek, "Charting the path to the autonomous enterprise" (November 2025).
- Bandi et al. (2025) "The Rise of Agentic AI: A Review of Definitions, Frameworks, Architectures, Applications, Evaluation Metrics, and Challenges," Future Internet 17(9), MDPI (September 2025).
- Anthropic, "Our framework for developing safe and trustworthy agents" (2025).
- FCA, "AI and the FCA: our approach" (September 2025).