Your Data Platform Is Probably Not Ready for AI Agents

This article discusses the analytics-agent impedance mismatch in the context of healthcare payer systems, where I currently spend most of my professional time. It starts from a simple but increasingly important problem: many organizations are trying to operationalize AI agents on top of data environments built for reporting, not reasoning. In payer environments, that mismatch becomes especially visible because decisions depend on sequence, policy, evidence, and workflow context, not just access to records. The result is a growing gap between what enterprise data platforms are designed to do and what trustworthy agents actually require in production. That gap deserves far more attention than it currently gets.
The data problem behind AI agent misfires
Everyone wants AI agents.
Leaders want agents that can answer questions, synthesize context, recommend next steps, draft communications, and eventually take action inside real business workflows. In healthcare, insurance, banking, and other complex industries, the aspiration is especially powerful: put intelligent systems closer to operational work and dramatically improve speed, consistency, and scale.
So organizations do what seems logical. They centralize data in a warehouse or lakehouse. They stand up a vector database. They index documents. They connect an LLM. They run a few promising demos.
And then reality sets in.
The answers are plausible, but not reliable. The agent retrieves the wrong version of a document. It misses a key event in the case lifecycle. It cannot explain where an answer came from. It sees too much data in one context and too little in another. It sounds smart, but it does not behave like something you would trust in production.
This is the moment when many teams realize they do not really have a model problem. They have a data problem.
More specifically, they have this problem: analytics-ready data is not the same thing as agent-ready data.
That distinction is not discussed nearly enough.
Most enterprise data platforms were built for transactions, reporting, dashboards, regulatory extracts, and batch integrations. Those are important jobs, but agents need something different. They need data that preserves meaning, sequence, source, policy, and context well enough to support reasoning and action at runtime.
If your organization is serious about AI agents, this is the uncomfortable truth: your data platform is probably not ready.
The data layer most enterprises have
Most enterprises have spent years building data platforms that serve humans well.
They have operational systems that run the business. They have pipelines that move data into warehouses or lakehouses. They have dimensional models for reporting, dashboards for executives, and curated data sets for analytics teams. In more advanced organizations, they may also have a knowledge layer, a search capability, and a set of governed APIs.
That is all useful. None of it is wasted, but it is built around a different assumption: that a human will interpret the output.
A report can summarize. A dashboard can abstract. A lakehouse can consolidate. Humans are good at filling in gaps, reconciling inconsistencies, and applying unstated business knowledge. If a report doesn’t tell the full story, an experienced operator can often infer what matters from context.
Agents cannot do that safely unless the platform explicitly provides that context.
An agent working inside a real workflow needs more than access to data. It needs to know what a thing is, what state it is in, how it got there, what rules apply, what it is allowed to see, and whether the evidence behind the answer is trustworthy.
That is a very different bar.
Why AI-ready data is not analytics-ready data
A traditional analytics platform is optimized for questions like these:
- How many cases were opened last month?
- What is the average turnaround time by region?
- Which providers have the highest authorization volume?
- What are the trends in appeals over time?
Those are excellent business questions. But an agent is often asked something else entirely:
- What is happening in this case right now?
- What changed in the last 48 hours?
- What evidence supports the current status?
- What policy applies here?
- What is the next valid action?
- Am I allowed to see or use this information in this context?
That is not just analytics. That is runtime reasoning.
The difference matters because many enterprise data platforms flatten, delay, or strip away exactly the things agents need most. They preserve current state but not the full event history. They centralize data but not business meaning. They expose records but not provenance. They manage access for users and applications, but not dynamically for machine actors operating within specific workflow boundaries.
So, teams try to compensate with retrieval. They add RAG, they index documents, and they vectorize everything in sight. Sometimes that helps, but more often than not, it just makes the underlying problem more visible.
What agent-ready data actually requires
If you want to prepare enterprise data for AI agents, it helps to stop thinking only about storage and start thinking about runtime context. In practice, agent-ready data has five essential properties.
1. Semantic consistency
Agents need clear and stable business meaning, not just raw access to fields and tables. A business analyst may know, for instance, that status code “7” means a case is awaiting external review, and that this is different from a nurse review or a pending authorization. An agent will not know that unless the meaning is explicitly defined, nor should it be expected to guess.
When business context is vague or inconsistently defined, agents are far more likely to make poor decisions. That is why core concepts must be modeled explicitly. What is an authorization? How is it different from a referral? Which statuses are terminal? Which state transitions are valid? How do benefits, policies, and correspondence fit into the workflow?
If the enterprise itself has not clearly defined these semantics, the agent will improvise. That is where brittle, unreliable behavior begins.
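To make that concrete, here is a minimal sketch of what explicitly modeled semantics can look like. The status codes, names, and transitions below are illustrative assumptions, not drawn from any real payer system; the point is that meaning and valid lifecycle moves live in one governed place instead of in an agent's guesswork.

```python
from enum import Enum

class CaseStatus(Enum):
    """Explicit business meaning behind raw status codes (illustrative values)."""
    NURSE_REVIEW = "3"
    PENDING_AUTHORIZATION = "5"
    AWAITING_EXTERNAL_REVIEW = "7"
    CLOSED = "9"

# Terminal states and valid transitions, encoded once in a governed place
# so no agent has to guess them at runtime.
TERMINAL_STATUSES = {CaseStatus.CLOSED}

VALID_TRANSITIONS = {
    CaseStatus.NURSE_REVIEW: {CaseStatus.PENDING_AUTHORIZATION, CaseStatus.CLOSED},
    CaseStatus.PENDING_AUTHORIZATION: {CaseStatus.AWAITING_EXTERNAL_REVIEW, CaseStatus.CLOSED},
    CaseStatus.AWAITING_EXTERNAL_REVIEW: {CaseStatus.CLOSED},
}

def can_transition(current: CaseStatus, target: CaseStatus) -> bool:
    """True only if the modeled business rules allow moving from current to target."""
    return target in VALID_TRANSITIONS.get(current, set())
```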
2. Provenance and traceability
An enterprise agent must be able to answer a simple but critical question: How do you know that?
That answer has to be grounded in source systems, timestamps, documents, workflow events, and transformation lineage. If an agent summarizes a member’s case history, drafts a response, or recommends a next step, users need to be able to trace the answer back to evidence.
This is not just a nice feature for debugging. In regulated environments, it is foundational to trust. Without provenance, AI outputs become difficult to validate, difficult to govern, and nearly impossible to operationalize responsibly.
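As a sketch of what grounding can look like at the data-structure level: every claim an agent makes carries references back to its sources. The record shape and field names here are hypothetical, not a standard.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass(frozen=True)
class Evidence:
    """One traceable piece of support behind an agent's claim."""
    source_system: str        # e.g. the claims platform or workflow engine
    record_id: str            # identifier within that source
    observed_at: datetime     # when the source recorded it
    transformation: str = ""  # pipeline step that produced this view, if any

@dataclass
class GroundedAnswer:
    """An agent output that can always answer 'How do you know that?'"""
    text: str
    evidence: list[Evidence] = field(default_factory=list)

    def citations(self) -> list[str]:
        return [f"{e.source_system}:{e.record_id} @ {e.observed_at.isoformat()}"
                for e in self.evidence]
```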
3. Event history
Most enterprise systems are very good at exposing current state. Agents often need the sequence of change.
A care plan is not just its latest version. An authorization is not just approved or denied. A benefit is not just active or inactive. What matters is often the lifecycle: what happened, in what order, by whom, under what policy, with what supporting evidence.
Reasoning depends on sequence. Was a document received before or after the determination? Was the authorization reopened? Was the member contacted? Was the benefit exception applied before the appeal?
If your platform loses the thread of the process, your agent loses the thread of the business.
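An append-only event record makes those sequence questions answerable directly. The event shape and action names below are illustrative assumptions:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class CaseEvent:
    """One step in a case lifecycle: who did what, when, under which policy."""
    case_id: str
    occurred_at: datetime
    actor: str
    action: str      # e.g. "document_received", "determination_made"
    policy_ref: str  # the rule in effect when the event happened

def events_since(events: list[CaseEvent], hours: int = 48) -> list[CaseEvent]:
    """Answer 'what changed recently?' from ordered history, not a snapshot."""
    cutoff = datetime.now() - timedelta(hours=hours)
    return sorted((e for e in events if e.occurred_at >= cutoff),
                  key=lambda e: e.occurred_at)

def document_preceded_determination(events: list[CaseEvent]) -> bool:
    """A sequence question that current state alone cannot answer."""
    actions = [e.action for e in sorted(events, key=lambda e: e.occurred_at)]
    return ("document_received" in actions
            and "determination_made" in actions
            and actions.index("document_received") < actions.index("determination_made"))
```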
4. Policy-aware access
This is where many AI architectures quietly break.
Enterprise data access is rarely universal. It depends on role, workflow, purpose, sensitivity, and organizational boundaries. In healthcare, that complexity intensifies quickly. Not everyone should see everything. Not every use of data is permissible in every context. The same agent may need different access depending on who invoked it and what task it is performing.
An agent-ready platform must evaluate access permissions in real time, not simply rely on controls at the application boundary. A smart agent with weak policy controls is not enterprise-ready. It is a liability.
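A minimal sketch of per-request evaluation, assuming an illustrative role/task/sensitivity model; a production platform would delegate this decision to a policy engine rather than an in-code table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessRequest:
    """Who invoked the agent, for which task, against data of what sensitivity."""
    invoker_role: str      # e.g. "care_manager", "claims_examiner"
    task: str              # the workflow step the agent is performing
    data_sensitivity: str  # e.g. "standard", "behavioral_health"

# Illustrative policy table; real rules would live in a governed policy engine.
POLICY = {
    ("care_manager", "outreach_summary"): {"standard", "behavioral_health"},
    ("claims_examiner", "claim_review"): {"standard"},
}

def is_permitted(req: AccessRequest) -> bool:
    """Evaluate access per request, at call time, not once at deployment."""
    return req.data_sensitivity in POLICY.get((req.invoker_role, req.task), set())
```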
5. Context packaging for retrieval and action
This may be the most overlooked requirement of all.
Enterprises often assume that once documents and records are searchable, the agent has what it needs. But raw retrieval is not the same as usable working context.
Agents perform better when the platform assembles a coherent, bounded package of relevant context: current state, recent event history, related documents, controlling policies, source evidence, and the permissions associated with the task at hand.
That packaging layer is often missing. The result is an agent that can retrieve fragments but cannot reliably assemble understanding.
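One way to picture the packaging layer is as a single assembly call that returns a bounded bundle rather than loose search hits. Everything below is a hypothetical sketch; the fetch_* helpers are stand-ins for governed queries against real stores.

```python
from dataclasses import dataclass, field

@dataclass
class ContextPackage:
    """A bounded, task-specific bundle an agent can reason over directly."""
    current_state: dict
    recent_events: list[dict]  # ordered lifecycle events, oldest first
    documents: list[str]       # references to controlling documents
    policies: list[str]        # rules in effect for this task
    evidence: list[str]        # provenance for every included fact
    permissions: set[str] = field(default_factory=set)

# Stand-in fetchers; in a real platform each would be a governed query
# against the event store, document index, or policy engine.
def fetch_state(case_id):           return {"case_id": case_id, "status": "awaiting_external_review"}
def fetch_events(case_id, hours):   return [{"action": "document_received"}]
def fetch_documents(case_id, task): return ["doc://clinical-note/123"]
def fetch_policies(task):           return ["policy://utilization-review/standard"]
def fetch_provenance(case_id):      return [f"claims_platform:case/{case_id}"]
def fetch_permissions(role, task):  return {"read:clinical", "read:benefits"}

def assemble_context(case_id: str, task: str, invoker_role: str) -> ContextPackage:
    """Assemble one coherent, permission-scoped package for a single task."""
    return ContextPackage(
        current_state=fetch_state(case_id),
        recent_events=fetch_events(case_id, hours=48),
        documents=fetch_documents(case_id, task),
        policies=fetch_policies(task),
        evidence=fetch_provenance(case_id),
        permissions=fetch_permissions(invoker_role, task),
    )
```

The design choice that matters is the boundary: the agent receives one coherent, permission-scoped package instead of open-ended access to the lake.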
Why naive RAG underperforms in the enterprise
A lot of enterprise AI still follows a simple pattern: connect an LLM to internal content, add retrieval, and hope the model can bridge the gaps.
The problem is that messy enterprise data does not become coherent just because it’s indexed.
If the source environment is filled with duplicate records, inconsistent terminology, outdated documents, incomplete histories, conflicting sources, and poor metadata, RAG often makes the confusion worse. The model may retrieve information that is relevant but not authoritative. It may surface a policy excerpt while missing its effective date. Or it may summarize the current state of a case without seeing the event that changed it just an hour earlier.
This is especially dangerous in healthcare payer environments, where context is everything and distinctions matter. Member, provider, authorization, assessment, benefit, care plan, correspondence, and audit trail are all interdependent domains. The answer is rarely sitting in one table or one document. Useful and operationally effective answers have to be assembled across systems, histories, and policies.
That is why “RAG on messy enterprise data” so often disappoints. The issue is not that retrieval is useless. It is that retrieval alone is not a coherent architecture.
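To illustrate the gap between "relevant" and "authoritative", here is a hedged sketch in which candidate policy excerpts are filtered by effective date and source authority before anything reaches the model. The fields and ranking rule are illustrative assumptions, not a prescribed design.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class PolicyChunk:
    text: str
    effective_from: date
    effective_to: Optional[date]  # None means still in effect
    source_rank: int              # 0 = system of record; higher = less authoritative
    similarity: float             # vector relevance score from retrieval

def authoritative(chunks: list[PolicyChunk], as_of: date) -> list[PolicyChunk]:
    """Keep only excerpts in effect on the decision date, then rank by
    source authority first and vector similarity second."""
    in_effect = [c for c in chunks
                 if c.effective_from <= as_of
                 and (c.effective_to is None or as_of <= c.effective_to)]
    return sorted(in_effect, key=lambda c: (c.source_rank, -c.similarity))
```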
Why healthcare payer environments magnify the problem
Healthcare payer operations expose the gap between data-for-analytics and data-for-agents especially clearly.
A single workflow may require understanding a member’s eligibility, a provider’s network status, an authorization’s lifecycle, an assessment outcome, a plan benefit, a piece of correspondence, and the audit trail behind it all. Much of that data exists somewhere in the enterprise, but it is typically spread across transactional platforms, workflow engines, documents, integration feeds, and reporting stores.
Humans compensate for this fragmentation through experience and tribal knowledge. They know where to look, what to trust, and how to interpret ambiguity. Agents do not have that instinct. The platform has to supply it.
That is why healthcare organizations pursuing agentic AI need to think beyond centralized storage. They need a data architecture that supports context assembly, evidence, policy enforcement, and explainability at runtime.
What the target architecture should look like
The right goal is not to throw out the warehouse or lakehouse. Those remain essential. Instead, the goal is to add a runtime data layer designed for AI use.
That layer typically includes a canonical domain model, a durable event backbone, strong provenance metadata, policy-aware access controls, and a context assembly service that can package the right data for a specific task. Retrieval still matters. Search still matters. Embeddings may still matter. But they need to sit inside a broader architecture that preserves business meaning and operational trust.
That is the missing middle in many enterprise AI stacks.
Not model on top of data; model on top of context.
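A hedged sketch of that inversion, with stand-ins for the policy engine, the context assembly service, and the model call; all helpers are hypothetical.

```python
# Stand-ins for the policy engine, context assembly service, and model call.
def check_policy(role: str, task: str) -> bool:
    return (role, task) == ("care_manager", "outreach_summary")

def build_context(case_id: str, task: str, role: str) -> dict:
    return {"state": {"case_id": case_id}, "events": [],
            "evidence": [f"claims_platform:case/{case_id}"]}

def call_model(question: str, ctx: dict) -> str:
    return f"(answer grounded in {len(ctx['evidence'])} evidence item(s))"

def answer_with_context(case_id: str, task: str, role: str, question: str) -> dict:
    """Gate on policy, assemble a bounded package, and return the answer
    together with its provenance; never the model against raw storage."""
    if not check_policy(role, task):
        return {"answer": None, "reason": "not permitted for this task"}
    ctx = build_context(case_id, task, role)
    return {"answer": call_model(question, ctx), "citations": ctx["evidence"]}
```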
Where to start
The good news is that this doesn’t require a giant, years-long reinvention before you can realize value.
A solid strategy is to start with one or two high-value workflows and work backward from an agent’s needs. Pick a use case where context and trust matter. For example: summarizing a member’s recent care history before outreach, assembling utilization review context for a nurse, or drafting correspondence with traceable evidence.
Then ask these context-relevant questions:
- What business entities are involved?
- What event history matters?
- What policy governs access?
- What source evidence must be preserved?
- What would a human need to trust the result?
Those questions force the architecture in the right direction.
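Those answers can be captured as a small, reviewable spec per workflow before any code is written. Here is a hypothetical example for the outreach-summary use case mentioned above; every field name and value is illustrative.

```python
# Hypothetical context spec for the outreach-summary workflow; each field
# records the answer to one of the five questions above.
OUTREACH_SUMMARY_SPEC = {
    "entities": ["member", "care_plan", "correspondence"],
    "event_history": {"window_hours": 720, "kinds": ["contact", "care_plan_update"]},
    "access_policy": "care_manager:outreach_summary",
    "evidence": ["source_system", "record_id", "observed_at"],
    "trust_bar": "every claim in the summary cites at least one source record",
}
```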
The real work of enterprise AI
The most common mistake in enterprise AI is assuming the hard part is choosing the model. Usually, it’s not.
The hard part is preparing enterprise data so that an intelligent system can operate with enough context, trust, and control to be useful inside a real workflow.
That is why analytics-ready is not agent-ready.
Warehouses, lakehouses, and vector indexes are valuable, but by themselves they don’t produce enterprise-grade agents. What does? A data foundation that preserves semantics, provenance, event history, policy, and context in a form that can be assembled at runtime. That is the architecture agents actually need.
And in most enterprises, that work has barely begun.