The Data Stack Wasn't Built for Agents


The real bottleneck

A stat buried in a recent MIT Technology Review piece, from a conversation with Databricks and Infosys, stopped me: 95% of enterprise AI projects fail to generate business value.

That's not a model problem. Models have been good enough for serious production use for over two years now. The gap is upstream. It's data — how it's stored, how it's governed, who owns it, and whether the infrastructure was built to serve the kind of workloads AI actually creates.

Most enterprise data stacks were not. They were built for analytics.

What analytics infrastructure was designed for

The data warehouse was a revolution when it arrived. Pull historical data from your operational systems, transform it, load it somewhere structured, and let analysts run queries against it. Build dashboards. Generate reports. Make decisions based on what happened last quarter.

This is read-heavy, batch-oriented, and optimised for aggregation. You ask questions of historical data. The system answers. Nobody writes back to it. The latency is acceptable because the use case is retrospective.

That infrastructure is fine for what it was designed to do. The problem is that it is fundamentally the wrong foundation for agents.

What agents actually need

An agent doesn't just query data. It acts on it, tracks its own state, coordinates with other systems, handles partial failures, and picks up where it left off. It needs to write, not just read. It needs a current view of the world, not last night's snapshot. And it needs somewhere to persist context across the steps of a workflow — what it decided, what it tried, what it's waiting for.

That's an operational database problem, not an analytical one. It's OLTP, not OLAP. Low-latency reads and writes, real-time state management, transactional guarantees.
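As a rough illustration of what "persisting context across the steps of a workflow" means in practice, here is a minimal sketch of an agent checkpointing its state to a transactional store. The schema, table, and names are my own inventions, not anything from the article; SQLite stands in for whatever OLTP database you'd actually use.

```python
import json
import sqlite3

# Illustrative only: an agent checkpoints workflow state transactionally,
# so it can resume after a partial failure instead of starting over.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE agent_state (
        run_id  TEXT PRIMARY KEY,
        step    TEXT NOT NULL,
        context TEXT NOT NULL   -- JSON blob: what it decided, tried, is waiting for
    )
""")

def checkpoint(run_id: str, step: str, context: dict) -> None:
    # One transaction per step: the state either advances or it doesn't.
    with conn:
        conn.execute(
            "INSERT INTO agent_state (run_id, step, context) VALUES (?, ?, ?) "
            "ON CONFLICT(run_id) DO UPDATE SET step=excluded.step, "
            "context=excluded.context",
            (run_id, step, json.dumps(context)),
        )

def resume(run_id: str):
    # Low-latency point read: pick up exactly where the last step left off.
    row = conn.execute(
        "SELECT step, context FROM agent_state WHERE run_id = ?", (run_id,)
    ).fetchone()
    return (row[0], json.loads(row[1])) if row else None

checkpoint("run-42", "fetch_invoices", {"tried": 1, "waiting_for": "erp_api"})
checkpoint("run-42", "reconcile", {"tried": 1, "waiting_for": None})
print(resume("run-42"))  # ('reconcile', {'tried': 1, 'waiting_for': None})
```

Notice that none of this is an analytical query. It is a point write and a point read with transactional guarantees, which is exactly the workload a warehouse built for aggregation handles worst.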

Most companies deploying agents right now are trying to run them on data infrastructure designed for the previous use case. The agent queries a data warehouse that was built for dashboards, writes intermediate state back to somewhere improvised, and stitches together context across system boundaries that were never designed to interoperate at this speed. It works until it doesn't. And when it fails, it fails in ways that are hard to trace and expensive to debug.

The data is the moat, not the model

The other insight from that conversation that's worth sitting with: the model is not the differentiator.

Every company has access to GPT-4, Claude, Gemini. The frontier is commoditising faster than any individual team can keep up with. The thing that separates companies is the proprietary data they have and how well that data is organised, connected, and accessible.

Your customer history, your transaction records, your internal documents, your product data — that's the moat. A competitor can spin up the same model you're using in five minutes. They cannot replicate two decades of institutional data.

But most organisations can't actually use that data effectively. It's fragmented across legacy systems, SaaS applications, and data formats that were never designed to talk to each other. The AI is only as good as the information it can access, and if that information is locked in disconnected silos, you're feeding a frontier model garbage and wondering why the outputs aren't impressive.

Governance is not the boring part

Every conversation about enterprise AI eventually reaches governance and then watches half the room check out. It sounds like compliance overhead. It's actually the prerequisite for everything else.

If you don't know what data you have, you can't feed it to a model. If you don't control who can access what, you can't deploy agents with any degree of trust. If you haven't defined what your data means — the semantics, the business context, the relationships between fields — then you're giving the model tokens without meaning.
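A minimal sketch of what that prerequisite looks like at the code level; the catalogue structure, roles, and dataset names here are assumptions for illustration, not anything from the article. The point is simply that before an agent reads a dataset, something must answer two questions: what is this asset, and may this agent touch it?

```python
from dataclasses import dataclass, field

# Illustrative governance gate: every dataset an agent can see carries
# semantics (what the data means) and an explicit access policy.
@dataclass
class CatalogueEntry:
    name: str
    description: str               # business meaning, not just a schema
    owner: str
    allowed_roles: set = field(default_factory=set)

CATALOGUE = {
    "customer_history": CatalogueEntry(
        name="customer_history",
        description="All customer interactions since 2005, keyed by account id",
        owner="crm-team",
        allowed_roles={"support-agent", "billing-agent"},
    ),
}

def resolve(dataset: str, agent_role: str) -> CatalogueEntry:
    """Refuse anything uncatalogued or outside policy before the agent reads it."""
    entry = CATALOGUE.get(dataset)
    if entry is None:
        raise LookupError(f"{dataset!r} is not catalogued, so it cannot be trusted")
    if agent_role not in entry.allowed_roles:
        raise PermissionError(f"role {agent_role!r} may not read {dataset!r}")
    return entry

entry = resolve("customer_history", "billing-agent")
print(entry.description)
```

An uncatalogued dataset fails loudly here rather than silently feeding the model tokens without meaning, which is the behaviour the paragraph above is arguing for.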

The organisations getting serious results from AI are not the ones that moved fastest. They're the ones that laid the foundation first: unified data, open formats, access controls that scale, catalogued assets with documented lineage. It's slower to start. It's dramatically faster to build on once it's in place.

The organisations that skipped that step and went straight to building are now debugging agent failures that trace back to data they can't trust.

Measure it or kill it fast

The last thing that landed from that piece: successful AI implementations treat value measurement as a first-class deliverable, not an afterthought.

Define the outcome metric before you build the use case. Measure against it from the start. If the signal isn't there after a defined window, kill the project and move the resources to something with better signal. Don't keep nursing something that isn't working because you already spent the budget.

This is harder than it sounds. There's always a reason to believe the next iteration will fix it. There's always someone convinced the data will improve. The teams doing this well are the ones that set the measurement criteria upfront and hold themselves to them, the same way you'd hold a financial investment to a return threshold.

The uncomfortable reframe

If your organisation is struggling with AI adoption, the instinct is to look at the models, the tooling, the team capability. Those are rarely the real problem.

The real problem is usually that the data infrastructure was built for the previous decade and the requirements have changed. Agents don't need a better dashboard. They need a data stack that was designed for systems that act, not just systems that report.

Most organisations haven't built that yet. The ones that do it now will have a meaningful head start on the ones that realise it later.

Written by

Martin Dimoski

Senior R&D Executive & AI Systems Builder