Building a data stack for trusted AI

last updated on Jun 03, 2026
AI adoption is stalling. Not for lack of ambition, and not because the models aren't capable. The data foundation underneath isn't ready.
The numbers back this up. About 16 percent of organizations surveyed have deployed AI agents, according to Gartner. Meanwhile, 80 percent of IT leaders don't believe their data is ready for AI, and over 70 percent worry about governance in a world increasingly run by agents.
That last number should be higher. Way higher.
Trusted AI requires trusted data: governed, consistent, and contextual. Without it, every AI initiative follows the same arc: a promising pilot, a scaling problem, a stall. The model isn't to blame. The foundation is.
The shape of the consumer has changed
The data stack was built for people. Every layer, from how we model data to how we serve it, assumes a human is sitting at the end of the pipeline: someone with judgment who can pause when a number looks off and decide whether to trust it.
That assumption is being overturned. Agents query your data continuously and take action autonomously, and they don't stop to reconcile. They act.
The analyst who runs 50 queries a day has been joined by agents that might run 50,000. Where a human analyst can tolerate lag and ambiguity, an agent making inventory, pricing, or routing decisions cannot.
The core issue is context. An analyst carries context in their head: they know that "customer" in the revenue model means paying subscribers, not trial users, because someone told them that in onboarding.
Agents don't have that. They take what they're given. What a column means, what's authoritative versus stale, what's a trusted curated model versus a raw staging table: that context now has to live in the data itself, in metadata, in contracts, in governed definitions that any system can read.
Why AI is guessing in the dark
Here's what actually happens when you point a generative AI model at your databases, warehouses, or BI datasets without a governed context layer:
- The AI scans whatever tables and columns it can see.
- It guesses which ones to use based on their names.
- It pulls data straight from the warehouse, mixing raw, staging, and curated tables alongside siloed SQL in views and notebooks, with no reliable way to know which represents the actual source of truth.
The results are consistent. Without governed context, agents:
- Generate unreliable SQL because they can't identify the right models
- Invent or misapply your metric definitions
- Create governance and trust issues with no clear audit trail
- Drive up costs as unreliable queries burn tokens and compute
All of this happens because the agent is guessing in the dark. The context it needs to behave reliably simply isn't there.
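The gap between guessing and governed selection can be sketched in a few lines. This is an illustration only, not a dbt API: the `TableInfo` record, its `layer` and `certified` fields, the sample catalog, and both picker functions are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    """Hypothetical catalog entry; field names are assumptions for this sketch."""
    name: str
    layer: str        # "raw", "staging", or "curated"
    certified: bool   # True if an owner has marked it as a source of truth
    description: str = ""


# Made-up catalog with three tables that all look like "customers".
CATALOG = [
    TableInfo("raw_customers", "raw", False),
    TableInfo("stg_customers", "staging", False),
    TableInfo("dim_customers", "curated", True, "One row per paying subscriber."),
]


def pick_by_name(catalog, keyword):
    """Without context: take the first table whose name matches."""
    for table in catalog:
        if keyword in table.name:
            return table
    return None


def pick_governed(catalog, keyword):
    """With context: only curated, certified tables are eligible."""
    eligible = [t for t in catalog if t.layer == "curated" and t.certified]
    for table in eligible:
        if keyword in table.name:
            return table
    return None


print(pick_by_name(CATALOG, "customers").name)   # raw_customers
print(pick_governed(CATALOG, "customers").name)  # dim_customers
```

Both functions see the same warehouse. Only the second one has been told which table is authoritative, and that is the whole difference.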
Two pillars: governance and structured context
Trusted AI rests on two pillars.
The first is governance: control, data lineage, and quality. The second is structured context: the semantics that agents can actually reason over. Without both, every agent reinvents the truth.
Over 50,000 companies use dbt in production. The governed, structured context layer it provides closes the context gap.
It tells your AI or agent how your data is defined, how it connects, and what it actually means. It exposes the rich metadata that already lives in your dbt project: your models, lineage, metrics, freshness definitions, and documentation. Then it surfaces all of that through open standards, including the Model Context Protocol (MCP) and the semantic layer using MetricFlow, so any AI system can rely on a single governed source of truth rather than reconstructing context from scratch every time.
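As a rough sketch of what "exposing project metadata" can mean in practice, the snippet below turns model descriptions, column descriptions, and lineage into plain text an agent could read. It assumes a dictionary shaped like the `nodes` section of dbt's `manifest.json` artifact (keys such as `resource_type`, `description`, `columns`, and `depends_on`). The sample data and the `build_context` helper are hypothetical, the artifact schema varies by dbt version, and this is not how dbt's MCP integration is implemented.

```python
# Hypothetical, trimmed-down stand-in for the "nodes" section of manifest.json.
SAMPLE_MANIFEST = {
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "name": "stg_orders",
            "description": "Orders cleaned from the raw feed.",
            "columns": {"order_id": {"description": "Primary key."}},
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "model.shop.fct_revenue": {
            "resource_type": "model",
            "name": "fct_revenue",
            "description": "Recognized revenue per order.",
            "columns": {
                "order_id": {"description": "Foreign key to stg_orders."},
                "amount_usd": {"description": "Revenue in US dollars."},
            },
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
    }
}


def build_context(manifest):
    """Render each model's description, columns, and parents as plain text."""
    lines = []
    # Sort by unique id so the output is stable from run to run.
    for _, node in sorted(manifest["nodes"].items()):
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, and other node types
        lines.append(f"model {node['name']}: {node.get('description', '')}")
        for col, meta in node.get("columns", {}).items():
            lines.append(f"  column {col}: {meta.get('description', '')}")
        parents = node.get("depends_on", {}).get("nodes", [])
        lines.append(f"  depends on: {', '.join(parents) or 'nothing'}")
    return "\n".join(lines)


print(build_context(SAMPLE_MANIFEST))
```

The point of the sketch is that nothing here is new information. The descriptions and dependencies already exist in the project; the work is making them readable by a system rather than only by a person.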
dbt Wizard complements this by packaging proven dbt workflows (including testing, debugging, migrations, and metrics definition) into reusable patterns. Agents not only know the context; they know how to follow a consistent, proven process when acting on it. dbt Wizard CLI surfaces that same context for fast, cost-efficient local development.
The benefits compound quickly. Structured models and tests give you AI that generates reliable, reusable SQL. A governed, query-optimized semantic layer gives AI the right definitions and logic, while centralized lineage surfaces clear ownership and makes every change auditable through Git and pull request (PR) workflows.
The result is both increased accuracy and greater cost efficiency. When AI works from curated context rather than your entire warehouse, your token, compute, and review costs stay manageable.
The semantic layer is not optional
The semantic layer is the component that makes AI actually reliable. When metrics and business entities are defined once and reused everywhere, whether the consumer is a dashboard, an operational workflow, or an agent, they all work from the same trusted logic. That consistency is what separates AI that amplifies good decisions from AI that amplifies errors, with no human in the loop to catch the difference.
Gartner estimates that by 2027, enterprises without a semantic layer will spend 40 percent more on AI rework and remediation than those with one. Don't be one of them.
At dbt Labs, we approach this through open standards. The Open Semantic Interchange (OSI) is an open standard for how semantics move between tools. MetricFlow gives you one definition of every metric, consumable anywhere. Define once, use anywhere, and it works with whatever agent or application comes next.
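The "define once, use anywhere" idea can be illustrated without any particular tool. The sketch below is plain Python, not MetricFlow syntax or its API: the `Metric` class, the `METRICS` registry, and `compile_sql` are hypothetical names, and the table and column names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """Hypothetical metric definition; not MetricFlow's actual spec."""
    name: str
    expression: str  # SQL aggregate expression
    table: str       # governed model the metric reads from
    where: str = ""  # optional business rule baked into the definition


# The single place where "active_customers" is defined.
METRICS = {
    "active_customers": Metric(
        name="active_customers",
        expression="COUNT(DISTINCT customer_id)",
        table="dim_customers",
        where="plan_type = 'paid'",
    )
}


def compile_sql(metric_name, group_by=None):
    """Every consumer calls this, so the business logic cannot drift."""
    metric = METRICS[metric_name]  # unknown metrics raise KeyError, not a guess
    select = f"{metric.expression} AS {metric.name}"
    if group_by:
        select = f"{group_by}, {select}"
    sql = f"SELECT {select} FROM {metric.table}"
    if metric.where:
        sql += f" WHERE {metric.where}"
    if group_by:
        sql += f" GROUP BY {group_by}"
    return sql


# A dashboard and an agent ask for the same metric and get the same logic.
dashboard_sql = compile_sql("active_customers", group_by="region")
agent_sql = compile_sql("active_customers")
print(dashboard_sql)
print(agent_sql)
```

The dashboard and the agent differ only in how they slice the metric. The filter that decides who counts as a customer lives in one definition, and a request for a metric that was never defined fails loudly instead of being improvised.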
The practical upshot: dbt is agent agnostic. We work with whatever LLM framework or vendor you choose. Your governed context travels with the data. As your AI stack evolves with new models, tools, and interfaces, the structured foundation stays the same. You centralize logic in one layer, reduce redundant queries, and get more reliable AI outputs across the board.
What this looks like at ACV Auctions
ACV Auctions runs a wholesale dealer-to-dealer automotive marketplace. Their analytics manager, Darren Peters, has been building through this challenge firsthand.
The starting point was familiar. ACV's embedded reporting system couldn't keep pace: simple report tweaks for dealers could take weeks or months. Their internal BI tool had accumulated over 1,500 dashboards. A new employee searching a common business term would see 400 different results, each a variation on the same concept, with no reliable way to identify ground truth.
Darren's instinct, well before AI chat was part of anyone's roadmap, was rigorous data governance at the dbt layer. He talks about writing column descriptions the way you'd write a board game rule book: specific enough that no one argues over what a rule means.
The goal was traditional data governance: a clean data dictionary, automated quality tests, and descriptions piped from BigQuery into Confluence. Precise, unambiguous definitions for every column.
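A rule-book standard for descriptions is something a team can check automatically. As a loose sketch, assume column documentation has been loaded into a plain dictionary. The `COLUMN_DOCS` sample, the `MIN_WORDS` threshold, and `find_weak_descriptions` are all hypothetical, and none of this represents ACV's actual tooling.

```python
MIN_WORDS = 4  # arbitrary threshold chosen for this sketch

# Made-up column documentation keyed by "table.column".
COLUMN_DOCS = {
    "auctions.sale_price_usd": "Final sale price in US dollars, excluding fees.",
    "auctions.status": "Status",
    "auctions.dealer_id": "",
}


def find_weak_descriptions(docs, min_words=MIN_WORDS):
    """Flag columns whose description is missing or too short to be unambiguous."""
    weak = {}
    for column, description in docs.items():
        words = description.split()
        if not words:
            weak[column] = "missing"
        elif len(words) < min_words:
            weak[column] = "too short"
    return weak


print(find_weak_descriptions(COLUMN_DOCS))
# {'auctions.status': 'too short', 'auctions.dealer_id': 'missing'}
```

A word count is a crude proxy for precision, so a check like this catches only the obvious gaps. Whether a description would survive an argument over what the rule means still takes a human reviewer.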
That discipline turned out to be the foundation for everything that came after. When ACV brought in Omni Analytics for their semantic layer, Omni could pull column-level metadata directly out of their data warehouse automatically.
Darren built the governance layer for traditional reasons. But it also set up ACV perfectly for AI.
A few months into using Omni's AI chat, the team's workflow has shifted. When a Slack message arrives from a stakeholder with a data question, Darren pastes it straight into the chat. If the response looks right, he sends back a shareable query URL: a live, governed analysis the recipient can view or iterate on themselves, without it being formally published and adding to the content pile. No more ad hoc dashboards that bloat the system just to answer a one-off question.
Product managers who used to wait days or more for ROI analyses on feature work are now running those analyses themselves. Darren's team validates and adjusts, but a backlog that used to go untouched is actually getting cleared. Customer support teams are answering dealer-specific account questions in real time, without ticket queues.
Darren describes himself as a former AI skeptic, specifically in the context of self-serve analytics. Two things shifted his view: the maturity of the semantic layer itself, and access to more capable reasoning models.
When both came together, it wasn't a gradual improvement. "It was a light switch," he said. One question in the chat, one response, and he knew it was real.
The shift ACV is living now: less time building content, more time engineering context. When the core analytics loop is about defining a dimension precisely and describing it in plain language, the team has to stay close to the business and keep sharpening its understanding. That rigor pays off across every consumer, whether human or agent.
Shipping AI with confidence
Solving for the context gap is what finally lets you ship AI with confidence. It leads to fewer hallucinations, better decisions, lower security and governance risk, reduced token and compute spend, and faster data development.
Most importantly, context enables AI initiatives that actually scale beyond the pilot stage. They scale because teams are willing to adopt and trust them.
That's the part people underestimate. Governance isn't the enemy of speed. It's the condition for adoption.
When data semantics are open and portable, your stack stops being a series of vendor decisions and becomes a foundation you actually own. Every layer, whether ingestion, storage, or semantics, stays interoperable and yours to evolve. That's what lets enterprises scale AI without being held hostage to architectural choices made three years ago.
For a hands-on look at all of this in practice, watch the full session recording.
Build the governed foundation your AI initiatives need: Talk to the team at dbt today.





