Bring structured context to agentic data development with dbt
AI is writing code at Google and Microsoft. Automating deployments at startups. But when it comes to building production data pipelines, it's still mostly manual.
The missing piece is the same one we identified in Part One of our series. For AI agents to develop analytics safely, cost-efficiently, and at scale, they need what every data engineer depends on: structured context.
This is Part Two of our series, Bringing structured context to AI, where we explore how dbt and the dbt Model Context Protocol (MCP) server make the rich metadata in your dbt project accessible, cost-efficient, and trustworthy for AI agents.
In this post, we turn to the next frontier: agentic data development, where AI systems don't just query dbt data but develop, refactor, test, and migrate dbt projects safely and at scale. That means eliminating unnecessary warehouse compute and ensuring every change aligns with your org’s rules and definitions.
AI agents accelerate app development, but stumble on data pipelines
In software engineering, AI agents are already rewriting the rules. Google reports that over 30% of its new code is now AI-generated. And this isn't just autocomplete. These agents can interpret intent, map out multi-step plans, refactor codebases, open pull requests, and more.
AI agents thrive in software engineering because application codebases live in highly structured environments. Clear module boundaries, type systems, tests, CI, and version control give agents a predictable world to reason about and safely automate changes in.
Data pipelines live in a very different universe.
In many organizations, analytics environments don’t look like that. Without tools like dbt, the "code" is only a tiny slice of the truth. Every model, metric, and transformation is entangled with business logic, source-system drift, undocumented constraints, and years of accumulated tribal knowledge scattered across chats, documentation, and people's memories. If you're using legacy visual ETL tools or ad-hoc SQL without structured context, agents don’t get the foundation they need to reason effectively.
So yes, agents can generate SQL, but they can't understand why something exists, how it’s used, or what it’s allowed to do. This means that without structured context, these data agents produce code that looks correct but violates your organization's standards and definitions.
To demonstrate this, we asked ChatGPT to create a lifetime value model with no access to a dbt project or structured context. As expected, the output from this zero-shot prompt looks plausible but is very poor.
[Image: LTV model generated from a ChatGPT prompt]
When agents work over your warehouse like it’s just a pile of tables, you get a consistent set of problems:
- Missing SQL/system understanding: If an agent only sees a single SQL file at a time, it has no way to understand how that code fits into the broader pipeline. It can’t see column lineage, compilation errors, or recent job logs, so it can’t understand how changes relate to one another. It “fixes” a query or rewrites a model in isolation and produces code that breaks downstream.
- Missing business context and logic: Generic data agents don’t know how your company defines LTV, ARR, churn, or an “active user.” They can’t tell which table is a trusted source of truth and which is a one-off analysis from last quarter. Without access to company documentation, semantic definitions, or domain rules, they hallucinate logic that appears correct but produces incorrect metrics and results, quickly eroding trust in anything the AI touches.
- No change-impact awareness: Refactoring a model or adding a new one requires understanding all upstream and downstream dependencies. Without that, agents change SQL in isolation and quietly break BI dashboards, reports, and other models. They should be able to see the blast radius of a change and make decisions with that impact in mind.
- High context switching for humans in the loop: Humans often have to chase context across multiple systems to verify AI outputs. If developers must log into the data platform UI, review dbt logs, and check the BI tool to validate outputs, they lose continuity. Iteration slows down, review becomes tedious, and AI feels like more overhead than help.
- No tight local validation loop: In many AI experiments, the only way to know if a change “worked” is to push it all the way to the warehouse and run a heavy job. If the agent can’t compile models or validate changes locally, even simple changes require full warehouse runs, creating long feedback cycles and increased compute cost.
Even with access to the local codebase, the agent will be forced to guess, because it reads your SQL as unstructured text, not as the interconnected, governed system of models, tests, lineage, and semantics your team works with every day.
That’s why today’s AI-driven data development feels brittle.
How dbt's structured context layer enables agentic data development
Safe, reliable, and cost-efficient agentic data development requires giving systems a structured understanding of your analytics environment, not just access to code.
dbt’s structured context layer provides exactly that. The dbt MCP Server exposes rich project metadata (semantic definitions, lineage, contracts, owners, tests, freshness information, and CI artifacts) through the same structured context and engines used by dbt Platform, Fusion, and the dbt VS Code extension. That structured context layer enables a few key capabilities:
- System-level context: Agents can see your dbt project as a graph, not a pile of files. This means agents start behaving like engineers who understand the system, proposing changes that preserve contracts and avoid surprise breakages.
  - Through dbt’s lineage, run artifacts, and metadata (exposed via the dbt MCP Server), they can inspect dependencies, see what’s upstream and downstream, and understand which changes are safe.
- Shared business semantics and logic: Agents can reuse governed metrics and dimensions defined centrally instead of defining their own versions of metrics. This means AI-generated logic matches your existing definitions, so practitioners and leaders see consistency, not conflict, across reports and tools.
  - By exposing dbt docs, owners, and the dbt Semantic Layer via the dbt MCP Server, agents can reuse governed metrics, dimensions, and logic instead of reinventing them.
- Impact-aware planning: Because agents see the DAG and downstream consumers, they can reason about the blast radius of a change before making it (see the sketch after this list). They can answer, “Which dashboards and models will this refactor affect?” and choose safer patterns, add tests, or propose follow-up changes instead of blindly editing SQL.
  - dbt lineage, exposures, state from dbt artifacts, and the Fusion MCP tool together give agents a concrete view of upstream/downstream impact for every change.
- Less context switching, more trust: When humans review and steer agent output in one governed environment instead of bouncing across multiple disconnected tools, the result is higher productivity and trust.
  - With dbt Fusion Engine, the dbt language server (LSP), and the dbt VS Code extension, agent proposals, lineage, docs, and compile results all show up in the IDE where engineers already work.
- Fast, scoped validation loops: Agents can compile locally, run targeted checks, and validate only the impacted slice of the DAG, resulting in cheap, rapid feedback. Teams can let agents propose frequent, incremental changes while maintaining the same quality bars they expect from human-led development.
  - By plugging into dbt’s compilation engine, tests, contracts, and Slim CI/state-aware orchestration, agents can compile locally, run only the relevant checks, and validate just the impacted slice of the DAG.
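To make impact-aware planning concrete, here is a minimal sketch that walks dbt's lineage graph from a locally compiled `target/manifest.json` to list everything downstream of a model before an agent touches it. It assumes a compiled project with the standard artifact layout, and the model unique_id shown is hypothetical.

```python
import json
from collections import deque

# Assumes the project has been compiled locally, so target/manifest.json exists.
with open("target/manifest.json") as f:
    manifest = json.load(f)

def blast_radius(unique_id: str) -> set[str]:
    """Collect every node (models, exposures, tests) downstream of unique_id."""
    downstream, queue = set(), deque([unique_id])
    while queue:
        node = queue.popleft()
        for child in manifest["child_map"].get(node, []):
            if child not in downstream:
                downstream.add(child)
                queue.append(child)
    return downstream

# "model.analytics.fct_orders" is a hypothetical unique_id for illustration.
impacted = blast_radius("model.analytics.fct_orders")
exposures = [n for n in impacted if n.startswith("exposure.")]
print(f"{len(impacted)} downstream nodes, {len(exposures)} exposures (dashboards/reports)")
```

An agent with this view can decide whether a change is safe to apply directly, needs extra tests, or warrants a heads-up to downstream owners.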
By operating through a structured context layer and governed interfaces (like the dbt MCP Server), SQL stops being just text and becomes part of a governed system. With this foundation, data development agents can read dependencies, reuse existing logic, run compile-time checks, validate outputs, and operate within the same guardrails as your analytics engineers.
And the best part? You can build this today.
dbt’s ecosystem already provides nearly everything data development agents need to reason like analytics engineers, forming the foundation for reliable, explainable, and cost-efficient agentic data development.
Structured context is the multiplier. With dbt as our source of definitions and lineage and MCP exposing that context across Snowflake and Claude, we can add new agent skills without re-plumbing governance. We are excited for dbt Agents to bring purpose-built automation that moves us from reactive tickets to proactive, agent-driven operations and spares us the overhead of bespoke bots.
- Øyvind Eraker, Senior Data Engineer at NBIM
Building a reliable data development agent with dbt
To build a trustworthy, safe, and cost-efficient data dev agent workflow, your agent needs two core capabilities:
- Access to structured context - the complete meaning behind your models, lineage, tests, contracts, and metrics.
- A grounded execution environment - a local, governed workspace that reflects your actual dbt project.
dbt provides both through its structured context layer, exposed to agents via dbt MCP Server and executed locally through dbt Fusion Engine, which powers the dbt VS Code extension.
How agents access structured context
- Model Context Protocol (MCP): Think of MCP as an API for LLMs. It passes structured data from external systems (like GitHub, Notion, dbt, or any system with an MCP server) directly to the agent. This ensures agents always reason from current, complete, authoritative information, not partial context pasted into prompts.
- dbt Fusion Language Server Protocol (LSP): The Fusion LSP powers the dbt VS Code extension, giving agents and developers real-time, local insight into your dbt project: model structures, dependencies, refs, columns, and local compilation results. Combined with the Fusion Engine, this enables fast, accurate feedback without hitting the backend and makes code changes more reliable by grounding them in the actual project structure.
Together, MCP + LSP give agents the context and a local execution environment inside VS Code, enabling safe agentic development for tools like Cursor, Claude Code, or any MCP-enabled IDE.
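To make the MCP side concrete, here is a minimal sketch of an agent-side connection to the dbt MCP Server using the MCP Python SDK. It assumes the server can be launched locally with `uvx dbt-mcp`, and the environment variable shown is illustrative; check the dbt MCP Server documentation for the exact configuration your deployment needs.

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the dbt MCP Server locally over stdio. The launch command and env var
# below are assumptions for illustration; see the dbt MCP Server docs.
server = StdioServerParameters(
    command="uvx",
    args=["dbt-mcp"],
    env={"DBT_PROJECT_DIR": "/path/to/your/dbt/project"},
)

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            # Each tool maps to a slice of dbt's structured context layer.
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

asyncio.run(main())
```

Any MCP-enabled agent or IDE does the equivalent of this handshake under the hood; the point is that the agent discovers governed, authoritative tools rather than relying on context pasted into a prompt.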
[Image: Conceptual architecture of agentic dbt development]
The three capabilities your dbt agent must implement
Here’s how an agent uses dbt's structured context to develop safely.
1. Ingest the right context (via MCP)
Instead of switching between multiple tools, it makes more sense to directly ingest this information from multiple MCP servers. For example, an MCP for your issue tracker, another for internal docs, another for GitHub, and the dbt MCP Server for dbt metadata. For dbt specifically, the dbt MCP Server exposes the structured context layer through dedicated tools:
- the Admin tool for reading job logs and run history
- the Discovery tool for exploring lineage and metadata
- the Query/Semantic Layer tool for querying governed metrics and dimensions
- the CLI tool for local compilation, testing, and validation
- the Fusion tool for leveraging Fusion’s compiler, diagnostics, and project state
This gives agents access to the “why” and “how” behind every model before they generate a single line of SQL.
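As a sketch of what that ingestion step can look like inside an agent loop, here is a small helper that continues the `session` from the connection example above. The tool names and arguments are assumptions for illustration; verify them against `session.list_tools()` for your server version.

```python
# Runs inside an open ClientSession to the dbt MCP Server (see the earlier sketch).
# Tool names and argument shapes below are illustrative assumptions.

async def gather_context(session, model_name: str) -> dict:
    """Pull the 'why' and 'how' behind a model before generating any SQL."""
    # Discovery: where does this model sit in the DAG?
    details = await session.call_tool("get_model_details", {"model_name": model_name})
    parents = await session.call_tool("get_model_parents", {"model_name": model_name})
    children = await session.call_tool("get_model_children", {"model_name": model_name})

    # Semantic Layer: which governed metrics already exist, so the agent
    # reuses definitions instead of reinventing them?
    metrics = await session.call_tool("list_metrics", {})

    return {
        "details": details,
        "parents": parents,
        "children": children,
        "metrics": metrics,
    }
```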
2. Make safe, local code changes (via Fusion + VS Code)
Once the agent has gathered context from the relevant MCP servers, it can begin making changes. This is where the dbt VS Code extension and the language server (LSP) feed the agent information from the local environment, where changes can be inspected, validated, and governed before touching the warehouse.
- Fusion provides local SQL understanding and compilation, letting the agent analyze errors and dependencies without warehouse queries for fast, safe, and cost-efficient iteration.
- Fusion LSP exposes your project’s full structure (models, columns, sources, refs, and configs), giving the LLM a real-time, authoritative view of your dbt graph for fewer hallucinations.
Inside VS Code, agents (e.g., GitHub Copilot, OpenAI, Claude) can propose and apply fixes grounded in the actual project, not guesses or blind SQL generation.
Fusion + VS Code is where structured context becomes safe, less costly, and actionable.
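A minimal sketch of that local feedback loop, assuming the dbt CLI is on the agent's PATH; the model name is hypothetical, and the same pattern applies whether commands are issued directly or through the dbt MCP Server's CLI/Fusion tools:

```python
import subprocess

def compile_check(model: str) -> tuple[bool, str]:
    """Compile a single model locally; with the Fusion engine this does not
    require running queries against the warehouse."""
    result = subprocess.run(
        ["dbt", "compile", "--select", model],
        capture_output=True, text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr

# "stg_payments" is a hypothetical model name for illustration.
ok, log = compile_check("stg_payments")
if not ok:
    # Feed compiler diagnostics back to the agent for another iteration.
    print("Compilation failed, retrying with:\n", log)
```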
3. Validate and deploy changes safely
After successful compilation, the dbt MCP CLI tool can run tests, sampling, and model executions. Agents then use this structured context to auto-create PRs enriched with accurate model diffs, lineage impact, and downstream test results.
Once the PR is opened, it automatically kicks off column-level CI, which validates only the models that directly depend on the changed code. This keeps iteration fast, cost-efficient, and production-safe, while still giving a full downstream impact analysis.
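Column-level CI is a dbt Platform feature, but the same scoped-validation idea is available to any agent through state-aware selection in the dbt CLI. A hedged sketch, assuming production artifacts have been downloaded to a local `prod-artifacts/` directory:

```python
import subprocess

# Build and test only the models modified in this branch plus their downstream
# dependents, deferring unchanged upstream models to the production artifacts.
# The artifacts path is an assumption; point --state at your prod manifest.
subprocess.run(
    [
        "dbt", "build",
        "--select", "state:modified+",
        "--defer",
        "--state", "prod-artifacts/",
    ],
    check=True,
)
```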
This agentic workflow mirrors an analytics engineer’s workflow, but automated with complete awareness of your structured context layer.
See it in practice
Here’s a real example: a CI Agent that uses dbt’s structured context layer to diagnose an issue, fix the model, and open a PR.
And this isn’t a one-off. Teams are already using agentic workflows for tasks like refactoring deprecated models, fixing failing tests, adding new metrics, or even migrating to a new warehouse, all powered by dbt’s structured context layer.
In fact, dbt partner Indicium helped Aura Minerals use AI and dbt’s structured context layer to build a migration agent that moved their PySpark estate to dbt. The results: 400+ notebooks and 130 workflows migrated, pipeline time cut by 87%, ~99% code conformity, and 66% less stakeholder coordination. With structured context, Aura adopted a governed dbt environment quickly and safely.
Why this matters for you as a Data Engineer
Anyone who’s maintained a major data model knows the grind: chasing lineage, debugging tests, rerunning builds. Agents powered by dbt’s structured context layer handle this repetitive work without risking your production environment. This workflow gives you:
- Faster debugging — all relevant context is pulled automatically.
- Higher accuracy — fixes are based on real dbt metadata, not guesses.
- Safer and cost-efficient iteration — local compilation avoids unsafe or costly backend queries.
- Less low-value work — PR creation and validation can be automated.
Engineers spend less time gathering context and more time architecting systems, exactly where human expertise matters most.
Thinking about making data development with agents possible? Start with dbt.
Before you introduce agents into your data development workflow, make sure your foundation is ready:
- Do you have tests, contracts, or CI that catch issues early?
- Is your business logic centralized, not scattered across dashboards, SQL files, and Slack?
- Do your models have lineage, ownership, and documentation?
- Can your project compile cleanly in a local environment?
If so, your project is already agent-ready.
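If you want a quick, scriptable sanity check on that readiness, here is a minimal sketch (assuming a compiled project with `target/manifest.json`) that flags models missing descriptions or tests, two of the signals agents lean on most:

```python
import json
from collections import defaultdict

with open("target/manifest.json") as f:
    manifest = json.load(f)

# All models in the project, keyed by unique_id.
models = {uid: n for uid, n in manifest["nodes"].items() if n["resource_type"] == "model"}

# Map each model to the tests attached to it.
tests_by_model = defaultdict(list)
for uid, node in manifest["nodes"].items():
    if node["resource_type"] == "test":
        for dep in node["depends_on"]["nodes"]:
            tests_by_model[dep].append(uid)

undocumented = [uid for uid, n in models.items() if not n.get("description")]
untested = [uid for uid in models if not tests_by_model[uid]]

print(f"{len(models)} models: {len(undocumented)} without descriptions, "
      f"{len(untested)} without tests")
```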
You can connect your own agents through the dbt MCP Server and start shipping agent-powered data pipelines today!
Or, if you prefer not to build it yourself…
We’re introducing dbt Agents (coming soon), our purpose-built agents in dbt Platform that plan, generate, and validate dbt code using your existing definitions, tests, lineage, and permissions.
- The Developer Agent (coming soon) will support refactoring, migrations, and validation.
- The Observability Agent (coming soon) will monitor builds and surface likely root causes.
All powered by structured context and governed development. It's time to bring governed, cost-efficient agentic workflows to data development! Get a demo of the dbt MCP Server or dbt Agents to ship reliable agentic data development today.