Scale reliable analytics in the AI era with dbt and Databricks

last updated on Dec 11, 2025
If you're a data leader planning for the future, you're probably feeling the pressure. Your team's backlog could fill the next year. Leadership wants AI results yesterday. And somewhere in the back of your mind, you're wondering if your current infrastructure can actually support what comes next.
Taking a look at the current state of the data landscape, it's easy to understand why everyone's sweating bullets. Preparing for the AI future requires:
- Accessing your data at scale, no matter how (or where) it’s stored
- Migrating off of legacy systems and into more modern data formats
- Making overwhelmed data engineers more productive
And, somehow, we need to do all of this while also increasing stakeholders’ trust in data.
What’s needed to meet these challenges isn’t necessarily more standards. What’s needed is a more open data infrastructure that, perhaps ironically, enables us to think less about infrastructure.
I had a good chat recently about this with David Totten, VP of Field Engineering at Databricks. We talked about how dbt and Databricks are working to prepare companies to scale analytics reliably in the AI era by avoiding vendor lock-in, making migration painless, and leveraging AI to augment (not replace) overworked and understaffed data teams.
dbt and Databricks: An expanding partnership
Databricks' core mission is to get data into a data lakehouse architecture. The lakehouse gives customers the ultimate power and control over their data in the quickest, most efficient, and most productive way. Using a data lakehouse, companies can drive analytics more cheaply and efficiently than ever.
dbt enables moving data into and out of the lakehouse, providing a seamless way to create and manage high-quality data pipelines. Once inside a Databricks data lakehouse, all of that data is governed by the Databricks Unity Catalog, which provides a single location for governing, discovering, monitoring, and sharing data across the enterprise.
This tight integration between dbt and Databricks - with dbt serving as a data control plane and Databricks as a single source of truth - provides a unified, vendor-agnostic approach to managing all of your data. You can put your data in any format you desire and manage it in a highly scalable and governed manner.
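To make that concrete, here's a minimal sketch of what this looks like in practice (model and column names are hypothetical). With the dbt-databricks adapter, the catalog and schema configured on the connection map onto Unity Catalog's catalog.schema.table namespace, so every table dbt builds lands in the lakehouse already governed and discoverable:

```sql
-- models/fct_orders.sql (illustrative; model and column names are hypothetical)
-- dbt handles the transformation and materialization; the resulting table is
-- created under the Unity Catalog catalog/schema configured for the target,
-- so governance and discovery come along for free.
{{ config(
    materialized = 'table'
) }}

select
    order_id,
    customer_id,
    order_total,
    ordered_at
from {{ ref('stg_orders') }}
where order_total is not null
```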
How dbt and Databricks help companies move faster
Almost every customer discussion I’ve had at dbt Labs since taking over as Chief Product Officer has focused on two things: AI and Apache Iceberg. These are the two areas where we find all of our customers have common ground.
What's fascinating is that many teams are still treating these as separate problems. They want to build AI capabilities and, separately, are evaluating Iceberg for their data lakehouse.
But here's what David and I both see from the field: these aren't separate problems. Iceberg is essential to getting your data out of silos and scaling your AI strategy.
This is symbolic of a bigger shift in our industry. For years, we've dealt with the consequences of closed systems and proprietary formats. We've watched teams spend months—sometimes years—on migration projects, only to find out the company has moved on to the next thing before they're done.
The challenge, in other words, is to avoid falling behind the curve while trying to stay ahead of it. Here is what we've seen successful customers do.
Why standards don’t (or shouldn’t) matter anymore
The modern data stack started as a kit of point solutions. You grabbed best-of-breed tools for extraction, transformation, loading, and observability, and stitched them together. Then we moved to platforms that tried to own entire boxes in that stack.
Now we're in a different era. Storage is decoupled from compute. SQL can (and should) run anywhere. Your data lakehouse should work with multiple engines and storage formats. This isn't just good architecture—it's survival.
The question isn't whether you're using the "right" format or the "right" platform. The question is whether you're using open standards that let you move freely as technology evolves. Because technology is evolving faster than ever, and you can't afford to be locked in.
David made a great point about Databricks' journey with Delta and Iceberg. They started with Delta, built incredible technology, and then, when the world said it wanted Iceberg, they hired all the Iceberg people and embraced the open standard. That's the kind of flexibility you need to compete today.
Open source and integration of systems are the key. You don’t want to spend 75% to 80% of your annual IT product budget and nine months figuring out basic infrastructure. That’s why Databricks and dbt have worked together to make it easy to operationalize and govern your data, no matter what standard it adheres to and what format it’s in.
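One small example of what that looks like from the dbt side, hedged appropriately: recent versions of the dbt-databricks adapter let a model opt into an Iceberg-compatible table format with a single config entry. The exact config key and behavior depend on your adapter and platform versions, so treat this as a sketch rather than a recipe (the model name is made up):

```sql
-- models/customer_360.sql
-- Sketch only: assumes a recent dbt-databricks adapter that exposes a
-- `table_format` model config for Iceberg-compatible tables. Check your
-- adapter's documentation for the exact option and supported versions.
{{ config(
    materialized = 'table',
    table_format = 'iceberg'
) }}

select
    customer_id,
    min(ordered_at) as first_order_at,
    count(*)        as lifetime_orders
from {{ ref('fct_orders') }}
group by customer_id
```

The point isn't the specific keyword. It's that choosing an open table format becomes a one-line configuration decision rather than an architectural commitment baked into every query.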
Transpilation: How dbt Fusion will make migration easier
Let’s cut to it: “Transpilation” is just an overly complicated word to describe a problem we’ve wrestled with in the data space for decades.
You need to migrate data from one system to another. But all your SQL is vendor-specific. This necessitates a painful migration process. Six months in, you find you’re maybe 80% done (if you’re lucky). Meanwhile, your target technology is already out of date - the industry’s moved on to the next best thing.
The dream is that you write a SQL query once, and it runs in any environment. That’s the reality we should all be striving for: to ensure that we never have to use the word “migration” again.
This is why, at dbt, we’ve heavily invested in the dbt Fusion engine, the tech we acquired from SDF Labs. We call Fusion an SQL compiler - and it is. It has a native understanding of SQL across multiple engine dialects, meaning it can compile and validate SQL against your data warehouse even as you type in your Integrated Development Environment (IDE). That accelerates development by eliminating lengthy check-in/test/fix cycles common to CI/CD pipelines.
At scale, however, Fusion should act as a transpiler. In other words, you shouldn't have to worry about whether you have ten or a hundred thousand stored procedures in one data warehouse, or get hung up on that one function you're calling that's specific to your platform.
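To make the problem concrete, here's the kind of dialect gap a transpiler has to close. This is an illustrative before-and-after, not Fusion's actual output:

```sql
-- Vendor-specific SQL (T-SQL style): TOP, GETDATE(), and DATEADD are all
-- tied to one platform's dialect.
SELECT TOP 100
    customer_id,
    order_total
FROM orders
WHERE ordered_at >= DATEADD(day, -30, GETDATE());

-- The same logic in Databricks SQL: LIMIT, current_timestamp(), and a
-- standard interval expression. A transpiler automates rewrites like this
-- across thousands of models and stored procedures.
SELECT
    customer_id,
    order_total
FROM orders
WHERE ordered_at >= current_timestamp() - INTERVAL 30 DAYS
LIMIT 100;
```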
This is a huge blocker to companies meeting their AI goals. As David noted, most organizations have a bunch of legacy data stored offline in physical systems or in various clouds. They want - they need - to make this data available to AI agents. That’s not possible if they can’t leave these legacy systems.
This is something we’re closely tracking at dbt. We believe that Iceberg and AI are both integral to unlocking that last 20% of the migration process - to inspecting code before it’s used anywhere and letting you know if it’s truly agnostic and can run anywhere.
How AI is boosting (not replacing) data teams
There’s a lot of talk these days about the human impact of AI. Particularly, people are worried about the impact that AI technology might have on their jobs.
There’s no need to worry about this in the data space. There’s no data team on the planet that thinks, “We have plenty of time and resources, and no ticket backlog - we’re fine.” Every data team we’ve ever seen is at 110% capacity (or worse).
AI is not coming for the data engineer, analytics engineer, data scientist, or analyst. Rather, AI is poised to boost these roles, augmenting experts so they can burn down those backlogs more quickly.
It’s not just the “boring” stuff - skeleton code, documentation, etc. - that AI can assist with either. We’ve got a great customer who has (for good reasons) strict internal policies on what you can or can’t use inside of data transformation code. Before AI, enforcing these policies was impossible. Inspecting code took too long. With AI-based code inspection, it’s finally possible.
Around 70% of dbt users are leveraging AI to generate code, documentation, or tests. That needs to evolve so we can get out of this reactive mode of burning down Jira backlogs and back to the strategic work that drives the company forward.
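For a sense of what that "boring" work looks like in dbt, here's a hedged example of the kind of check an AI assistant can draft and an engineer can review: a singular test that passes when the query returns zero rows (table and column names are hypothetical):

```sql
-- tests/assert_no_negative_order_totals.sql (hypothetical names)
-- A dbt singular test: if this query returns any rows, the test fails.
-- Drafting checks like this is exactly the kind of boilerplate that AI
-- assistance can take off an engineer's plate, leaving the review and the
-- judgment calls to a human.
select
    order_id,
    order_total
from {{ ref('fct_orders') }}
where order_total < 0
```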
Transparency leads to trust
Trust has always been a key issue with data. And trust is, at its core, an infrastructure problem.
Trust is, in the end, about being transparent. It's about helping a user understand not just a dataset's lineage and where it came from, but where something might have gone wrong in the data pipeline.
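dbt makes a lot of that transparency structural. Here's a minimal sketch (model names are hypothetical): because upstream models are referenced with ref() instead of hard-coded table names, the full dependency graph is known, and a failure can be traced to a specific node in the lineage rather than to a mystery table somewhere in the pipeline.

```sql
-- models/daily_revenue.sql (hypothetical model names)
-- ref() declares the dependency on fct_orders explicitly, so dbt can render
-- the lineage graph and pinpoint which upstream model a problem came from.
select
    date_trunc('day', ordered_at) as order_date,
    sum(order_total)              as revenue
from {{ ref('fct_orders') }}
group by 1
```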
That's why I'm excited about this fundamental shift in infrastructure - about Iceberg, AI, and the scalability and transparency they can bring. Using AI, we can give users a rough estimate of the accuracy of a given response and help them navigate how to use it. We can have self-healing pipelines that generate their own bug fixes and file their own pull requests.
This provides a new level of transparency. We don’t have to certify that something is 100% correct - because when is it ever? Things are moving too fast to guarantee 100%, which is why people lose trust in their data. Instead, we give both technical and non-technical experts the information they need to solve the problems they find in data using a common language.
How to scale analytics well into the future
That leads to the question of where to go next. Most companies and data leaders know the direction in which they need to move. It can be challenging, however, to figure out how to take the first steps.
David and I recommend two basic strategies:
Shift from infrastructure to experimentation. You simply cannot drive AI experimentation into production if you're worried about data access, format conversion, and infrastructure management. You need to shift 95% of your team's time from worrying about infrastructure to building actual use cases. AI is here, right now, and you're missing the wave if you're still figuring out how to access your data.
Ask easy questions. Where do you start? Simple - look at your infrastructure and make a call on each piece: red light or green light? If something's a green light, it's not a problem. If it's a red light, it's a blocker. Start tackling the red lights, one by one.
The partnership between dbt and Databricks isn't just about technical integration. It's about a shared belief that customers deserve flexibility, not lock-in. Teams should spend their time solving business problems, not building infrastructure.
The challenges we discussed aren't going away. Your backlog will still be full tomorrow. The pressure to deliver AI results will keep increasing. And the pace of technological change will only accelerate. But for the first time, we have the tools to meet these challenges.
The AI era isn't coming—it's here. The question is whether your data foundation is ready for it.