Today, we launched the dbt Fusion engine, a complete rewrite of dbt from the ground up. If you’d like more information on what just launched, check out these posts:
- What is Fusion and how does it help?
- New Code, New License
- Technical Components and Licensing
- Current Maturity and Path to GA
In this post, though, I want to talk about the future. What does this complete technical overhaul say about the future of dbt?
Let’s dig in.
Ripping out an engine is hard!
In the world of commercial open source infrastructure software, there is an emerging trend: rebuilding the engine at the heart of the platform.
Databricks did this with Photon. Photon is a complete rewrite of the Apache Spark engine in C++. Before Photon, Databricks was a really nice way to run Apache Spark in the cloud. Now, Databricks delivers meaningful benefits that are simply not possible with “vanilla Spark,” including significant price and performance gains.
Confluent did this with Kora. Kora is a protocol-compatible rewrite of the Apache Kafka engine. Before Kora, Confluent was a really nice way to run Apache Kafka in the cloud. Now, Confluent can rightfully claim meaningful benefits that are simply not possible with “vanilla Kafka,” including significant price, performance, and reliability gains.
There are others. MongoDB’s launch of Atlas comes to mind.
These decisions are fascinating to me as the founder of a commercial open source business. In each of these cases, these companies had to say: “What got us here won’t get us there.” And I don’t think Databricks, Confluent, or Mongo would be the success stories they are today without these investments.
But they are incredibly hard to execute on. The demands of growth—thousands of customers, different segments, different user profiles, a fast-moving ecosystem, etc.—require so much attention that it is incredibly hard to say “we’re going to take 1-3 years and rebuild the engine that everything else is built on.” But without making these kinds of investments, a commercial OSS business is unlikely to be successful over the long term.
This thought has been bouncing around in my head for several years, and it’s been pretty clear to me that we were going to have to make this leap as well. There were simply aspects of the dbt Core code base—dating back to 2016!—that were not able to get us to the future, that we couldn’t iterate our way through. Performance. Functionality. Etc. There was just no path towards the world we want to build without rebuilding the foundations.
We were on our own internal path towards this rebuild when I originally met Lukas and got to hear what he and the SDF team were up to. This match made in heaven has allowed us to move 1-2 years faster than we otherwise would’ve been able to. I feel as though I’m shipping tech today that was brought from the future into the present in a time machine.
So: now that the dbt Fusion engine is live in the wild, where are we headed? What does this unlock for the future of dbt?
What the dbt Fusion engine unlocks
Below are the medium-term (12-24 month) directions that the dbt Fusion engine will allow us to innovate in. While I’m not here to commit to specific dates, you should expect that we have a direct line of sight to all of these themes based on the current state of the underlying Fusion technology.
Parse & compilation times
The new dbt Fusion engine does a more advanced parse and compile than dbt Core, and even so is already around 30x faster to parse and substantially faster to compile.
Parse and compilation times are incredibly important for any piece of developer tooling. The faster they are, the more useful that developer tooling becomes.
Pre-Fusion, dbt’s parse times have been just barely fast enough to support traditional developer workflows. Even for this use case they are a pain point in larger projects.
But imagine other use cases that have not been supportable by dbt because of parse times:
- Agent-based chat experiences with MCP where an agent iteratively writes and tests code based on user requests. Current parse times are just too slow.
- Better developer experiences that require recompile-on-keypress (more below).
- Ever-larger dbt projects authored by ever-larger teams enabled by ever-more-accessible authoring experiences.
…and more that we cannot even anticipate. In general, every time you improve the performance of a developer tool by an order of magnitude, you discover use cases that were totally unanticipated. (Developer creativity is a beautiful thing!)
In the future, you should expect parse and compile times to continue to drop, as there is now significantly more headroom for optimization within Fusion. You can already see this inside of the VS Code extension, where we can incrementally recompile a single file in milliseconds.
Improved developer experience
Developer experience isn’t just about making developers happier: it is about making developers more productive. And Fusion, along with the new dbt Language Server and VS Code extension, delivers.
Historically, dbt developers were left to actually execute dbt in order to find errors in their code. Even in the best of worlds, that is a cumbersome process, and the performance challenges of dbt Core only made the problem worse.
Now errors show up as you type—and they are significantly more complete and detailed. dbt will now validate not only your dbt code but your SQL as well—not just coarse-grained things like function parameters but fine-grained things like type checking. And dbt will not only validate the model you’re working in; it will, as you type, validate downstream dependencies as well and surface those errors.
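To make that concrete, here’s a quick sketch. The model, columns, and types below are made up for illustration; the comments describe the kind of feedback this SQL-aware validation can surface directly in the editor.

```sql
-- models/revenue_by_day.sql (hypothetical model and columns, for illustration)
select
    order_date,

    -- If order_total were actually a varchar upstream, SQL-aware validation
    -- can flag the type error here as you type, instead of the query failing
    -- later in the warehouse.
    sum(order_total) as revenue,

    -- If a teammate renames shipped_at in stg_orders, the unknown-column
    -- error surfaces immediately, both in this model and in anything
    -- downstream of it.
    avg(datediff('day', order_date, shipped_at)) as avg_days_to_ship

from {{ ref('stg_orders') }}
group by 1
```

The specific errors matter less than the shift in the feedback loop: from “run it and wait” to “see it as you type.”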
Add to this features like auto-refactoring, auto-renaming, go-to-source, and more, and the tasks you do day in, day out just got a whole lot more efficient.
Local execution for development environments
The original thesis for dbt, authored in 2016, was that data practitioners should adopt the tooling and best practices of software engineers. And much of that has come to pass over the past almost-decade. But one of the ways that this has definitely NOT come to pass is local development environments.
It is generally superior, especially in a post-Docker world, to have development environments that can run locally end-to-end. Local development environments are more amenable to building great tooling, are more customizable, reduce latency, and eliminate a source of cost.
And while software engineers have been doing this for a long time—running Postgres locally for development and pointing to RDS in production—data practitioners operating in the cloud simply couldn’t do that. You can’t run BigQuery or Snowflake or Athena or Databricks (etc.) on your local machine.
Until now.
The dbt Fusion engine can fully emulate—with consistency guarantees—the functionality of the underlying data platform and allow all developers to execute their code locally, without ever reaching out to the remote data platform. This is not “best guess” emulation; this is Fusion fully emulating, down to the logical plan level, the exact behavior of the underlying platform.
When paired with dbt’s existing data sampling functionality, this will be a dramatic upgrade in the dbt experience. While we’re not ready to share when this will ship to users, this type of functionality is something we are already playing with internally.
Cost savings
The dbt Fusion engine gives us huge scope to reduce costs for users. In fact, our goal is that adopting the full capabilities of Fusion will be ‘cheaper than free’ thanks to the savings it creates.
For almost all companies that use dbt, it is the single largest driver of consumption on their underlying cloud data platform. dbt makes it easy to author data pipelines, and data pipelines can be expensive to execute. Companies have historically had only two real options: a) accept this reality and do their best to optimize, or b) limit the number of humans who have access to author pipelines. Neither is a good answer.
With Fusion, dbt will be able to automatically optimize your pipelines and orchestration so that they simply cost less to execute. As of today, dbt Enterprise customers on Fusion get access to a feature we are calling state-aware orchestration, which is expected to deliver an average cost reduction of 10% simply by turning it on and letting Fusion optimize how and when jobs are run. For a typical customer spending $1 million on their underlying platform, dbt can often drive 50% of that, so 10% of the $500k of dbt-driven data platform spend represents $50k in annual savings simply by standardizing on Fusion.
This is only the first tranche of cost-saving strategies unlocked by Fusion. In the future, we anticipate this number growing significantly beyond 10%.
PII, governance, and lineage
Tracking the flow of PII in a sufficiently complex data ecosystem is a very challenging problem once you stare at it long enough. At first it would seem easy: tag all the source data, and make sure you have column-level lineage. As it turns out, that is not sufficient.
PII is often transformed into non-PII by certain predictable transformations. Imagine running a count(user.email): the resulting column is based on PII, but it is not, itself, PII. If the system raises false positives on downstream uses of that column, users will lose trust in it and begin to ignore it.
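Here’s a minimal sketch of the distinction, using a made-up users table:

```sql
-- Both queries read the PII column email from a hypothetical users table.

-- 1) The column is passed through, so the output is still PII.
select region, email
from users;

-- 2) The column only feeds an aggregate, so the output is derived from PII
--    but is not itself PII. Column-level lineage alone would tag
--    signup_count as PII in both cases; understanding the transformation
--    itself is what avoids the false positive.
select region, count(email) as signup_count
from users
group by region;
```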
This may seem like a small thing, but in environments with tens of thousands to millions of tables, the complexity of these seemingly simple things quickly spirals out of control. The technology behind the dbt Fusion engine was originally built to serve exactly this requirement inside one of the most complex data ecosystems on the planet: Meta’s internal data warehouse.
In the not-too-distant future, we’ll be offering the ability to show an audit-ready view of your PII footprint across your entire data landscape, all powered by Fusion.
Cross-platform workload portability
I do not believe that we are in an ecosystem that will ultimately be dominated by a single large player. This is not Windows in the ’90s (monopoly). It is not even mobile in the 2010s (duopoly).
Instead, there is a group of 5-10 major vendors and platforms that dominates the data ecosystem (oligopoly). Not 50-100! But 5-10. And I think this will be a persistent fact.
Among those 5-10 vendors, nearly all customers will require flexibility. It creates exactly zero enterprise value to migrate code from one platform to another. No CIO / CDO wants to spend time migrating or replatforming. And so they tend to stay on older, more established technologies for longer than they would like, just to avoid paying this cost.
This desire for cross-platform flexibility is exactly what is driving the massive demand for Iceberg and other open-table formats.
With the dbt Fusion engine, in the future, dbt-authored pipelines will never need to be migrated again, for any data platform that Fusion supports. Just as Fusion can emulate the underlying data platform with complete type-aware fidelity to support local compute environments, it will be able to use that logical plan to perform real-time, guaranteed-faithful SQL transpilation. This allows SQL written in one dialect to be automatically ported into others—at runtime.
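As a rough sketch of what that could look like (the model and columns here are made up), consider a date calculation that is spelled differently across warehouses:

```sql
-- Authored once, in Snowflake's dialect:
select
    order_id,
    dateadd(day, 7, order_date) as payment_due_date
from {{ ref('stg_orders') }}

-- The equivalent BigQuery SQL, which today requires a manual rewrite and
-- which dialect-aware transpilation could generate automatically:
--   select
--       order_id,
--       date_add(order_date, interval 7 day) as payment_due_date
--   from {{ ref('stg_orders') }}
```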
When paired with Iceberg, this capability represents the future of enterprise cross-platform flexibility.
This capability is likely a bit further away, but it is nonetheless very real and unlocked by the dbt Fusion engine.
Just the beginning…
And these benefits are just the beginning of what Fusion will bring to the dbt Community in the coming years. In a period of such intense change (both at dbt Labs and in the technology industry more broadly), predicting the future can be challenging. But I am incredibly confident that Fusion—the new technical foundation that we are building the entire dbt platform on top of—will help us accelerate into that future.