29%+ warehouse savings: How the dbt Fusion engine drives cost efficiency

Kathryn Chubb

last updated on Jan 21, 2026

Data pipelines have a cost efficiency problem. We should know. We deal with it first-hand ourselves at dbt Labs.

Every day, we run over 895,000 scheduled jobs. Most run every hour.

The problem is, most don't need to be run. The underlying data hasn't changed. That leads us to rebuild thousands or even tens of thousands of dbt models that just output the same results. That wastes compute costs—and it wastes the time of data engineers who spend cycles managing jobs instead of improving the models themselves.

That's why we looked to state-aware orchestration as a solution. With state-aware orchestration, you rebuild a model only if it needs to be rebuilt. The potential cost savings with this approach are massive, running anywhere from 10% up to 64%.

Implementing state-aware orchestration required, however, that we rewrite the dbt engine from the ground up. Let's take a look at how the old approach to building data models leads to waste, how state-aware orchestration helps, and how the new dbt Fusion engine uses state-aware orchestration to cut development cycles and slash compute costs.

The orchestration problem—or, why freshness doesn't have to mean waste

dbt models map one or more source tables to new destination tables with relations between them, creating a directed acyclic graph (DAG). Today, when you run dbt build on a DAG, it rebuilds every model in that DAG.

In the example DAG below, the stg_orders and stg_customers models in our staging layer, the int_orders model in our intermediate layer, and the dim_customers and cust_orders models in our mart layer are all rebuilt, regardless of the state of the underlying src_orders and src_customers source tables.

State-aware orchestration works differently. With state-aware orchestration, we keep a cache at the model level. This means we can recognize that, in the DAG above, we only need to rebuild three of the models and can reuse the other two (stg_orders and int_orders), whose source data hasn't changed.

This leads to a whole lot less wasted compute. We've seen that our customers are able to reuse 30% of their models using this approach.

10% savings just by using Fusion

The question is, how is the dbt Fusion engine able to do this?

Fusion moves dbt from being a stateless piece of software to being a stateful engine with real-time model state. What this means is that, as jobs run, we keep a real-time cache of the entire environment with each model's hash data and code state. If nothing has changed, we simply don't rebuild the model, and we get lower costs across every build.

It also means that we can make real-time decisions about what to build as we traverse the DAG. It no longer matters which job runs which model, because every job reads and writes from that same shared state. And if two jobs try to build the same model at the same time, the second waits for the first to finish. This reduces complexity because models can no longer clash across different jobs.

So, in a first build of a DAG with 13 models, you might see 13 models built, taking a total of 49 seconds to run. If you kick off another build just a few minutes later, you might see it takes only 27 seconds for that entire job. That's because we only run the models whose sources have changed. In the case shown below, this means rebuilding only two out of 13 models.

This is the real power of state-aware orchestration. Across our customers and our own data team, simply turning it on has saved 10% or more on data warehouse compute.

Advanced configurations: Turning 10% into 30% or 50%

That's a great cost reduction. But we can go further by fine-tuning our runs using three basic approaches.

Run on SLAs, not on timers

You can use advanced configurations to customize how both sources and models behave in state-aware orchestration. This helps you deal with situations where a model is needed less often, even if its sources are fresh. A data source might update every hour, for example, but the report it feeds is only reviewed once a week.

You can fine-tune your runs further by asking, for a given model: do I want to wait until all of its upstream dependencies have been updated, or rebuild whenever there's new data on any of them?

As you fine-tune these configurations, you're able to dramatically increase how much is reused. That leads to fewer builds against the actual data warehouse as you align your builds with the SLAs of the business.

Customize behavior based on business needs

With these advanced configurations, you can define what it means for a source to be fresh. There are three ways to approach this.

First, freshness might be tied to a specific column. You can do that with the loaded_at_field on your sources, or with a custom SQL query that pulls that metadata, for example from an Iceberg table. You could even combine the number of rows loaded with a datetime to refine the definition of "fresh" in a highly granular way.
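
As a minimal sketch, a source freshness configuration keyed to a load timestamp column might look like the following (the source, table, and column names are illustrative and the thresholds arbitrary; newer dbt versions also accept a custom SQL expression via loaded_at_query for cases like the Iceberg example above):

sources:
  - name: raw_shop                      # hypothetical source name
    loaded_at_field: _etl_loaded_at     # column that records when each row landed
    freshness:
      warn_after: {count: 6, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: src_orders
      - name: src_customers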

Second, you can create model SLAs for situations where sources refresh more frequently than the business needs data. You can use a build_after setting in your dbt models to wait until enough time has passed, even when there is new data to incorporate.

build_after: {count: 2, period: hour}

Third and finally, updates_on lets you specify whether dbt should wait for all upstreams to be updated before building, or build whenever any of them is, as in the sketch below.
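
Putting the last two together, a model-level configuration might look roughly like this (the model name is made up, and the exact nesting of these keys can vary across dbt versions, so treat it as a sketch rather than a copy-paste config):

models:
  - name: weekly_revenue_report          # hypothetical model
    config:
      freshness:
        build_after:
          count: 24
          period: hour      # rebuild at most once a day, even if sources refresh hourly
          updates_on: any   # 'any' builds when any upstream has new data; 'all' waits for every upstream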

The magic of dbt is that, because all of this lives in your project's YAML, you can set sensible defaults at the project level and then override them for any folder, subfolder, or even individual model. That lets you fine-tune your project to exactly the configuration that makes the most sense, as sketched below.
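
For instance, a project-level default with a folder-level override might look something like this in dbt_project.yml (the project and folder names are made up, and the exact key nesting is an assumption to check against your dbt version's docs):

models:
  my_project:                                    # hypothetical project name
    +freshness:
      build_after: {count: 24, period: hour}     # default: rebuild at most daily
    marts:
      finance:
        +freshness:
          build_after: {count: 4, period: hour}  # finance marts rebuild more often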

Efficient testing: Only run the tests that are needed

You can go further still by optimizing how often you test.

Traditionally, with dbt Core, a set of tests runs after every model that builds. This is great because it helps ensure data quality, confirming that tests still pass as data changes throughout your lineage.

But it also means that we're probably overusing tests.

With efficient testing using the dbt Fusion engine, you only run the tests that are actually needed. Suppose you have a unique test that passes upstream. With dbt Fusion's column-aware, semantic understanding, if no downstream model logic invalidates that test, it reuses the passing test rather than rerunning it. This reduces runtime and lowers costs by avoiding unnecessary computation.

Using Fusion's semantic understanding of the SQL, we don't just skip the tests; we reuse them, because we can determine statically that the test is guaranteed to pass. There is no downstream SQL logic, such as joins or WHERE conditions, that would invalidate the result of the test that ran upstream.
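
For example, a uniqueness test declared on an upstream staging model, as in the sketch below, is the kind of test Fusion can reuse downstream when no later model logic could invalidate it (the names are illustrative, and older projects may use the tests: key rather than data_tests:):

models:
  - name: stg_orders
    columns:
      - name: order_id
        data_tests:        # 'tests:' in older dbt versions
          - unique
          - not_null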

When tests do need to run, instead of executing each one as a separate query against the data warehouse, we bundle them into a single query and execute it in a single pass. This further maximizes compute efficiency.

Proving the value to your business

Now, here's how you can prove these results to the business.

With state-aware orchestration, we show you the models built and reused across your entire account. The dbt platform's reporting shows you the impact your changes have on your runtimes and on the number of models built.

On individual jobs, we show you that same view of models built versus reused. You can then fine-tune those advanced configurations to meet the SLAs for when, and how often, the business needs the data.

If you want to go above and beyond, you can also write this cost savings data in dollar or credit terms directly to your data warehouse. This lets you analyze and prove those cost savings wherever you work.

More importantly, this helps you make friends with finance. When it comes time to ask for that additional headcount, you can show them and your management team exactly how much you've saved on your data warehouse bill.

Real results: Our journey and customer success

In summary, enabling state-aware orchestration delivers 10%+ in cost reduction. Fine-tuning those advanced configurations can give you an additional 15% or more. Efficient testing can add another 4%, for a total of 29%.

dbt Labs was the first user of Fusion and the first user of state-aware orchestration in production. We're a 9-year-old dbt project with 108 contributors.

To be honest, historically, cost optimization wasn't our focus. But as we've grown (and as our data warehouse bill has grown along with us), it's come under scrutiny.

When we implemented state-aware orchestration, we first enabled it and then reconfigured our jobs into two freshness-based tiers. We achieved these results without rewriting any SQL. (I say SQL because we did, of course, write some YAML: build_after defaults across the project, overridden for the more frequent tier where needed.)

What we saw was pretty incredible. We saw a 63% improvement in average job runtime. For example, our incremental job went from 3.5 hours on average to 25 minutes, an 88% reduction. We saw 75% of models reused daily, going from 9,000 models built per day to only 2,200. And most importantly, we saw a 64% annual savings on our dbt-related data platform spend.

The hope is that, with state-aware orchestration, advanced configurations, and features such as efficient testing, everyone can achieve the same results in reducing their data warehouse spend and improving the efficiency of their data pipelines.

What's next

As we look ahead, we're working on bringing that cost data from the warehouse directly into the dbt platform, so we can show you how much you're saving without any additional work on your part.

We also want to build even slimmer CI: using Fusion's column-level awareness and semantic understanding, we can narrow CI jobs down to only the models that have actually changed.

We're working on AI agents to automate configurations and save you both time and money as you fine-tune your dbt projects. And we're investing in even more automatic cost savings, like dynamic indexing across your tables, so that when you turn on cost-efficiency features with Fusion, you get the maximum out of the box across your warehouse.

Get started today

Anyone on an enterprise-tier plan with an eligible project can turn this on today. You simply change your production environment's setting from the latest release track to the dbt Fusion engine, then check the boxes in your job settings. From there, you start benefiting from the immediate cost savings of state-aware orchestration and can fine-tune your model builds across your project as needed.

If you're not currently using dbt, sign up today for a free account and step through the Fusion getting started guide to see how Fusion can optimize your data transformation workflows and workloads across your entire enterprise.

