
Using state-aware orchestration to slash your data costs

Kathryn Chubb

last updated on Nov 26, 2025

There are multiple factors that make it challenging to manage data pipelines at scale. One of the biggest is cost.

Many teams don’t consider cost when they first begin building out their data pipelines. As the number of pipelines they manage and the volume of data increase, however, the cloud compute bill becomes too large to ignore. That sets them scrambling to find ways to enable the rapid deployment and development of data pipelines at the optimum price.

The dbt Fusion engine, the next-generation dbt engine, makes it easier than ever to save money on data pipelines. It does this using state-aware orchestration, which uses the current state of your pipeline to make intelligent decisions about which models to rebuild.

Let’s look at what drives the cost of data pipelines up and how state-aware orchestration brings those costs down. We’ll also cover the other ways that Fusion drives down data costs.

How data pipelines waste money

Most data pipelines are complex. A data pipeline created using dbt, for example, can contain multiple models. These models materialize as tables or views, each connected to the others via dependency declarations.
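A model declares those dependencies with dbt’s ref() function, which is how dbt learns the shape of the graph. Here’s a minimal sketch; the model and column names are hypothetical:

    -- models/marts/dim_wizards.sql (hypothetical model)
    -- ref() declares a dependency on stg_wizards; dbt uses these
    -- declarations to build the project’s dependency graph.
    select
        wizard_id,
        wizard_name,
        house
    from {{ ref('stg_wizards') }}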

dbt represents this in a rich directed acyclic graph, or DAG. This means that when you create and run a data pipeline job in dbt, it’s aware of the dependencies between your models and knows in which order to run them.

dbt builds models whenever you use the dbt run command. By default, run rebuilds all models in a project. You can also tell dbt to run a model plus everything downstream of it (dbt run -m my_model+).
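In practice, those two invocations look like this (my_model is a placeholder):

    # Rebuild every model in the project (the default)
    dbt run

    # Rebuild my_model plus everything downstream of it
    dbt run -m my_model+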

Even with the latter option, however, dbt runs every model in the selected chain, whether its code has changed or not, and whether it has new data or not. This means running all of those models for every dev, staging, or production run.

This is an extra cost that you don’t need. It may be trivial for a single pipeline and a single developer. But this cost multiplies quickly across a company as the complexity of models, the number of models, and the number of people modifying models increase. If your teams are shipping model changes rapidly as part of a well-tuned analytics workflow, this cost can quickly spiral.

This is one of the many problems that the dbt Fusion engine solves.

Reducing data costs with the dbt Fusion engine

Fusion is a new version of the dbt engine that turns dbt into a full SQL compiler, one that understands the syntax and semantics of all major data warehouses. That means it can be smarter about what needs to be rebuilt in a dbt model’s dependency chain.

Let’s look at what this means in practice. dbt Core always runs with Just-in-Time (JIT) rendering: it renders a model, runs it in the data warehouse, and then moves on to the next model in the chain.

By contrast, Fusion defaults to Ahead-of-Time (AOT) compilation. It renders all models in the project, producing and statically analyzing every model’s logical plan before running anything in the data warehouse.

How state-aware orchestration in the dbt Fusion engine saves money

AOT compilation means Fusion understands your data model’s dependencies. It also knows if your data or your model code has changed between runs.

This enables Fusion to use state-aware orchestration. With state-aware orchestration, Fusion runs only the models with pending changes, plus any models downstream of them.

Fusion isn’t just aware of single-job state, either. If you’re running multiple Continuous Integration (CI) jobs that use the same models, Fusion is smart enough to run each shared model only once across all running pipelines.
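For comparison, dbt Core can approximate part of this manually with its documented state-based selection, which teams often wire into CI by hand. A sketch of that manual approach (the artifacts path is hypothetical):

    # Run only models whose code changed relative to a previous
    # production run, plus their downstream dependents.
    # ./prod-artifacts is a hypothetical path to that run's manifest.
    dbt run --select state:modified+ --defer --state ./prod-artifacts

Fusion’s state-aware orchestration does this bookkeeping for you, and extends it from code changes to data changes and cross-job state.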

An example of state-aware orchestration in action

To understand this better, let’s walk through an example. Take the following DAG. The pipeline takes the raw data tables (prefixed with raw.), creates staging models to transform the data, and then creates dimension and fact tables for a data warehouse from those.
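A simplified text sketch of that DAG might look like this (the staging and dimension names outside the wizards chain are assumed from the naming pattern):

    raw.wizards -> stg_wizards -> dim_wizards -> fct_orders
    raw.orders  -> stg_orders  -> fct_orders
    raw.worlds  -> stg_worlds  -> dim_worlds
    raw.wands   -> stg_wands   -> dim_wands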

Let’s say that when this pipeline runs in production, raw.wizards is the only source with fresh data. A traditional data pipeline might run every single model here by default. By default, the dbt Fusion engine rebuilds only raw.wizards and its downstream dependents: stg_wizards, dim_wizards, and fct_orders. It leaves raw.worlds, raw.orders, raw.wands, and all their downstream tables as they are, since it knows these don’t require a refresh.

How state-aware orchestration works

You might be wondering how all of this operates under the hood. The answer is that Fusion is built upon a few core principles:

  • Real-time shared state: All jobs leverage a shared model-level state, which enables tracking model runs across jobs
  • Model-level queuing: If multiple jobs leverage the same model, these queue up at the model level to prevent collisions and unnecessary rebuilds
  • State-aware and state-agnostic support: A job can run in either a state-aware or state-agnostic mode; in both cases, dbt updates the shared state to accurately reflect the model’s run status

The benefits of state-aware orchestration

Using state-aware orchestration immediately brings multiple benefits to your data pipelines:

Reduced data costs. Rebuilding fewer models lowers the cost per run, since each job needs less compute.

Faster data pipeline runtimes. Fewer model runs also reduce the time it takes for a given pipeline to run. That means data engineers will spend less time waiting for job runs to complete successfully, boosting overall developer productivity.

Out-of-the-box operation. Fusion uses AOT by default. There’s no need to tinker with configurations to get better performance immediately.

High configurability for more demanding use cases. That said, you can also tailor state-aware orchestration to your needs. For example, you can further control when models run in jobs by specifying source freshness intervals on a per-model basis (see the example config after this list).

Usable from data engineers’ favorite IDEs. Fusion is built into the dbt Studio IDE. It also tightly integrates with Visual Studio Code via our official extension.
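As an illustration, here is what source freshness intervals look like using dbt’s documented freshness config; the source, schema, and table names are hypothetical:

    version: 2

    sources:
      - name: app_db                   # hypothetical source name
        schema: raw
        loaded_at_field: _loaded_at    # column dbt checks for recency
        freshness:
          warn_after: {count: 6, period: hour}
          error_after: {count: 24, period: hour}
        tables:
          - name: wizards
          - name: orders
            freshness:                 # per-table override
              warn_after: {count: 1, period: hour}

With thresholds like these, dbt can check whether a source actually has new data and skip downstream rebuilds when it doesn’t.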

Getting started with state-aware orchestration

The free dbt VS Code extension is the best way to develop locally with the dbt Fusion engine.

Follow our step-by-step guide to using Fusion. If you’re new to dbt, you’ll want to start with the Quickstart for your data warehouse.

How the dbt Fusion engine accelerates data development at lower cost

Fusion can save you both time and money in data development:

  • Fusion is a rewrite of the core dbt engine in Rust. This means it delivers superior performance over the previous Python-based engine—up to 30x faster per run
  • AOT compilation reduces data warehouse round-trips during development; the dbt Fusion engine can check SQL syntax locally, resulting in less churn in your deployment pipelines

Get started on Fusion quickly with the dbt VS Code extension and talk with a dbt expert today.

