How Airflow + dbt Work Together

Gloria Lin

Session Recap: Using dbt with Airflow

The dbt Live: Expert Series features Solutions Architects from dbt Labs, who take live audience questions and cover topics like how to design a deployment workflow, how to refactor stored procedures for dbt, or how to split dev and prod environments across databases. Sign up to join future sessions!

In the latest session, Sung Won Chung, Senior Solutions Architect at dbt Labs, addressed a question he hears often on the job:

How does dbt differ from Airflow, and how (or why) might some teams use both?

What Do Airflow and dbt Solve?

Airflow and dbt share the same high-level purpose: to help teams deliver reliable data to the people they work with, using a common interface to collaborate on that work.

But the two tools handle different parts of that workflow:

  • Airflow helps orchestrate jobs that extract data, load it into a warehouse, and handle machine-learning processes.
  • dbt homes in on a subset of those jobs: it enables team members who use SQL to transform data that has already landed in the warehouse.

With a combination of dbt and Airflow, each member of a data team can focus on what they do best. When data pipeline issues come up, analysts and engineers share a clear view of who needs to dig in and where to start.

TIP: Scrub to ~8:03 in the video to see what this might look like, and stay until ~13:37 to see Sung demo a "smart" rerun.

The Right Path for Your Team

Consider the skills and resources on your team, versus what is needed to support each path:

  • Using Airflow alone
  • Using dbt Core/Cloud alone
  • Using dbt Core/Cloud + Airflow

Implementation

For those who are ready to move on to configuration, below are guides to each approach:

Airflow + dbt Cloud

  • Install the dbt Cloud Provider, which enables you to orchestrate and monitor dbt jobs in Airflow without needing to configure an API
  • Step-by-step tutorial with video
  • Code examples for a quick start in your local environment
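As a rough illustration of what the dbt Cloud Provider approach can look like, here is a minimal sketch of an Airflow DAG that triggers one dbt Cloud job and waits for it to finish. It assumes Airflow 2.4 or later with the `apache-airflow-providers-dbt-cloud` package installed and an Airflow connection already configured for dbt Cloud. The job ID, the DAG ID, and the helper name `run_job_kwargs` are hypothetical placeholders, not values from the session.

```python
from datetime import datetime

# Hypothetical placeholders: swap in your own Airflow connection ID and dbt Cloud job ID.
DBT_CLOUD_CONN_ID = "dbt_cloud_default"
DBT_CLOUD_JOB_ID = 12345


def run_job_kwargs(job_id: int, conn_id: str = DBT_CLOUD_CONN_ID) -> dict:
    """Build keyword arguments for DbtCloudRunJobOperator (illustrative helper)."""
    return {
        "task_id": f"trigger_dbt_cloud_job_{job_id}",
        "dbt_cloud_conn_id": conn_id,
        "job_id": job_id,
        # Block the Airflow task until the dbt Cloud run reaches a terminal state.
        "wait_for_termination": True,
        "check_interval": 60,  # seconds between status polls
        "timeout": 3600,       # give up after one hour
    }


def build_dag():
    """Create the DAG. Airflow is imported here so the helper above works without it."""
    from airflow import DAG
    from airflow.providers.dbt.cloud.operators.dbt import DbtCloudRunJobOperator

    with DAG(
        dag_id="dbt_cloud_job_example",  # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule=None,                   # trigger manually; set a cron string to schedule
        catchup=False,
    ) as dag:
        DbtCloudRunJobOperator(**run_job_kwargs(DBT_CLOUD_JOB_ID))
    return dag
```

In an actual DAG file you would call `dag = build_dag()` at module level so the Airflow scheduler can discover it. The lazy import is only a convenience for this sketch; most DAG files import Airflow at the top.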

Airflow + dbt Core

  • Considerations for using the dbt CLI + BashOperator, or using the KubernetesPodOperator for each dbt job
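For the dbt CLI + BashOperator option, one possible shape is a DAG that shells out to `dbt run` and then `dbt test`. The sketch below assumes Airflow 2.4 or later with dbt Core installed on the Airflow worker. The project and profiles paths, the DAG ID, and the helper name `dbt_command` are hypothetical and only there for illustration.

```python
import shlex
from datetime import datetime

# Hypothetical paths: point these at your own dbt project and profiles directory.
DBT_PROJECT_DIR = "/opt/airflow/dbt/my_project"
DBT_PROFILES_DIR = "/opt/airflow/dbt"


def dbt_command(subcommand, select=None,
                project_dir=DBT_PROJECT_DIR, profiles_dir=DBT_PROFILES_DIR):
    """Return a shell-safe dbt CLI invocation as a single string (illustrative helper)."""
    parts = ["dbt", subcommand,
             "--project-dir", project_dir,
             "--profiles-dir", profiles_dir]
    if select:
        # Optional node selection, e.g. a model name or a tag selector.
        parts += ["--select", select]
    return " ".join(shlex.quote(part) for part in parts)


def build_dag():
    """Create the DAG. Airflow is imported here so the helper above works without it."""
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="dbt_core_bash_example",  # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        dbt_run = BashOperator(task_id="dbt_run", bash_command=dbt_command("run"))
        dbt_test = BashOperator(task_id="dbt_test", bash_command=dbt_command("test"))
        # Only test the models after they have been built.
        dbt_run >> dbt_test
    return dag
```

With this layout, Airflow sees each dbt invocation as a single task, so a failure in one model surfaces as a failed `dbt_run` task rather than a per-model task. Teams that want per-job isolation may prefer the KubernetesPodOperator route mentioned above.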

Other Perspectives

Audience Q&A

After the demo, Sung and fellow Solutions Architect Matt Cutini answered a range of attendee questions.

To hear all of the Q&A, replay the video (starting ~19:30) and visit the #events-dbt-live-expert-series Slack channel to see topics raised in the chat.

A sample of questions:

  • (22:00) - How can a complex dbt DAG be displayed in Airflow? (Resource)
  • (24:00) - How can we automate full refreshes for schema changes in incremental models? (Resource)
  • (27:30) - Where can I find examples and best practices for using dbt? (Resource)
  • (28:15) - How can I do deferred runs in dbt Cloud -- rerun only the things that have changed since a prior run? (Resource)
  • (32:15) - Can I use dbt Core with Airflow, and get all of the same functionality as dbt Cloud?
  • (45:00) - How do I set up data status files in my BI dashboards? (Resource)
  • (47:00) - What is the right use case for the incremental strategy in dbt?

Last modified on: Feb 27, 2024
