Reverse ETL is the process of getting your transformed data from your data warehouse into end business platforms, such as sales CRMs and ad platforms. Once in an end platform, that data is often used to drive meaningful business actions, such as creating custom audiences in ad platforms, personalizing email campaigns, or supplementing data in a sales CRM. You may also hear reverse ETL referred to as operational analytics or data activation.
Instead of only exposing data in a business intelligence (BI) tool, reverse ETL pipelines allow data to be synced down to ad platforms (Facebook, LinkedIn), sales CRMs (Salesforce), customer service platforms (Zendesk, Kustomer), and other destinations. There are a variety of platforms, such as Hightouch, RudderStack, and Census, that help data teams harness the power of reverse ETL.
In this post, we’ll look at the practical advantages, use cases, and methods for implementing reverse ETL.
How reverse ETL works
In the reverse ETL process, transformed data is synced from a data warehouse to external tools in order to be leveraged by different business teams.
The power of reverse ETL comes from sending already transformed data down to business platforms. Raw data, while beautiful in its own way, typically lacks the structure, aggregations, and aliasing needed to be useful to end business users right off the bat.
After data teams transform data for business use in ELT pipelines, typically to expose in an end business intelligence (BI) tool, they can also send this cleaned and meaningful data to other platforms where business users can derive value using reverse ETL tools.
Data teams can choose to write additional transformations that may need to happen for end business tools in reverse ETL tools themselves or by creating additional models in dbt.
Why use reverse ETL?
There are a few reasons why your team may want to consider using reverse ETL:
Putting data where the work is happening
While most data teams would love it if business users spent a significant portion of their time in their BI tool, that’s neither practical nor necessarily the most efficient use of their time.
In the real world, many business users will spend some time in a BI tool, identify data that could be useful in a platform where they spend a significant amount of time, and work with the data team to get that data where they need it. Users feel comfortable and confident in the systems they use every day, so why not put the data in the places that allow them to thrive?
Manipulating data to fit end platform requirements
Reverse ETL helps you to put data your business users need in the format their end tool expects. Oftentimes, end platforms expect data fields to be named or cast in a certain way.
Business users shouldn’t have to worry about manually entering fields that can easily change; instead, that data should be populated by our version-controlled, QA’d, and peer-reviewed dbt models. To do this, we leverage reverse ETL pipelines to create automated workflows that send data down to a variety of end business tools, which we specify as exposures in dbt.
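As an illustrative sketch (the mart name, field names, and casts below are hypothetical, not a prescribed schema), this kind of reshaping can live in a plain dbt export model:

```sql
-- Hypothetical export model: rename and cast fields from a dbt mart into
-- the shape an email platform expects. Adjust names and types to match
-- your destination's requirements.
select
    customer_id as external_id,                 -- destination expects its own id field name
    lower(email) as email_address,              -- normalize casing for identity matching
    cast(signed_up_at as date) as signup_date,  -- destination wants a date, not a timestamp
    round(customer_lifetime_value, 2) as lifetime_value
from {{ ref('dim_customers') }}
```

Because this mapping is itself a dbt model, it stays version-controlled and reviewable instead of living only in a reverse ETL tool’s UI.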
Supporting self-service efforts
By sending down data-team approved data in reverse ETL pipelines, your business users have the flexibility to use that data however they see fit. Soon, your business users will be making audiences, testing personalization efforts, and running their end platform like a well-oiled, data-powered machine.
Reverse ETL use cases
Just as there are almost endless opportunities with data, there are many potential different use cases for reverse ETL. We won’t go into every possible option, but we’ll cover some of the common use cases that exist for reverse ETL efforts.
Personalization
Reverse ETL allows business users to access data that they normally would only have access to in a BI tool in the platforms they use every day. As a result, business users can now use this data to personalize how they create ads, send emails, and communicate with customers.
Personalization was all the hype a few years ago, and now you rarely see an email arrive in your inbox without some sort of personalization in place. Data teams using reverse ETL are able to pass down important customer information, such as location, customer lifetime value (CLV), tenure, and other fields, that can be used to create personalized emails, establish appropriate messaging, and segment email flows. All we can say is that the possibilities for personalization powered by reverse ETL are endless.
Sophisticated paid marketing initiatives
At the end of the day, businesses want to serve the right ads to the right people (and at the right cost). A common use case for reverse ETL is for teams to use their customer data to create audiences in ad platforms to either serve specific audiences or create lookalikes.
While ad platforms have gotten increasingly sophisticated at identifying high-value audiences with their algorithms, it rarely hurts to supplement those audiences with your own data to create more refined audiences or lookalikes.
Self-service analytics culture
We hinted at it earlier, but reverse ETL efforts can be an effective way to promote a self-service analytics culture. When data teams put the data where business users need it, business users can confidently access it on their own, driving even faster insights and action.
Instead of requesting a data pull from a data team member, they can find the data they need directly within the platform that they use. Reverse ETL allows business users to act on metrics that have already been built out and validated by data teams without creating ad-hoc requests.
"Real-time" data
We would be remiss if we didn’t mention reverse ETL and the notion of “real-time” data. While you can have the debate over the meaningfulness and true value-add of real-time data another time, reverse ETL can be a mechanism to bring data to end business platforms in a more “real-time” way.
Data teams can set up syncs in reverse ETL tools at higher cadences, allowing business users to have the data they need, faster. Obviously, there’s some cost-benefit analysis on how often you want to be running syncs via reverse ETL tools and hitting your data warehouse, but reverse ETL can help move data into external tools at a quicker cadence if deemed necessary.
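As a purely illustrative sketch (these keys and cron values are hypothetical and not any particular tool’s real configuration), the cadences stakeholders agree on often end up as staggered schedules, so each stage runs after the one before it has finished:

```yaml
# Hypothetical cadence plan: stagger each stage behind the previous one
ingestion_schedule: "0 * * * *"    # load raw data at the top of every hour
dbt_cloud_job: "15 * * * *"        # transform at quarter past, once raw data has landed
reverse_etl_sync: "30 * * * *"     # sync export models downstream at half past
```

Tightening any of these intervals is where the cost-benefit analysis above comes in: more frequent syncs mean fresher data, but also more warehouse queries.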
All this to say: move with caution in the realm of “real-time”, understand your stakeholders’ wants and decision-making process for real-time data, and work towards a solution that’s both practical and impactful.
Reverse ETL tools
Reverse ETL tools typically establish the connection between your data warehouse and end business tools, offer an interface to create additional transformations or audiences, and support automation of downstream syncs. Platforms such as Hightouch, Census, and RudderStack are examples of tools that support reverse ETL pipelines.
The dbt approach to reverse ETL
We tackle reverse ETL projects with the following step-based approach:
- Understand stakeholder expectations: What’s the problem we’re solving for here?
- Identify the data models and fields needed to be pushed to the tools
- Transformation time! Create export models to power reverse ETL pipelines
- QA it up, create syncs in our reverse ETL tool, and involve stakeholders for training
- Set up dbt exposures for governance and documentation
- Establish alerts for reverse ETL pipelines to provide transparency to end business users
Below, we’ll walk through these steps in greater detail using a user segmentation example with the jaffle shop, a sample dataset and dbt project.
Before we jump into that, a few prerequisites you may need to create and manage reverse ETL pipelines are:
- Data stored in a data warehouse
- A reverse ETL tool to perform syncs from your data warehouse to your end platforms
- An end platform that you’re looking to push data to, such as a sales CRM or ad platform
Step 1: Understand stakeholder expectations
We summarize stakeholder needs by determining how often data ingestion, dbt Cloud jobs, and reverse ETL syncs should occur. This ensures alignment between business goals and technical capabilities for timely, actionable insights. Our approach blends stakeholder input with expert analytics intuition to define realistic SLAs.
Step 2: Map out entities & fields
Now we’re getting into the fun stuff! At this stage, we need to understand the data and dbt models needed to build export models. We do two forms of planning: entity matching and field identification.
Entity matching
It’s important to understand how the grains differ between our existing dbt models and the end business tools’ system. For example, how does a Salesforce contact match or map to a dbt Cloud account? This is the type of question we solve in this stage of planning.
Field identification
At this stage, we need to determine whether all the fields are already modeled out or whether we need to create new fields for the project. Most of the data business users want exported is derived from our fct_ and dim_ models. Oftentimes, the fields stakeholders want in their downstream tool already exist in our data models, but may need to be cast, renamed, or grouped differently.
Step 3: Create export models
Just as we have marts, staging, and metrics directories for our dbt project, we create a separate models/exports directory for all of our data models we want to sync to business tools. For every existing data model we want to sync out to different destinations, we have one export model.
For example, if we have a customers model like in jaffle shop that we want to be synced down to multiple destinations, we would create a single export_customers.sql model.
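Under this convention, the relevant slice of the project tree might look something like this (exact model names will vary by project):

```
models/
├── marts/
│   ├── customers.sql
│   └── orders.sql
└── exports/
    ├── export_customers.sql
    └── exposure_export.yml
```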
The primary goal of these export models is to supplement the existing data and modeling in our fct_ and dim_ models with whatever stakeholders require, in the structure the destination tool expects. I just said a lot of words, so let’s break down what I really mean by this.
Our fct_ and dim_ tables stored in our models/marts directory form the backbone of analytics at dbt Labs. An immense amount of work goes into making our fct_ and dim_ tables what they are:
- They sit close to the end of the DAG
- They have undergone considerable transformation
- They have likely been through entity joins to reach their final shape
- They are true and proven: these models have undergone immense testing and peer review to ensure the data they hold is accurate and usable
Our ultimate belief: Export models shouldn’t be doing the heavy lifting that happens for fact and dimensional models.
Step 4: QA, set up reverse ETL sync, & train stakeholders
We conduct thorough quality checks on export models, then sync data to a sandbox environment and validate with stakeholders. This ensures data accuracy, usability, and stakeholder confidence before affecting production. dbt emphasizes stakeholder training to empower teams with self-service capabilities and clarity on data generation.
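As one sketch of what that QA can look like in dbt itself (the column names mirror the jaffle shop example used throughout this post; the specific tests chosen are illustrative), a properties file can assert basic expectations on the export model before any sync is pointed at it:

```yaml
version: 2

models:
  - name: export_customers
    columns:
      - name: customer_id
        tests:
          - unique     # one row per customer, matching the destination's expectations
          - not_null
      - name: buyer_type
        tests:
          - accepted_values:
              values: ['super_purchaser', 'regular_purchaser', 'non_purchaser']
```

Running these tests in the same dbt Cloud job that builds the export model means a failing check can halt the pipeline before bad data reaches the destination.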
Step 5: Create exposures
Just as we would for key dashboards in our BI tool, we create dbt exposures for our reverse ETL syncs. Exposures are a great way to provide ownership, identify dependencies, and context into reverse ETL projects. They also allow us to visualize all of our downstream dependencies in the DAG.
We create a separate exposure_export.yml file in the models/exports directory where we store exposure information specifically for all of our export models. Below is an example of how we would document the exposure for the export_customers model we created for the jaffle shop in Step 3.
version: 2

exposures:
  - name: email_crm_export
    type: application
    maturity: medium
    url: <reverse ETL tool sync URL>
    description: '{{ doc("email_crm_export") }}'
    depends_on:
      - ref('export_customers')
    owner:
      name: brandon t.
      email: b******@dbtlabs.com
Step 6: Set up alerts in reverse ETL tool
We establish alerts for sync failures caused by upstream changes, destination outages, or tool issues, notifying stakeholders promptly. This maintains transparency and ensures timely resolution of data pipeline disruptions. We integrate these alerts with Slack channels for effective communication and resolution tracking.
Example: mapping out entities in customer segmentation
As we mentioned earlier, a common use case for reverse ETL is to send customer information down to email platforms to support better customer segmentation. For this scenario, we can use the jaffle shop’s customer table to create a new export model. While we’re using the jaffle shop here as an example, it represents a similar flow to what we do internally for syncing to our email platform.
The table
The customers table here has seven columns: customer id, first name, last name, first order date, most recent order date, total number of orders, and customer lifetime value (CLV).
All of these fields could potentially be used by business users in an email platform. Maybe they want to pass a customer’s number of orders into the email subject line to make emails more personalized, or create different customer segments based on CLV for a loyalty program.
As a result, we’re going to make sure we pass down all possible fields. In addition, the marketing team requested a few features about this export model:
- The marketing team wants to create a new segment, buyer_type, based on the number of orders a customer has placed
- The team is interested in having the most recent order date and first order date extracted out to the month-level so that they can contact new, bounced, and longtime loyal customers with different messaging
The code
Use the following code to create an export_customers.sql model that lets your friends on the marketing team go wild and free with email segmentation.
with import_customers as (

    select * from {{ ref('customers') }}

),

segment_customers as (

    select
        customer_id,
        first_name,
        last_name,
        date_trunc('month', first_order_date) as first_order_month,
        date_trunc('month', most_recent_order_date) as most_recent_order_month,
        number_of_orders,
        customer_lifetime_value,
        case
            when number_of_orders >= 3 then 'super_purchaser'
            when number_of_orders >= 1 and number_of_orders < 3 then 'regular_purchaser'
            else 'non_purchaser'
        end as buyer_type
    from import_customers

)

select * from segment_customers
As you can see from this example, there’s no fancy or mind-boggling data modeling happening here: all we did was truncate some dates, alias a few fields, and add a simple case statement.
Further reading
If you’re interested in learning more about reverse ETL and the impact it could have on your team, check out the following:
Last modified on: Jan 17, 2025