Analytics engineering transforms raw source data into the format required for downstream use, whether that's reporting, operational automation, ML models, or exploratory analysis.
Teams practicing analytics engineering do this in a transparent, easy-to-debug manner.
This transformation work is generally done in four chunks:
We'll cover each of these four chunks in the following pages, hearing from various members of the Fishtown Analytics team.
The modeling process is owned by analytics engineers.
Other roles contribute in a couple of ways:
Before embarking on a data modeling project, it’s helpful to have:
Having these two will give you a feel for the shape of transformation that will be required:
What tables must be joined together, and how? What metrics must be calculated, and where?
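To make those two questions concrete, here is a minimal sketch of a join plus a calculated metric. The table and column names (`customers`, `orders`, `amount`) are hypothetical, not from any real schema, and an in-memory SQLite database stands in for a warehouse:

```python
import sqlite3

# Hypothetical raw source tables; names and columns are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 15.0), (3, 2, 20.0);
""")

# The "shape" questions in practice: join orders to customers (how?),
# then calculate a per-customer revenue metric (where?).
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS revenue
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(rows)  # [('Ada', 25.0), ('Grace', 20.0)]
```

Answering "what joins, which metrics, and where they live" up front is what tells you whether the transformation layer needs one model or a chain of them.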
Let’s take a step back to 2016, when dbt was born.
Raw source data was flowing into MPP databases with (relative) ease. At that time, teams were visualizing it in reporting tools like Looker, Tableau, and many others. But the transformation layer in between was being duct-taped together.
Teams were writing complex stored procedures with often-confusing dependencies, or sometimes even manually refreshing queries to calculate metrics for visualizations.
Refreshing transformed metrics often required human intervention:
When a query failed, a human would re-run each individual query from that point forward in the dependency graph, inferring those dependencies by intuition or experience.
There wasn’t much of a concept of an automatically-generated dependency graph.
It turned out that many teams were running into this same challenge: plentiful raw data, a useful tool to visualize it, but nothing in between to get datasets ready for use.
dbt emerged to empower data teams to solve this problem.