Analytics Engineering: Operating with Leverage
One of the foundational principles in software engineering is to operate with leverage. Don’t write in binary, write in Go. Don’t plug ethernet cables into servers, use AWS + Terraform + Docker + Kubernetes. Et cetera. If your time is valuable, it should be important to you to operate with the highest possible leverage. This allows you to get more done with your fixed number of hours. Simple.
dbt is an exercise in giving data analysts leverage. I frequently get asked the question “Why shouldn’t I just write SQL to construct my transformation pipelines?” The answer is leverage. With dbt, you write only the business logic and let dbt handle all of the DDL / DML boilerplate. With macros and the DAG, you can write DRY, modular code. You can test your code with ease. And you automatically document all the work you’re doing as you write the code. All of this is leverage.
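As a concrete sketch of what “only write business logic” means (the model and column names here are hypothetical, not from the project described in this post): a dbt model is just a SELECT statement, and dbt generates the surrounding CREATE TABLE / CREATE VIEW boilerplate and orders the runs by following the `ref()` calls.

```sql
-- models/fct_orders.sql (hypothetical example model)
-- You write only the SELECT below; dbt wraps it in the
-- appropriate CREATE TABLE AS / CREATE VIEW AS statement
-- and schedules this model after stg_orders and stg_customers,
-- because the ref() calls declare those DAG dependencies.

select
    orders.order_id,
    orders.ordered_at,
    customers.customer_name,
    orders.amount
from {{ ref('stg_orders') }} as orders
left join {{ ref('stg_customers') }} as customers
    on orders.customer_id = customers.customer_id
```

Renaming or relocating `stg_orders` requires no change here beyond what `ref()` resolves for you; that indirection is exactly the leverage the rest of this post illustrates.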
There are a million examples of this that, if you’re a heavy dbt user, just become part of the air you breathe. But I recently ran across such a simple, perfect example in our own internal analytics work that I wanted to share it.
Recently, Stitch released a completely new Postgres integration and announced that the prior integration would be phased out. We use Postgres as the application database behind dbt Cloud, and rely on this data for a lot of our dashboards. Connecting the new integration was easy and the data arrived in Snowflake exactly as expected. Great!
Here’s where things would’ve gotten annoying if we didn’t have dbt. We would’ve had to go through the dozens of Mode dashboards and the dozen Looker views that all point to tables from this source and, for each one, make sure we changed it to point to the new schema. Because this is a repetitive task with no automated way to validate the outputs, there would likely have been some assets that got missed or, even worse, some assets where certain schema references were updated and others were not.
In short: making this update would have been annoying, time-consuming, and error-prone.
We use dbt as an abstraction layer between our business intelligence tools (Mode and Looker) and our raw data in Snowflake—our business intelligence user group doesn’t even have permissions to see the raw data. That means we can make a change like this in dbt and be confident that the change will flow downstream.
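Concretely (with hypothetical source and table names), that abstraction works because models reference raw tables only through `{{ source() }}`, so Mode and Looker query the relations dbt builds rather than the raw Stitch schema:

```sql
-- models/staging/stg_accounts.sql (hypothetical staging model)
-- BI tools query the relation dbt builds from this model.
-- Only dbt resolves source() to the underlying raw schema,
-- so the raw location can change without touching any dashboard.

select
    id as account_id,
    name as account_name,
    created_at
from {{ source('stitch_postgres', 'accounts') }}
```

Because the raw schema name lives in one place (the source definition) rather than scattered across dashboards, a migration like the one described here touches a single file.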
Not only that, but we’ve fully implemented a new dbt feature called sources, which made the change even easier. In fact, the PR was exactly +1 / -1: a single line in the source spec for our Postgres integration. I updated that one line, rebuilt all of my models (using `--full-refresh`), then tested everything. Everything worked great.
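The actual diff isn’t reproduced in this version of the post, but a +1 / -1 source spec change has roughly this shape (the file path, schema, and table names below are hypothetical stand-ins, not the real ones): changing the `schema` property repoints every model that calls `source('stitch_postgres', ...)` at once.

```yaml
# models/staging/src_stitch_postgres.yml (hypothetical names)
version: 2

sources:
  - name: stitch_postgres
    database: raw
    schema: stitch_postgres_v2   # was: stitch_postgres — the +1 / -1
    tables:
      - name: accounts
      - name: users
```

Every downstream model, test, and generated doc picks up the new schema on the next run, which is why the validation step took minutes rather than days.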
One line of code and about 5 minutes to validate that everything continued to work, vs. a manual process to search through two entire codebases to do an error-prone find-and-replace and then spot-checking by loading a bunch of dashboards. That’s leverage.
This one instance is not, in and of itself, so groundbreaking. But this is the type of leverage that well-architected dbt projects bring to an entire analytics tech stack. Every day, every single task by every single analyst is done with optimal leverage. dbt doesn’t make data teams 10% more productive; it makes them 10x as productive.
I recently met the entire data team for a company with $130m in funding and 300 employees—the data team was two people. I laughed when they told me they were all there was, and asked them if they were growing the team. They said “not really…we are pretty good at what we do and we have good tools.”
⚡️Ready to improve your analytics engineering workflow? Get started with dbt today. ⚡️
Last modified on: Nov 22, 2023