Your business runs on data. You’ve got lots of it coming in from lots of different places. To build a competitive advantage, you need to find ways to integrate these datasets, transform them into actionable insights, and deliver them to the relevant stakeholders as quickly and accurately as possible.
The challenge is how to do this without the system becoming a giant, untraceable, out-of-date mess that doesn’t scale.
This is exactly the problem that dbt solves. In this article, we’ll look at what dbt does, how it works at a high level, and how it can standardize the way your organization builds, tests, delivers, and catalogs clean, accurate data.
Check out this 2-minute explainer video where I walk you through it all, or keep reading to dive deeper.
Data, data everywhere
When companies begin mining their data for insights—think: clickstream data, advertising data, customer payment data, product usage data, CRM and ERP data, etc.—the initial efforts are usually ad hoc. Eager to get the data they need to make a decision, some of your teams may already be going it alone: downloading CSVs from different data sources and manually merging them together in a spreadsheet.
Such efforts are highly manual, error-prone, and impossible to scale. Different teams trying to get to the bottom of the same insight, for example “What is our return on ad spend (ROAS)?”, wind up with different answers to the same question. When you merge data from Facebook Ads and Stripe, columns inevitably don’t match, primary IDs are tagged differently, and you end up building complicated join logic to get the data you need. And the worst part? That data is stale the moment you’ve downloaded that first CSV.
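To make that concrete, here is a rough sketch of the kind of ad hoc query one of those teams might end up writing. Every table, column, and source name here is hypothetical, and that is part of the point: each team tends to write its own slightly different version of this logic, and each version can produce a slightly different ROAS.

```sql
-- Hypothetical ad hoc ROAS query; every table and column name is illustrative.
with ad_spend as (
    select
        campaign_id,                          -- Facebook Ads tags campaigns one way...
        sum(spend) as total_spend
    from raw_facebook_ads.ad_performance
    group by campaign_id
),

revenue as (
    select
        metadata_campaign_id as campaign_id,  -- ...Stripe stores the ID somewhere else entirely
        sum(amount) / 100.0  as total_revenue -- Stripe amounts are in cents
    from raw_stripe.charges
    where status = 'succeeded'
    group by metadata_campaign_id
)

select
    ad_spend.campaign_id,
    revenue.total_revenue / nullif(ad_spend.total_spend, 0) as roas
from ad_spend
left join revenue
    on ad_spend.campaign_id = revenue.campaign_id
```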
With an analytics stack, it’s easy to extract raw data from your data sources and load it into a cloud data platform. From there, you can integrate with BI tools to drive varied operational use cases.
But what happens when your business grows? You launch your service in Europe and Asia, and now you’re allocating ad dollars to new platforms. To get to the bottom of the same insight (“What’s our return on ad spend in Europe? In Asia? Globally?”), you need to be able to dynamically slice and dice your ROAS metric across various dimensions. There will always be new data sets, new business logic, new use cases, and new stakeholders demanding new ways of looking at the data.
You could go back to your BI tool(s) to update your query logic to build new region-specific dashboards. But here’s the problem: when you’re managing business logic across a bunch of different (and always growing) dashboards, it’s only a matter of time before metric inconsistencies pop up. Each region sells different products, or categorizes the same product differently, or generates revenue in different currencies. The complexity is simply too vast and dynamic to manage with the current approach.
It becomes clear very quickly that the one-off SQL queries you wrote are actually load-bearing, and change management now involves manually updating dozens of dashboards three times per week. And you’re still constantly worried that your exec is going to call out a discrepancy in an important meeting and derail the trust you’ve built in your data processes.
In this case, you need to standardize your data and abstract away the complexity of your business logic so that everyone in the organization—whether a data engineer, BI analyst, or executive stakeholder—can move fast with trusted data in a scalable, cost-effective way.
This standardization needs to happen inside of the data platform—not across a bunch of BI dashboards.
dbt and the analytics stack
This is where dbt comes in. dbt helps manage this complexity—in a way that’s modular, scalable, repeatable, and governed—all directly inside of your data platform.
dbt is the standard for data transformation in modern environments. Using dbt, data teams can build, test, and deploy analytics code using software development best practices (portability, CI/CD, observability, documentation, and more) to create production-grade analytics pipelines that scale. The output is modular, clean data models that can be delivered into your BI tools, LLMs, and APIs to ensure that stakeholders have the accurate data they need, when and where they need it.
dbt is foundational to the modern data integration workflow known as “ELT,” which stands for “extract, load, transform.” dbt is the “T” in ELT: it helps teams transform data after it has landed in the data platform, shaping it into the format that analysts, managers, and executives need to drive decision-making.
With dbt, you can be confident that your data, and the decisions it informs, are accurate, governed, and consistent, and that data reaches downstream teams with agility.
How dbt works
We built dbt with two core beliefs:
1. Transformation logic should be defined in code; and
2. Data teams should have the tools to work like software engineers, so they can treat their data assets like a product.
To accomplish this, dbt enables building robust data pipelines—data transformation processes that are version controlled, tested, documented, and shipped incrementally in a secure and governed way.
Engineers use dbt to define models in either SQL or Python. Your dbt models specify how to transform data, shaping it to normalize any dimension that matters to your business—campaign names, currencies, product categories—all according to the business logic that you define.
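For illustration, here is a minimal sketch of what a dbt model might look like. The source, model, and column names are made up, but the overall shape (a plain SQL SELECT plus dbt's source() and ref() functions to reference raw tables and other models) is how dbt models are commonly written.

```sql
-- models/staging/stg_ad_spend.sql
-- A minimal sketch of a dbt model; the source, model, and column names are hypothetical.
select
    ads.campaign_id,
    lower(trim(ads.campaign_name))   as campaign_name,   -- normalize campaign naming
    ads.spend_date,
    ads.region,
    ads.spend_local * fx.rate_to_usd as spend_usd         -- normalize every currency to USD
from {{ source('facebook_ads', 'ad_performance') }} as ads
left join {{ ref('fx_rates') }} as fx
    on  ads.currency   = fx.currency_code
    and ads.spend_date = fx.rate_date
```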
Engineers also write tests to validate their code, and dbt automatically generates documentation for every build, describing the data and what it represents so that future collaborators have the context they need to understand and build on existing data assets. All models are version controlled with Git, and data teams create pull requests (PRs) for other engineers to review before pushing code into production. Once approved, dbt runs your models, materializing the data into a view or table.
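dbt supports a few ways of writing tests; one is a standalone SQL file that selects any rows violating an assumption, and the test fails if the query returns rows. The sketch below assumes the hypothetical stg_ad_spend model from the previous example.

```sql
-- tests/assert_no_negative_spend.sql
-- Sketch of a "singular" dbt test: dbt runs this query, and the test fails if any
-- rows come back. Names follow the hypothetical model above.
select
    campaign_id,
    spend_date,
    spend_usd
from {{ ref('stg_ad_spend') }}
where spend_usd < 0
```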
This process of reviewing and merging data model changes into production creates a continuous integration (CI) data pipeline. dbt enables CI pipelines that materialize and test your data in different deployment environments—e.g., by building it in a development environment and testing it in a staging environment prior to running it in production. This helps your data teams identify and resolve any issues with a data model change before it negatively impacts users.
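One common pattern for keeping those non-production environments fast and inexpensive (a sketch, not the only approach) is to make a model behave differently depending on the environment dbt is running in, using dbt's built-in target variable. The model, source, and date logic below are illustrative, and the date arithmetic will vary by data platform.

```sql
-- models/staging/stg_payments.sql
-- Illustrative sketch: outside production, build only a recent slice of data
-- so development and CI runs stay quick and inexpensive.
select *
from {{ source('stripe', 'charges') }}
{% if target.name != 'prod' %}
  -- in dev and CI environments, limit to the last 7 days of data
  where created_at >= current_date - 7
{% endif %}
```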
Once shipped, anyone with the appropriate access can use this normalized data to build reports, data-driven applications, and other data assets. With these normalized data sets at their fingertips, regardless of the use case, users can be confident that the analytics they’re using to make decisions are accurate, consistent, and frequently updated to keep pace with changing conditions.
dbt also automatically builds and publishes documentation about every model, enabling anyone in the data workflow to navigate and understand critical context about a data asset (lineage, dependencies, freshness, materialization, exposures, etc.). This improves data velocity and collaboration, as data teams can work more efficiently and data consumers have the trust signals they need to use the data with confidence.
The value of dbt
By standardizing on dbt, companies can:
Ship data products faster
With dbt at the center of your data workflows, your data teams are empowered with a governed, scalable approach not only to develop and test their data models, but also to explore, improve, and deploy them. Data teams finally have the ability to work more efficiently, remove bottlenecks, and ultimately ship data products faster.
Build trust in data and data teams
Meanwhile, downstream teams are empowered to make more decisions, faster decisions, better decisions because they have a way to interact with that data, to understand its lineage, and to contribute to it. And if they’re less technical and are using a BI tool to tap into that data, they actually trust the metrics they’re receiving because they are consistent, regardless of where they’re queried.
Reduce the cost of producing insights
Having this centralized framework for collaboration feeds a self-reinforcing flywheel that fosters a data-driven culture your data teams can actually keep up with. All of a sudden, your data platform goes from being a cost center to a profit center. dbt also offers built-in user experiences designed to optimize data platform compute and help teams pinpoint and address inefficient, unused, or long-running models.
Managing data at scale is full of pitfalls, including incorrect, out-of-date, or hard-to-find data. Using dbt, you can build standardized, tested, documented, high-quality data sets that anyone can confidently use to drive your business forward.