Monte Carlo and dbt Labs: Partnering for more reliable data
“Why didn’t my job run?”
“What happened to this dashboard?”
“Why is this column missing?”
“What went wrong with my data?!”
If you’ve been on the receiving end of a broken data pipeline, these questions probably look familiar. And as data ecosystems become increasingly distributed and complex, the likelihood of this “data downtime” (periods of time when data is missing, inaccurate, or otherwise erroneous) only grows.
Fortunately, there’s a better way forward.
Just as an engineer would never deploy code to production without testing it first, or run production software without application performance monitoring and observability, data teams can apply a similar “testing + observability” framework to their data pipelines.
To this end, we’re excited to announce Monte Carlo and dbt Labs’ strategic partnership to help teams ship more reliable data, faster, through robust testing and monitoring. This new partnership will allow data teams and business stakeholders to focus on the work that adds the most value, with confidence in the integrity of their data products.
Monte Carlo and dbt: a recipe for more reliable pipelines and models
Together, dbt and Monte Carlo let data teams combine testing within dbt (for known unknowns) with observability across their data pipelines (for unknown unknowns). The result is greater confidence that the assumptions behind the data hold, from upstream data producers to downstream data consumers.
With Monte Carlo, teams can proactively manage the health of their data across the entire data stack, including data warehouses, data lakes, BI tools, data orchestration tools, and of course dbt. dbt tests let teams write specific, modular SQL assertions against models and the tables that comprise them. Monte Carlo goes a step further, combining automatic and opt-in ML-powered detection of freshness, distribution, volume, and schema changes across the data stack so that teams can manage data quality through a single pane of glass.
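To make the distinction between testing and observability concrete, here is a minimal Python sketch. It is illustrative only: the function names, the sample data, and the z-score threshold are assumptions made for this example, and the simple z-score check merely stands in for ML-powered detection. It is not how dbt or Monte Carlo are implemented.

```python
from statistics import mean, stdev

def not_null_test(rows, column):
    """Explicit assertion (a 'known unknown'): return rows where `column` is missing.

    This mirrors the idea of a test that someone had to think of and write down.
    """
    return [row for row in rows if row.get(column) is None]

def volume_anomaly(history, today, threshold=3.0):
    """Learned check (an 'unknown unknown'): flag an unusual daily row count.

    Hypothetical stand-in for automated volume monitoring: compares today's
    count to the historical mean using a z-score.
    """
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold

# Hypothetical sample data
rows = [{"order_id": 1, "amount": 10.0}, {"order_id": 2, "amount": None}]
failing_rows = not_null_test(rows, "amount")

daily_row_counts = [1000, 1020, 980, 1010, 995]
is_anomalous = volume_anomaly(daily_row_counts, today=400)
```

The first check only catches the failure it was written for, while the second flags a drop in volume that nobody wrote a rule for ahead of time.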
Monte Carlo extends dbt tests and centralizes data observability on a single platform, helping data teams automatically detect, quickly resolve, and ultimately prevent bad data by:
- Enabling data observability across the entire data stack, including data warehouses, data lakes, BI tools, data orchestration tools, and of course dbt
- Extending and consolidating dbt tests with automated monitoring for freshness, distribution, volume, and schema changes
- Allowing for rapid triaging, root cause analysis, and incident resolution by centralizing the most critical context in a single interface, including dbt error logs, automated field-level lineage, and other data, code, and operational diagnostics
As a result, data engineering and analytics engineering teams can spend more time building new data pipelines to help drive strategic business decisions and power data products.
Monte Carlo customers can easily check their dbt pipeline health as they investigate data incidents, so that they can get to root causes much faster.
Enriching our integration for a lasting partnership
While Monte Carlo has supported dbt Core and Cloud integrations since early 2022, we’ve since expanded what the two tools can do together.
With the latest release of these integrations, Monte Carlo and dbt Cloud customers can:
- Surface dbt test failures and model errors as Monte Carlo incidents in a central UI, enabling teams to seamlessly analyze their downstream impact.
- Troubleshoot data incidents by checking the run results of associated dbt models and tests in Monte Carlo.
- Map out dbt models to tables in their database, gaining insight into each table’s dbt model name, location, model code, and last run time.
- Import dbt tags and descriptions on table and field levels, allowing developers to manage all metadata in a single place.
Monte Carlo’s “dbt Failures are Monte Carlo Incidents” feature notifies dbt users when an error occurs or a test fails, and gives them the tools necessary to troubleshoot the issue directly in the Monte Carlo UI.
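As a rough illustration of what surfacing dbt failures involves, the sketch below collects failed tests and errored models from the contents of a dbt `run_results.json` artifact. It assumes the artifact's `results` entries carry `unique_id`, `status`, and `message` fields and that failures are reported as `fail` or `error`; check this against your dbt version. The helper name and the sample payload are hypothetical, and this is not a description of how the Monte Carlo integration itself works.

```python
import json

# Statuses treated as failures in this sketch (assumed dbt status values)
FAILURE_STATUSES = {"error", "fail"}

def failed_nodes(run_results):
    """Return (unique_id, status, message) for each failed or errored node."""
    return [
        (result["unique_id"], result["status"], result.get("message"))
        for result in run_results.get("results", [])
        if result.get("status") in FAILURE_STATUSES
    ]

# Hypothetical, heavily trimmed run_results payload
sample = json.loads("""
{
  "results": [
    {"unique_id": "model.shop.orders", "status": "success", "message": "OK"},
    {"unique_id": "test.shop.not_null_orders_amount", "status": "fail",
     "message": "Got 2 results, configured to fail if != 0"},
    {"unique_id": "model.shop.customers", "status": "error",
     "message": "Database Error"}
  ]
}
""")

incidents = failed_nodes(sample)
for unique_id, status, message in incidents:
    print(f"{status.upper()}: {unique_id} ({message})")
```

In practice the payload would be read from the `run_results.json` file that dbt writes to its target directory after a run.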
Hear from our joint customers
Here’s what some of our mutual customers have to say about the value of our partnership and native integration:
Optoro
Washington, DC-based Optoro is on a mission to make retail more sustainable by eliminating waste from returns for retail industry leaders like Ikea, Target, and Best Buy. Their organization uses data to re-route inventory to the best locations. To build data trust and tackle data quality at scale, they turned to dbt and Monte Carlo.
“We use dbt to test and transform data after it enters our warehouse and Monte Carlo to monitor for data quality issues at every stage of the data pipeline,” said Patrick Campbell, Lead Data Engineer at Optoro. “Now, we are the first to know when data quality issues arise, rather than stakeholders downstream.”
Auto Trader UK
Manchester-based Auto Trader is the largest digital automotive marketplace in the United Kingdom and Ireland. Their platform connects millions of buyers to sellers and handles thousands of customer interactions a minute. As the data team migrated from trusted on-premises systems to the cloud, they needed a way to ensure data stayed trustworthy for more than 50% of all Auto Trader employees.
“With dbt and Monte Carlo, we know our data is reliable and trustworthy,” said Edward Kent, Principal Developer at Auto Trader. “This integration signifies a commitment to adopting software engineering best practices for DataOps, particularly as it relates to validating and monitoring data as it evolves across its life cycle. We look forward to the continued innovation and collaboration!”
Kolibri Games
Berlin-based Kolibri Games has had a wild ride, rocketing from a student housing-based startup in 2016 to a headline-making acquisition by Ubisoft in 2020. While a lot has changed in five years, one thing has always remained the same: the company’s commitment to building an insights-driven culture based on accurate and reliable data.
“With over 100 million unique events produced per day across 40 different event types, our games generate an unprecedented amount of data, and in order to trust it, we need to prevent bad data from entering our pipelines and know when incidents arise downstream,” said António Fitas, Head of Data Engineering at Kolibri Games. “Monte Carlo and dbt are the perfect tools to help us achieve the level of trust and reliability as we scale our data platform in 2021.”
Last modified on: Nov 29, 2023