How SurveyMonkey sharpens dbt with data observability
Dec 03, 2024
SurveyMonkey is a company that lives and breathes data. For a while, its primary focus was getting data in front of business users. Over time, however, it realized multiple teams across the company were having similar—and, in some cases, unexpected—issues with data.
Let’s take a look at how data quality and data processing both began as pain points across SurveyMonkey—and how the company’s data engineering team used a combination of reactive monitoring with Monte Carlo and proactive data management with dbt to yield measurable improvements.
SurveyMonkey’s data landscape
SurveyMonkey is the leader in global online forms and surveys. With over 25 years of experience, they’ve helped organizations of all sizes, from startups to Fortune 500 companies, answer close to 88 billion questions.
Over 3,000 organizations look to SurveyMonkey for insights on their customers, employees, and products, which means the company processes a lot of data.
To make smarter business decisions, you need to know what you don’t know. SurveyMonkey helps you ask, listen, and act, transforming customer insights into valuable, actionable strategies.
The magic happens behind the survey, in the company’s backend data landscape. SurveyMonkey imports data from multiple data sources into Snowflake Enterprise. It runs 180+ workflows, including data transformation pipelines powered by dbt, to transform, tailor, and mine meaning from all this data. SurveyMonkey also uses Airflow for orchestration and Monte Carlo for data observability.
The company’s dbt footprint is sizable. It maintains 600 models and 2,000 test cases, pulling data from close to 900 data sources.
The data problems SurveyMonkey unearthed (with a survey, of course)
When SurveyMonkey’s data engineering team first tackled data quality and data governance, its focus was on building dbt models. The priority was availability of the data (getting it into the hands of business users) rather than its quality.
Soon, however, the team began receiving pointed requests from a diverse group of stakeholders:
- Executives wanted data pipelines to run early so they’d have scorecards in the AM
- Marketing wanted to monitor and track the performance of their campaigns in real-time against benchmarks and goals
- Directors requested proactive detection and notification via Slack of anomalies discovered by Monte Carlo
Instead of fixing these problems immediately, the data engineering team asked itself: Are other groups in SurveyMonkey having the same issues? Data engineering team manager Samiksha Gour sought to answer that question with—what else?—a survey.
The results were illuminating. When asked what the common data challenges were in their organization, 53% cited data quality issues (inconsistencies, inaccuracies, etc.), which Gour expected.
What was surprising, however, was that 50% cited data processing issues. Many, many teams were concerned about the performance of their data pipelines.
How Monte Carlo + dbt enhanced data quality
To tackle these issues, SurveyMonkey integrated two tools it was already using: Monte Carlo and dbt.
SurveyMonkey leverages dbt for data transformation. It uses dbt models to clean raw data from various sources to create high-quality, usable data sets. The company leverages dbt tests to verify data quality and dbt documentation to create consistent, well-documented data models that serve as single sources of truth.
The company employs Monte Carlo to monitor these dbt models and pipelines. Monte Carlo’s monitoring capabilities detect data anomalies, ensuring data accuracy and maintaining data governance.
SurveyMonkey integrated these two tools in several different ways:
Standardized quality checks
dbt tests became mandatory for every model.
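As a sketch of what a mandatory baseline looks like in dbt, a model's schema file can require generic tests such as `not_null` and `unique` on key columns. The model and column names below are hypothetical, not SurveyMonkey's actual models:

```yaml
# models/staging/schema.yml (hypothetical model and column names)
version: 2

models:
  - name: stg_survey_responses
    description: "One row per survey response, cleaned from the raw source."
    columns:
      - name: response_id
        description: "Primary key for a survey response."
        tests:
          - not_null
          - unique
      - name: survey_id
        description: "Foreign key to the survey the response belongs to."
        tests:
          - not_null
```

Generic tests like these run on every `dbt test` or `dbt build`, so making them mandatory per model turns data quality checks into a routine part of the pipeline rather than an afterthought.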
Regular performance reviews
Rather than abandon a model after pushing it to production, the team conducted regular performance reviews and optimizations for all dbt models.
Automated monitoring and alerts for Monte Carlo-detected anomalies and failed dbt tests
The team sent proactive notifications when something was wrong with the data, rather than waiting for the issue to show up on a scorecard.
Converting Monte Carlo anomalies into dbt test cases
For anomalies where the data team didn’t want the pipeline to proceed, they’d create a test that stopped it from running. Business users were informed of the problem and were responsible for addressing the issue in the upstream data source. That meant business users could unblock themselves instead of waiting on data engineering to unblock them.
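As an illustration of this pattern (the source doesn't show SurveyMonkey's actual tests), an anomaly Monte Carlo flagged, say, negative spend values in a marketing table, could be codified as a dbt singular test. Table and column names here are hypothetical:

```sql
-- tests/assert_no_negative_spend.sql (hypothetical test and model names)
-- A dbt singular test fails when it returns rows. With the default
-- severity of "error", a failure stops the build, so downstream models
-- don't run until business users fix the upstream source.
select
    campaign_id,
    spend_usd
from {{ ref('fct_campaign_spend') }}
where spend_usd < 0
```

Because the test fails loudly instead of letting bad data flow through, the orchestrator halts the pipeline at exactly the point where the anomaly was detected.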
Leveraging dbt documentation with Monte Carlo Asset
SurveyMonkey used a combination of manual documentation and AI-assisted suggestions generated by dbt Copilot to help document their 6,000-some tables for business users. Users would then search Monte Carlo Asset, the company’s single pane of glass for data assets, to filter down the available tables based on the documentation.
How dbt and Monte Carlo improved SurveyMonkey’s business
After putting these measures in place, SurveyMonkey measured their impact on the business.
One of the most impressive results is that the data engineering team reduced its Snowflake credit usage by around 73% across close to 10,000 credit-consuming jobs.
A key driver was performance monitoring, which led the team to fix and merge queries, redo models, and delete unused models and tables. In particular, the team used performance monitoring to identify long-running update jobs doing cross-joins, simplifying the SQL statements that powered them.
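The source doesn't show the exact queries involved, but the kind of rewrite it describes might look like the following, where an unconstrained cross join filtered after the fact is rewritten as an explicit join (table names are hypothetical):

```sql
-- Before (hypothetical): a cross join materializes the full product
-- of both tables before the WHERE clause filters it down.
select o.order_id, c.region
from orders o
cross join customers c
where o.customer_id = c.customer_id;

-- After: the same result as an explicit inner join, which lets the
-- warehouse match rows on the join key instead of expanding the
-- full cartesian product first.
select o.order_id, c.region
from orders o
join customers c
  on o.customer_id = c.customer_id;
```

On large tables, avoiding the intermediate cartesian product is often the difference between a job that runs in minutes and one that burns credits for hours.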
A second benefit was that SurveyMonkey could scale while keeping its costs stable. Despite adding many more dbt models (e.g., by bringing a marketing analytics media platform in-house), the team’s job execution time remained stable while its cost per credit in Snowflake steadily decreased.
A third benefit was a marked increase in data quality. In March 2024, when the team brought in the third-party marketing analytics platform, it saw a spike in data anomalies reported by Monte Carlo. This was because the team itself didn’t fully understand this new data. It needed time to work with it, grok it, and shape its data pipelines to produce healthy, accurate data.
Remember, however—the team had a principle of converting Monte Carlo anomalies into dbt test cases. As its dbt test cases spiked, the anomalies started to drop.
Finally, SurveyMonkey saw its regular performance reviews pay dividends. In one case, retiring unused models and revamping inefficient queries yielded a 94% reduction in pipeline runtime and a 97% reduction in Snowflake credit usage.
Lessons learned
The data engineering team admits that their journey, at times, wasn’t a very smooth ride. However, they learned some valuable lessons along the way:
Monte Carlo and dbt are a winning combination
Using two tools that complement each other worked in the team’s favor, unlocking wins it might not have secured otherwise.
Don’t introduce anomalies to business users at the beginning of the project
It doesn’t make sense to send anomaly notifications to business users when the data engineers themselves still don’t understand the data. Take a month and wait for the data pipelines to stabilize before you involve business users.
Stop running pipelines on bad data
A stopped pipeline gets everyone’s attention. If you detect an important anomaly, make a test case for it. Then, you can stop the pipeline and restart it once your business users fix the upstream source.
Onboard new users to the system from the get-go
Specifically, SurveyMonkey now trains new business users to leverage Monte Carlo for incident routing and table research.
What’s next for SurveyMonkey’s data journey?
Using dbt and Monte Carlo, SurveyMonkey’s data engineering team successfully tackled two key data problems simultaneously: data quality and data processing times. In doing so, it decreased its total data processing time and costs—which enabled the company to handle even more data.
The team isn’t stopping there. Some key improvements it plans to make include:
Enhanced tool compatibility
Exploring future updates to bridge the gap between dbt and Monte Carlo features—specifically in handling dbt snapshots effectively.
Expanded monitoring capabilities
Specifically, developing and implementing more custom monitors.
Using data domains
A data domain is a logical grouping of data, along with all of the operations it supports. They enable domain teams to own and operate their own data while interacting with other teams via a data mesh architecture. SurveyMonkey aims to leverage data domains to identify further areas of data improvement.
Stakeholder tracking
Implementing a system to track stakeholder trouble tickets and monitor their progress closely.
To get more insights on SurveyMonkey’s use of dbt with Monte Carlo, watch the full presentation from Coalesce 2024.