Data testing

Data testing is a process that often doesn’t get enough love, but it’s so important.

Before I began working at Fishtown Analytics, I was one of those engineers who tested only lightly — I once had a project that consisted of 180 models, for which I ran a whopping six tests.

I’ve since upped my ratio considerably because I learned, sometimes the hard way, that data testing is a vital component of analytics engineering.

Why is testing a necessary part of an analytics workflow?

The analytics engineering process runs on numerous assumptions we make:

  • That data is current.
  • That the model is sound.
  • That the transformed data is accurate.

If we don’t test those assumptions, we’re operating in a state of ignorance — and in analytics engineering, ignorance is definitely not bliss.

If you perform no testing, you can run countless models without ever knowing whether something has gone wrong.

Why might a test fail?

Depending on the type of test you’ve run (one of dbt’s four built-in tests or a custom-built one), a failed test will tell you one of three things:

  • The SQL in the model didn’t do what you intended it to do.
  • An assumption about your source data is incorrect.
  • An assumption about your source data that was true once upon a time is no longer valid.

What should you test?

By Sanjana Sen

How much testing is too much? How much is not enough? How do you know? Testing is a science and an art, so there’s no one-size-fits-all solution.

Knowing where you currently stand is the fastest route to getting where you want to be. In my experience, there are 5 phases of testing adoption:

  1. Laying the Foundation
  2. Getting Proactive
  3. Instilling a Culture of Testing
  4. Confronting Alert Fatigue
  5. The Goldilocks Zone

Let’s walk through what you might consider testing at each of those stages:

Stage 1: Laying the Foundation

If you’re not currently testing, don’t worry — you’re not the only one: about 40% of dbt projects run tests each week.

If you’re just getting started, it’s useful to set your reason for beginning to test. There are two reasons we’ve seen teams be testing-curious:

  • Data doubt: Are persistent data quality questions dogging your team’s ability to build new stuff?
  • Preparing to grow: If you’re currently a data team of one (or two, or three), it’s crucial to establish a testing cadence before you write your hiring plan. You’ll attract better talent, they’ll ramp faster, and working async will be much easier.

What to Test in Stage 1

  1. Primary keys: unique & not_null tests. These help identify the usual suspects: null values that mess with filters and joins, or accidental fanouts that throw off your aggregate counts and sums.

  2. Accepted values or relationship tests. For example, say we have a set of transactions with four potential statuses: placed, shipped, completed, returned. What happens when a new status is introduced upstream?
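In dbt, all four of these built-in tests are declared in a model’s .yml file. A minimal sketch, assuming a hypothetical orders model with an order_id primary key, a customer_id foreign key to a customers model, and the statuses above:

```yaml
version: 2

models:
  - name: orders           # hypothetical model name for illustration
    columns:
      - name: order_id     # primary key: catch nulls and fanouts
        tests:
          - unique
          - not_null
      - name: customer_id  # foreign key: every order belongs to a known customer
        tests:
          - relationships:
              to: ref('customers')
              field: customer_id
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
```

If a fifth status ever appears upstream, the accepted_values test fails and surfaces the change before it quietly skews your reporting.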

Stage 2: Getting Proactive

You have basic data quality tests in place: uniqueness, not-null, accepted values, and maybe some relationship tests.

At this stage, you’re confronted with problems that can only be solved with proactive testing:

  • Data freshness: Reports going stale because ETL processes didn’t complete on schedule, or your dbt jobs didn’t run.
  • Domain-specific problems: Like the same customer having multiple active subscriptions — the edge cases that must be hammered out over time.

There is, of course, no way to think of everything that can and will happen to your data — so you’ll always need a plan for addressing reactive problems gracefully as well.

What to Test in Stage 2

  1. Expand into custom schema tests to hammer out those corner cases in your data. Explore the schema test macros in the dbt-utils package or dbt-expectations for inspiration.
  2. Consider implementing source freshness reporting in dbt, as well as alerts in your ETL tool if you’re using one. Note: if you’re using Mode (or another data visualization tool that supports dbt’s Metadata API), you can view dbt source freshness directly in your reports.
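As a sketch of both ideas, here’s what a dbt-utils schema test and a source freshness configuration might look like side by side (the source, table, and column names are hypothetical):

```yaml
version: 2

sources:
  - name: app_database            # hypothetical source name
    loaded_at_field: _etl_loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: subscriptions

models:
  - name: subscriptions
    tests:
      # domain-specific check from the dbt-utils package:
      # a subscription can't end before it starts
      - dbt_utils.expression_is_true:
          expression: "ended_at is null or ended_at >= started_at"
```

With this in place, `dbt source freshness` warns once the data is more than 12 hours stale and errors after 24, so you hear about a stuck pipeline before your stakeholders do.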

Stage 3: Instilling a Culture of Testing

At this stage, the safeguards and response mechanisms you’ve created will reduce the risk of unexpected issues. Your data is structurally sound. So what’s next?

The opportunities at this stage tend to be human-centric and workflow related:

  • Testing adoption: Is everyone on the team adding tests when they write new data transformations?
  • Issue resolution: Is there a team-wide process for quickly debugging test failures?
  • CI/CD process: Are test results being properly examined as part of pull requests in development?

What to Consider in Stage 3

  1. Think about how to best support your team process-wise. This might mean adding testing as a step on your pull request template or coming together to define a test failure debugging workflow. You may want to consider a framework like pre-commit-dbt, which checks code quality across your project.
  2. Consider implementing a blue/green deployment strategy to allow your team to catch issues in staging before they make it to production. This can be considered as part of a broader CI/CD workflow, which may include dbt Cloud’s Slim CI or a tool like Spectacles for Looker.

Stage 4: Confronting Alert Fatigue

Maybe you’ve gotten a bit carried away with testing. Are you a bit macro-happy? Maybe you tried adding a test on every column, but now you’re drowning in error messages?

Yep, I’ve been there. Over-testing can actually do more harm than good — but it’s a natural part of growth. We notice it in a few ways:

  • Alert fatigue: Are your source freshness latency expectations too rigorous, to the point where they fail every day? If you’re not actively resolving alerts (because they don’t really need to be that stringent), now’s the time to loosen those thresholds to reasonable levels.
  • Time / resource burn: Tests cost query time. Depending on your warehouse platform, that might mean credits ($), but it always means development time ($$$). Do you really need to test intermediate models that don’t undergo any aggregation or joining?

What to Consider in Stage 4

  1. Audit your tests. Examine your recently failed tests: do they really need to exist? If so, prioritize and fix them accordingly. You can do the same across the board: are there deprecated models or never-used columns where you can remove tests?
  2. For non-critical tests, change the severity from “error” to “warn”: those tests will then emit warnings instead of failing the dbt test step in your builds.
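In dbt, severity is set in the test’s config. A sketch, using a hypothetical payments staging model:

```yaml
version: 2

models:
  - name: stg_payments            # hypothetical model name
    columns:
      - name: payment_method
        tests:
          - accepted_values:
              values: ['credit_card', 'paypal', 'gift_card']
              config:
                severity: warn    # surface a warning instead of failing the build
```

The test still runs on every build, so you keep the signal, but a stray new payment method no longer blocks your deployment.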

Stage 5: The Goldilocks Zone

Welcome to the promised land! You’ve put in the work to find what works for your project and team. At this point, you’ve found consistency in your process:

  • Test coverage: Using dbt’s out-of-the-box tests, custom-built tests, or those from open-source packages like dbt-utils or dbt-expectations.
  • Test placement: At either end of your workflow — closer to your source, and closer to your visualizations.
  • CI/CD: Ensuring that your end data product is accurate before deploying updates.
  • A workflow: For debugging issues that arise, and for paying down tech debt by deprecating old tests.

What to Consider in Stage 5

At this stage, your team likely has some strong opinions on how testing should be implemented.

Consider contributing those back to the dbt community! Check out #testing in dbt Slack, or Sean McIntyre’s great Discourse post on the stages of testing in dbt for ideas there.

You may even be ready to apply to speak at Coalesce!

And never forget! It’s okay to fail — tests do all the time.

When should you run tests?

By Andrew Escay

Generally, we’ll cycle through 3 stages of testing for a project:

  • Build - Create a query to answer your outstanding questions.
  • Validate - Check whether the data is valid and accounts for known edge cases and business logic.
  • Debug - Incorporate any missing context required to answer the question at hand.

These steps are repeated as your data models evolve.

Step 1: Use tests to build trust for faster development

There are two core challenges associated with the “build” phase. The first is to confirm whether identified data sources are reliable. The second is to determine whether a new model could break existing components. Modifying a query that powers downstream dependencies like dashboards could have far-reaching implications.

How does testing help you build faster?

Testing can actually make the data build process faster in a few ways:

  • Speed up the feedback loop for validating data. If you have questions about table structure, e.g. “Does this orders table contain one row for every order?” testing eliminates the need to check those assumptions with stakeholders. Instead, users can create a .yml file and add relevant tests to assert structure.

  • Tests can serve as assertions for other developers. If a data object (table or view) carries many passing tests, developers can trust that the data is safe to use. This allows the developer to focus on building models instead of repeating validations.

  • Estimate the impact of changes on future builds. If downstream models carry tests, users can run a command to check validity instead of spot checking each artifact.

Remember my cohort analysis story? I have a confession to make. We were actually already using dbt, with tests. This made continued development easy, and quick. Tests helped me know which sources and fields were trustworthy, so I could feel confident relying on them as I continued to build.

So if we had tests, why was that stakeholder review process still so painful?

Step 2: Create a testing “paper trail” for easy validation

My teammates had written tests for previously developed code, but, [confession number two inbound] I didn’t write tests for my own new code. Let’s rewind that scenario, because I’m sure it will look familiar to many readers:

After building initial models, I prepared for stakeholder validation with a few sample use cases. When my models failed to produce expected results (as was often the case), the challenge became pinpointing the problem. “Which assumption was wrong? Which edge case wasn’t accounted for? What did I assume was an edge case… but isn’t?”

My only option? Review each step of every query to isolate the error. That meant literally sitting down with a stakeholder and reading every line of code to find the weird quirk in the data that I might have accounted for, but couldn’t pinpoint on the spot.

How does testing help you validate?

When deep in the build stage, it’s easy to forget which assumptions you made and validated. Testing while building a query creates a clear “paper trail” of validated assumptions: a quick reference for you, and better insight into your preparation methods for stakeholders.

Imagine an orders table that includes three statuses: completed, canceled, and deleted. These are the only possible order statuses today, but that could change in future (doesn’t it always?).

To prevent disruption, create an accepted values test on the orders.status field. This lets you check whether the values fall under one of the three known statuses. When you spot an issue later, like a new status that wasn’t originally accounted for, it’s easy to check assumptions for missing context. Tests make it easy to reference, isolate, and challenge assumptions when needed.
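That test is only a few lines in the model’s .yml file (the model and column names here mirror the hypothetical orders example):

```yaml
version: 2

models:
  - name: orders
    columns:
      - name: status
        tests:
          # the three statuses that exist today; a new one should fail loudly
          - accepted_values:
              values: ['completed', 'canceled', 'deleted']
```

When a fourth status eventually shows up, the test failure points you straight at the assumption that broke, instead of leaving you to re-read every query.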


In the end, I would have always conducted a standard stakeholder review as part of my development process. But writing tests while building the new models would have helped that review go much faster, as I wouldn’t have had to break my development flow by writing up notes on the gotchas and tracking every edge case. My code, plus my tests, would have been the easiest and fastest ways to retrace assumptions.

Step 3: Use test failures to simplify your debugging process

At this point you’ve presented the first cut to your stakeholder. But you may still have some assumptions that aren’t correctly addressed with your current set of work. Maybe you need to rewrite some code to account for new information about the data.

What makes this challenging (and time-consuming) is not knowing where to start!

If you created tests for your assumptions in step 2, then any wrong assumptions can be easily identified and resolved in step 3. While things can get tricky when new assumptions are surfaced, you can still use your existing tests to reduce sections of code requiring manual review.

My teammates on that early dbt project left a lot of tests, so building was easy. I failed to write my own while building, so validating was hard. Since debugging requires you to fix incorrect assumptions or account for new ones, not having any tests on my new work to use as starting points made this just as tedious.

How does testing help you debug?

Tests are a quick way to check whether past assumptions are no longer true. And test failures, rather than something to bemoan, can actually help you know where to begin. Reducing time to identify problem areas reduces total time spent debugging.

Testing won’t eliminate every stakeholder data quality question. But each discrepancy discovered contributes to the hardiness of your future testing framework!

Reducing opportunity for error also reduces time wasted in reactive review.