Poor quality data has a domino effect that negatively impacts decision-making, compliance, operational efficiency, and trust. By contrast, working actively to improve data quality gives your stakeholders the confidence that they can rely on it to make critical decisions and power new business ideas, such as AI apps.
Achieving and maintaining a high level of data quality requires an active commitment from both data producers and data consumers. In this article, we’ll introduce the principles of data quality management, discuss how to implement it in practice, and look at some tools your teams can leverage in their day-to-day data workflows.
What is data quality management?
Data quality management is the set of policies, practices, and methods an organization uses to ensure its data is accurate, complete, and trustworthy. Ideally, these data quality management standards are a part of a larger data governance framework that sets criteria for data quality and consistency across the entire company.
The outputs of a data quality management process may include artifacts such as:
Data retention policies
How long data should be held and what happens to it after that time period. These policies take into account not just the business value of the data, but also compliance with applicable laws and regulations.
Data formatting standards
How to encode and represent data across the organization. This can include general rules (e.g., date formats) as well as guidelines for company-specific data (e.g., uniform customer IDs, acceptable ranges of values for a specific field). Data formatting standards help prevent downstream pipeline breakages and make data synchronization easier.
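As a quick illustration, a formatting standard can often be enforced directly in SQL during staging. This is only a sketch: the column names, zero-padded ID format, and warehouse functions below are assumptions, and the exact function names vary by warehouse.

```sql
-- A minimal sketch of enforcing formatting standards in a staging query.
-- raw_orders, raw_customer_id, and the formats shown are assumptions.
select
    lpad(cast(raw_customer_id as varchar), 10, '0') as customer_id,  -- uniform, zero-padded customer ID
    to_char(order_date, 'YYYY-MM-DD') as order_date,                 -- ISO 8601 date format
    upper(trim(country_code)) as country_code                        -- two-letter codes, upper case
from raw_orders
```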
Data tests
Code that checks a data set for various attributes of data quality. A data test might ensure, for example, that all of the required fields in a record are specified, that individual fields are formatted correctly, and that no anomalies exist between records.
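In dbt, for example, such a check can be written as a singular test: a SQL query that returns the rows violating the rule, so an empty result means the test passes. The model and column names below are hypothetical.

```sql
-- tests/assert_orders_are_well_formed.sql
-- A dbt singular test: returns offending rows, so zero rows returned means the test passes.
-- The orders model and its columns are hypothetical.
select *
from {{ ref('orders') }}
where order_id is null              -- required field missing
   or customer_id is null           -- required field missing
   or order_total < 0               -- value outside the acceptable range
   or order_date > current_date     -- anomaly: order recorded in the future
```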
Data quality metrics
Data quality metrics provide a quantitative measure of data quality, enabling your organization to identify gaps and improve quality over time. Examples of data quality metrics include total number of data incidents, time to data incident detection, time since last data refresh, and number of passed/failed data tests for a table, among others.
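For instance, a "time since last data refresh" metric can be computed with a simple query. The table, column, 24-hour threshold, and Snowflake-style date functions below are assumptions for the sketch.

```sql
-- A sketch of a "time since last data refresh" metric.
-- The analytics.orders table, loaded_at column, and 24-hour threshold are assumptions.
select
    max(loaded_at) as last_refresh_at,
    datediff('hour', max(loaded_at), current_timestamp()) as hours_since_refresh,
    case
        when datediff('hour', max(loaded_at), current_timestamp()) > 24 then 'stale'
        else 'fresh'
    end as freshness_status
from analytics.orders
```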
Data quality management in practice
In practice, data quality management typically encompasses several core activities.

Data profiling

Assesses the structure of your organization’s data and how each table and field relates to the others. This often takes the form of:
- A repository of data models describing data sources, data destinations, and data transformation rules.
- Data lineage, typically represented as a Directed Acyclic Graph (DAG) that shows how both tables and columns relate to one another.
A data profile gives you a complete picture of the data you own and how it flows throughout your organization. Data engineers can leverage it to identify the root causes of data quality issues and fix them at their source.
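In a dbt project, for instance, lineage is derived automatically from the code itself: every ref() or source() call in a model becomes an edge in the DAG. The model and column names below are hypothetical.

```sql
-- models/rpt_customer_orders.sql (hypothetical model)
-- Each ref() below becomes an edge in the project's lineage DAG,
-- so dbt can trace this report back to its upstream staging models.
select
    c.customer_id,
    count(o.order_id) as order_count,
    sum(o.order_total) as lifetime_value
from {{ ref('stg_customers') }} as c
left join {{ ref('stg_orders') }} as o
    on o.customer_id = c.customer_id
group by 1
```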
Data transformation
Corrals your data into tables and views that meet the business requirements of data consumers. Transformation also involves cleaning and formatting data to fit your organization’s data quality policies as set forth in your data governance framework.
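A dbt staging model is one common place to apply this cleanup. The sketch below assumes a hypothetical CRM source and customer columns; the cleaning rules are illustrative only.

```sql
-- models/staging/stg_customers.sql (hypothetical staging model)
-- Cleans and standardizes raw customer data per the governance framework's formatting rules.
with source as (
    select * from {{ source('crm', 'customers') }}    -- assumed source definition
),

cleaned as (
    select
        cast(customer_id as varchar) as customer_id,
        trim(lower(email)) as email,                  -- normalize casing and whitespace
        nullif(trim(phone), '') as phone,             -- treat empty strings as missing
        cast(signup_date as date) as signup_date
    from source
)

select * from cleaned
```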
Data validation
Uses automated and manual validation to check data for its overall accuracy and sensibility. (For example, ensuring that an Age field never has a negative value.) Data validation ensures that the work done during the transformation phase is correct and that the data is free of obvious defects.
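The Age rule above maps directly onto a dbt singular test; the customers model below is hypothetical.

```sql
-- tests/assert_age_is_never_negative.sql
-- Validates the Age example above: any returned row is a defect.
-- The customers model is hypothetical.
select
    customer_id,
    age
from {{ ref('customers') }}
where age < 0
```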
Metadata development
Identifies characteristics such as a table’s owner, when the data was last modified, a description of the data and how it was calculated, and any related data tests.
By capturing the data’s current status and business purpose, metadata enables better data discovery and usage. This helps you reduce “dark data,” or data that goes unused because no one can find it or validate its meaning.
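In dbt, some of this metadata can be attached in the model file itself using the meta and tags configs (column and model descriptions usually live in the project’s YAML properties files). The owner, cadence, and tags below are assumptions.

```sql
-- models/marts/fct_orders.sql (hypothetical model)
-- Attaches ownership and refresh metadata to the model so it surfaces in docs and the catalog.
{{ config(
    meta = {
        'owner': 'data-platform-team',     -- assumed owning team
        'refresh_cadence': 'daily'         -- assumed custom metadata field
    },
    tags = ['finance', 'certified']        -- assumed tags used for discovery
) }}

select * from {{ ref('stg_orders') }}
```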
Monitoring and reporting
Tools and alert systems that track the effectiveness of your data quality efforts and proactively monitor for errors. These include data quality metrics dashboards and data usage statistics. Data engineering team members can set up ongoing testing on production data and issue alerts immediately upon detecting a data anomaly.
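One lightweight form of anomaly monitoring is a scheduled test that compares recent volume against a trailing baseline; a failure then triggers an alert. The model name and the 50% thresholds below are assumptions for the sketch.

```sql
-- tests/assert_order_volume_within_expected_range.sql (hypothetical singular test)
-- Fails (returns rows) when yesterday's row count deviates more than 50%
-- from the trailing 7-day average, prompting an alert from the scheduled run.
with daily_counts as (
    select
        cast(created_at as date) as order_day,
        count(*) as row_count
    from {{ ref('fct_orders') }}
    group by 1
),

baseline as (
    select avg(row_count) as avg_count
    from daily_counts
    where order_day between current_date - 8 and current_date - 2
),

latest as (
    select row_count
    from daily_counts
    where order_day = current_date - 1
)

select latest.row_count, baseline.avg_count
from latest
cross join baseline
where latest.row_count < baseline.avg_count * 0.5
   or latest.row_count > baseline.avg_count * 1.5
```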
Implementing data quality management
Implementing data quality management requires a combination of processes and tools.
Processes ensure that all appropriate stakeholders—both technical and business line leaders—are involved in data quality management. In particular, processes need to involve data domain owners to validate that data conforms to business requirements and outcomes.
Tools support key elements of the data quality management process, including creating data transformation pipelines and automating data quality standards, testing, and metrics collection and reporting.
Both processes and tools are necessary to implement data quality management at scale. Approaches such as the Analytics Development Lifecycle (ADLC) combine processes and tools into a unified framework that enables data producers and consumers to improve data quality via short, rapid development cycles.
Using an ADLC approach, companies can identify important data quality use cases and address them through iterative improvements.
A typical data quality management life cycle will include:
Plan
Technical and business stakeholders work to identify existing data quality issues and prioritize them. For example, the team might identify that duplicate records in sales data are preventing an accurate analysis of purchasing trends.
After identifying use cases, the team will establish procedures for reconciling records, preventing duplicates, and creating a data set that accurately reflects the current business reality. The team should also establish metrics to monitor and verify correctness (e.g., fewer than x% duplicate records in the final data set, data test pass rate).
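As a sketch, the duplicate-rate metric from this example could be expressed as a query the team tracks over time; the stg_sales model and order_id key are assumptions.

```sql
-- A sketch of a duplicate-rate metric for the sales example above.
-- The stg_sales model and order_id key are assumptions.
select
    count(*) as total_rows,
    count(distinct order_id) as distinct_orders,
    round(100.0 * (count(*) - count(distinct order_id)) / count(*), 2) as duplicate_pct
from {{ ref('stg_sales') }}
```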
Develop
The data engineering team will then create data transformation pipelines that produce a clean data set in line with data consumers’ requirements. They’ll also create tests to run against both pre-production and production data to verify the quality of the output.
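Continuing the duplicate-sales example, the pipeline might deduplicate with a window function that keeps the most recently updated record per key. The model and column names below are assumptions.

```sql
-- models/marts/fct_sales.sql (hypothetical model)
-- Deduplicates sales records, keeping the most recently updated row per order_id.
with ranked as (
    select
        *,
        row_number() over (
            partition by order_id        -- assumed natural key for a sale
            order by updated_at desc     -- most recent record wins
        ) as row_num
    from {{ ref('stg_sales') }}
)

select *
from ranked
where row_num = 1
```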
Test and deploy
The data engineering team checks all of its data transformation and testing code into source control, using Pull Requests (PRs) to review data code changes internally before deployment. It also implements a Continuous Integration and Continuous Deployment (CI/CD) process to test data quality management code in a pre-production environment before releasing to production.
Operate, observe, discover, and analyze
Data consumers use the new, clean data set to create reports and data-driven applications. Along the way, they report any issues they identify back to the data engineering team for fixing. Simultaneously, the data team tracks metrics and alerts to identify potential issues before they result in report or application downtime.
All teams involved in the ADLC continue iterating over this cycle, delivering new data quality use cases with every release.
Automating data quality management with dbt
It takes time to implement the processes and tools required for an effective data quality management program. It takes even longer if you have to build all of your tooling and pipelines from scratch.
dbt Cloud offers a host of features that significantly reduce the time and effort required to ship high-quality data. These include:
- Transformation: Create models that import data from multiple sources, cleaning and transforming them into new data sets that are ready to use.
- Documentation: Add descriptions directly to data models. Publish new documentation automatically with every push to production, providing other data users with detailed information on the origin and meaning of your data.
- Testing: Leverage built-in tests (e.g., not-null checks) and create custom tests that implement quality checks specific to each data domain.
- Version control integration: Check data models, transformations, and tests into source control to ensure all changes are tracked and reviewed. Isolate in-development changes in branches so that data engineers can work freely on new features or fixes without affecting the current state of production.
- Job scheduling and orchestration: Regularly run your dbt models and tests to bring data changes into production and ensure that data quality checks are performed continuously. Unlike other tools, dbt Cloud makes it easy to automate data imports and testing in a single data pipeline.
- CI/CD support: Automatically run jobs from your Git provider based on check-ins or completed PRs. Test changes in a pre-production environment before releasing to users.
- Data cataloging: Data producers and consumers alike can use dbt Explorer to find existing data sets and related documentation, as well as trace data lineage to verify the origin of data and troubleshoot upstream data issues.
- Monitoring and dashboards: Monitor metrics and fire alerts in response to dbt Cloud test failures.
Leveraging these tools in dbt Cloud, your team can build a robust data quality management process in a fraction of the time it’d take to build from scratch.
Learn more about how dbt Cloud can kickstart your data quality management journey—contact us today for a demo.