Data mesh architecture: decentralizing data management

Understanding data mesh architecture

Last edited on Jun 03, 2025

In today's data-driven world, organizations are continually seeking better ways to manage and utilize their vast amounts of data efficiently. Traditional data architectures, with their centralized control and monolithic structures, often struggle to keep pace with growing demands. This is where a data mesh comes into play—an approach that decentralizes data management and empowers individual teams to own their data and its lifecycle.

What is data mesh?

Data mesh is a new approach to data architecture. Rather than managing all data and data processing as a single monolith, it decomposes data into a series of data domains, each owned by the team closest to that data.

How data mesh changes data architecture

In a monolithic data management approach, technology drives ownership. A single data engineering team typically owns all the data storage, pipelines, testing, and analytics for multiple teams, such as Finance and Sales.

In a data mesh architecture, business function drives ownership. The data engineering team still owns a centralized data platform that offers services such as storage, ingestion, analytics, security, and governance. But teams such as Finance and Sales would each own their data and its full lifecycle (e.g. making code changes and maintaining code in production).

The perks of moving to a data mesh architecture

Moving to a data mesh architecture brings numerous benefits:

It removes roadblocks to innovation by creating a self-service model that allows teams to create new data products
It democratizes data while retaining centralized governance and security controls
It decreases data project development cycles, saving money and time that can be driven back into the business

Since it has evolved from previous approaches to data management, data mesh uses many of the same tools and systems that monolithic approaches use, but exposes these tools in a self-service model combining agility, team ownership, and organizational oversight.

The challenges of monolithic data management

The distance between the data processors and the data owners results in long load-test-fix cycles that delay data project delivery times. Additionally, the complexity of these systems means few people understand how all the parts work together.

None of these problems are new. But they’ve become more noticeable as data architectures grow more complex in scope and scale.

ETL and its drawbacks

Before cloud systems, data was stored on-premises. Companies stored most of it in large online transaction processing (OLTP) systems built on symmetric multiprocessing (SMP) models like Teradata. Some companies, those that could afford it, stood up data warehouses. Built around an online analytical processing (OLAP) model and massively parallel processing (MPP), data warehouses gave data analysts, business analysts, and managers faster access to data for reporting and analytics.

To load data into data warehousing systems, IT or data engineering teams used Extract, Transform, and Load (ETL) processes, which cleaned and transformed the data into a state suitable for general use. Although this approach worked, it had numerous drawbacks:

The expense of building data warehouses on-premises prohibited many companies from leveraging the technology
Most workloads had to go through the IT and (later) data engineering teams, which quickly became bottlenecks
These centralized solutions produced one-size-fits-all datasets that couldn’t meet the needs of every team’s use case
Many ETL pipelines were hard-coded and brittle, resulting in fire drills whenever one went down due to bad data or faulty code
Some teams, frustrated by these bottlenecks, formed their own solutions, resulting in data silos and the growth of “shadow data IT”

ETL to ELT

As dbt Labs founder Tristan Handy has written before, the emergence of Amazon Redshift changed the game. Cloud computing eliminated the need for massive up-front capital expenditures for large-scale computing projects.

The availability of on-demand computing power also changed the way we processed data. Because cloud computing was so affordable, many teams could move from ETL processes to an ELT (Extract, Load, Transform) model that transformed unstructured and partially structured data after loading it into a data warehouse. Because it leveraged the processing power of the target system and instituted a schema-on-read approach, ELT often proved more performant and more flexible.
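To make the load-then-transform order concrete, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for a cloud data warehouse; the table names and sample rows are hypothetical and not taken from any real system.

```python
import sqlite3

# Hypothetical raw records, loaded as-is (every value is untyped text).
raw_orders = [
    ("1001", "finance", "19.99"),
    ("1002", "sales", "5.00"),
    ("1003", "sales", "not-a-number"),  # the bad value survives the load step
]

conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse

# Extract + Load: land the raw data without reshaping it first.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, team TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_orders)

# Transform: use the warehouse's own SQL engine to type and filter the data.
conn.execute(
    """
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           team,
           CAST(amount AS REAL) AS amount
    FROM raw_orders
    WHERE amount GLOB '[0-9]*.[0-9]*'
    """
)

totals = dict(conn.execute("SELECT team, SUM(amount) FROM orders GROUP BY team"))
print(totals)
```

The raw table stays untouched, so a different team could later apply its own transformation to the same loaded data.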

From data warehouses to data lakes and data catalogs (where we are today)

Despite these advances, some of the old problems with data management continued to linger. Concepts such as version control, testing, and transparent data lineage were mostly unheard of. Data warehouses were also still primarily storehouses for structured data.

The cloud didn’t solve the data silo problem; if anything, it helped data silos proliferate. That made it harder than ever for data teams and business users to find the data they needed.

Data lakes addressed the initial problem of storing unstructured data. Data catalogs became an increasingly popular method to search, tag, and classify data.

The architectural elements of a data mesh implementation

Data mesh is a decentralized approach to data management that enables individual teams to own their own data and associated pipelines. When done well, data mesh balances two competing yet important priorities: data democratization and data governance. It unblocks data domain teams from waiting on data engineering to implement their data pipelines for them, which enables faster time-to-market for data-driven products.

It also enables data engineering teams to be specific about which datasets are “consumption ready” and, as a result, which standards they agree to meet for that data. Data mesh combines this independence with security and policy controls that prevent data democracy from degenerating into data anarchy.

With data mesh architecture, teams leverage a shared infrastructure consisting of core data as well as tools for creating data pipelines, managing security and access, and enforcing corporate data governance policies across all data products. This architecture enables decentralization while ensuring data consistency, quality, and compliance. It also allows teams to leverage centralized functionality for managing data and data transformation pipelines without each team re-inventing the wheel.

Central services

The central services component of a data mesh architecture implements the technologies and processes required to enable a self-service data platform with federated computational (automated) governance. It’s further subdivided into two areas: management and discovery.

Management

Management includes functions for provisioning software stacks necessary to process and store data. This software stack will form the data platform that will then be leveraged by various domain teams. Central services implements a solution that creates the resources a team needs to manage a new stack.

Self-service data stacks include a standardized set of infrastructure that each team can leverage. This includes storage subsystems (object storage, databases, data warehouses, data lakes), data pipeline tools to import data from raw sources, and ELT tools such as dbt. They also include tools for creating versioned data contracts so that teams can register and expose their work to others as a reusable data product.
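As an illustration of what a versioned data contract might capture, the sketch below models one as a plain Python dataclass and checks rows against it. The `DataContract` class, the `violations` helper, and the `finance.orders` product are all hypothetical names invented for this example; real platforms define contracts in their own formats.

```python
from dataclasses import dataclass


@dataclass
class DataContract:
    """Hypothetical contract: a product name, a version, and typed columns."""
    product: str
    version: int
    columns: dict  # column name -> expected Python type


def violations(contract, row):
    """Return a list of human-readable problems for one row (empty if valid)."""
    problems = []
    for name, expected in contract.columns.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif not isinstance(row[name], expected):
            problems.append(f"{name}: expected {expected.__name__}")
    return problems


orders_v1 = DataContract(
    product="finance.orders",
    version=1,
    columns={"order_id": int, "amount": float},
)

good_row = {"order_id": 1001, "amount": 19.99}
bad_row = {"order_id": "1002"}  # wrong type, and the amount is missing

print(violations(orders_v1, good_row))
print(violations(orders_v1, bad_row))
```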

Management also includes federated computational governance. It enforces access controls, provides tools for enabling and enforcing data classification for regulatory compliance, and enforces policies around data quality and other data governance standards. It also provides centralized monitoring, alerting, and metrics services for organizational data users.
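One way to picture "computational" governance is as policy checks that run automatically against every data product's metadata. The sketch below is only an assumption about what such checks could look like; the classification labels, metadata fields, and product names are hypothetical.

```python
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential"}  # assumed labels


def governance_findings(product):
    """Check one data product's metadata against central policies."""
    findings = []
    if not product.get("owner"):
        findings.append("no owning team recorded")
    if product.get("classification") not in ALLOWED_CLASSIFICATIONS:
        findings.append("missing or unknown data classification")
    if product.get("classification") == "confidential" and not product.get("allowed_roles"):
        findings.append("confidential data without an access list")
    return findings


# Hypothetical metadata registered by three domain teams.
products = [
    {"name": "finance.invoices", "owner": "finance",
     "classification": "confidential", "allowed_roles": ["finance-analyst"]},
    {"name": "sales.leads", "owner": "sales", "classification": "confidential"},
    {"name": "ops.scratch", "owner": "", "classification": None},
]

report = {p["name"]: governance_findings(p) for p in products}
for name, findings in report.items():
    print(name, "->", findings or "compliant")
```

Because the rules live in code, the central team can apply them uniformly while each domain team keeps ownership of its own products.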

Discovery

Because central services acts as a clearinghouse for managing an organization’s data, it also serves an important discovery function. Users can use a data catalog to search organizational data and find both raw data and data products that they can incorporate into their own data sets.
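A toy version of catalog search might look like the following. The catalog entries, tags, and `search` function are hypothetical; a real data catalog would offer much richer metadata and ranking.

```python
# Hypothetical catalog entries covering both raw data and data products.
catalog = [
    {"name": "finance.invoices", "kind": "data product", "tags": {"billing", "revenue"}},
    {"name": "sales.orders", "kind": "data product", "tags": {"revenue", "crm"}},
    {"name": "raw.web_events", "kind": "raw data", "tags": {"clickstream"}},
]


def search(entries, term):
    """Match a term against entry names (substring) and tags (exact match)."""
    term = term.lower()
    return [
        entry["name"]
        for entry in entries
        if term in entry["name"].lower() or term in entry["tags"]
    ]


print(search(catalog, "revenue"))
print(search(catalog, "raw"))
```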

Producers (data domains)

The producers represent the collection of data domains owned by data teams. Architecturally, producers make use of the stacks provisioned for them by centralized services. The producers leverage one or more (usually many) data sources through data pipelines to create new data sets.

The output of the producers’ work is one or more data products. A data product exposes a subset of the producers’ data that other teams can leverage in their work. Each data product may have a contract that specifies the structure of the data it exposes, as well as access policies that control who can see what data and code.

Consumers

Consumers take the output from the producers and use it to drive business decisions. Consumers can be salespeople or decision-makers developing BI reports, analytics engineers further refining data for data analytics, data scientists building machine learning or AI models, or others.

Additionally, producers and consumers often overlap. A team can be a consumer of one team’s data while also producing data that another team uses. Because every team publishes its output as data products that others can discover through the central data catalog, it’s easy for teams to build a web of connectivity between each other. This is what puts the “mesh” in “data mesh.”
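The "web of connectivity" can be thought of as a directed graph in which each data product points to the products that consume it. The sketch below walks such a graph to find everything downstream of one product; the product names and edges are invented for illustration.

```python
from collections import deque

# Hypothetical edges: each data product maps to the products that consume it.
consumed_by = {
    "sales.orders": ["finance.revenue", "marketing.attribution"],
    "finance.revenue": ["exec.kpi_dashboard"],
    "marketing.attribution": ["exec.kpi_dashboard"],
    "exec.kpi_dashboard": [],
}


def downstream(product):
    """Breadth-first walk returning every product affected by `product`."""
    seen = set()
    order = []
    queue = deque([product])
    while queue:
        current = queue.popleft()
        for child in consumed_by.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(downstream("sales.orders"))
```

Here `finance.revenue` is both a consumer (of `sales.orders`) and a producer (for `exec.kpi_dashboard`), which is the overlap described above.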

How the pieces fit together

With all these pieces in place, a workflow between these architectural components emerges.

Data governance leaders—a combination of business stakeholders and members of the data platform team—define policies for data governance and data quality.
Centralized services then implement support for self-service data products and federated computational governance, enforcing data governance policies through code.
From there, data producers use the self-service data platform to create a new stack they can use to create a new data product, using the data catalog supported by central services to find other data and data products.
Once ready, producers publish the initial version of their data product to the data catalog, where consumers and other producers can find it.
Consumers find and utilize data products either as an end product (reports) or as input to another process (a machine learning model).
As data and business requirements evolve, data producers release new versions of their data products with new contracts to preserve backwards compatibility.
Consumers and other producers receive alerts about the new version of the contract for the data product they’re consuming, and update their workflows to use this new contract before the previous version expires.
Centralized services and data governance leaders work to onboard more teams to the self-service data platform and use metrics on data quality and usage from the data catalog to measure progress towards KPIs.
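The versioning and alerting steps in the workflow above can be sketched as follows. This is only an assumption about how a platform might model them: the `release_new_version` function, the 90-day grace period, and the team and product names are all hypothetical.

```python
from datetime import date, timedelta


def release_new_version(product, subscribers, today, grace_days=90):
    """Publish the next contract version and plan the old one's retirement."""
    old_version = product["version"]
    new_version = old_version + 1
    expires = today + timedelta(days=grace_days)
    updated = {**product, "version": new_version}
    alerts = [
        f"{team}: {product['name']} v{new_version} is available; "
        f"v{old_version} expires on {expires.isoformat()}"
        for team in subscribers
    ]
    return updated, alerts


orders = {"name": "sales.orders", "version": 1}
updated, alerts = release_new_version(
    orders, subscribers=["finance", "marketing"], today=date(2025, 1, 1)
)

print(updated)
for message in alerts:
    print(message)
```

Keeping the previous version alive for a grace period gives consuming teams time to migrate before anything breaks.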

What data mesh brings to modern data architecture

These components of data mesh all work together to bring a number of benefits to your existing data architecture.

Scalability

Scalability comes from two areas. First, it comes from the self-service data platform. In a monolithic data management architecture, employees who want a certain report or data set created have to send the request to a central data engineering team. That inevitably results in large backlogs and delays. With a self-service platform, data producers can automatically receive the capabilities they need to create new data products.

Second, scalability comes from the data producer layer itself. Each team (and possibly each data product) can request the computing resources it needs to store, transform, and analyze data. Each data domain, architecturally, runs as its own separate data processing hub.

Increased trust in data

The centralized services layer supports a data catalog that enables all data producers to register their data products and data sources in a single location. The data catalog serves as the single source of truth within the company. This enables producers to own their own data domains while the company enforces data quality and classification standards across all owners.

Through the data catalog and other data governance tools, the data governance team can quantify and track the quality of data across the company. For example, it can report statistics on the accuracy, consistency, and completeness of the data it monitors, as well as produce reports on how much of the company’s data is properly classified.
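As a rough sketch of how such statistics could be computed, the example below measures per-column completeness for one data set and the share of registered data sets that carry a classification. The sample records, field names, and both helper functions are hypothetical.

```python
def completeness(rows, columns):
    """Share of non-missing values per column, as a fraction from 0 to 1."""
    if not rows:
        return {col: 0.0 for col in columns}
    return {
        col: sum(1 for row in rows if row.get(col) not in (None, "")) / len(rows)
        for col in columns
    }


def classified_share(datasets):
    """Fraction of registered data sets that carry a classification label."""
    if not datasets:
        return 0.0
    return sum(1 for d in datasets if d.get("classification")) / len(datasets)


# Hypothetical rows from one data product.
customers = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": ""},
    {"customer_id": 3, "email": "c@example.com"},
    {"customer_id": 4, "email": "d@example.com"},
]

# Hypothetical catalog metadata for four data sets.
datasets = [
    {"name": "finance.invoices", "classification": "confidential"},
    {"name": "sales.orders", "classification": "internal"},
    {"name": "raw.web_events", "classification": None},
    {"name": "hr.headcount", "classification": "confidential"},
]

scores = completeness(customers, ["customer_id", "email"])
print(scores)
print(classified_share(datasets))
```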

Finally, because data domain teams own their own data, they can ensure that it’s kept up to date and that its structure reflects the changing realities of their business. All of these factors lead to an increased trust in data as companies move closer to a data mesh architecture.

Greater reliability and reduced rework

The self-service data platform also helps enforce uniformity across data domains through the use of contracts for data products.

One of the primary sources of data issues is disruptions caused by sudden and unexpected changes in the format of data. Because the platform requires data contracts on data products, producers can alert consumers to upcoming breaking changes. This improves reliability across the data ecosystem, saves everyone involved time, and further increases confidence and trust in the company’s data.
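A contract makes breaking changes detectable before they reach consumers. The sketch below compares two versions of a hypothetical contract, expressed as a mapping of column names to type names, and treats removed columns and changed types as breaking while allowing new columns. That rule set is an assumption for illustration, not a standard.

```python
def breaking_changes(old_columns, new_columns):
    """Compare two contract versions (column name -> type name)."""
    changes = []
    for name, old_type in old_columns.items():
        if name not in new_columns:
            changes.append(f"removed column: {name}")
        elif new_columns[name] != old_type:
            changes.append(f"type changed: {name} {old_type} -> {new_columns[name]}")
    return changes  # columns that only exist in the new version are not breaking


# Hypothetical contract versions for one data product.
v1 = {"order_id": "integer", "amount": "float", "region": "string"}
v2 = {"order_id": "string", "amount": "float", "currency": "string"}

for change in breaking_changes(v1, v2):
    print(change)
```

A producer could run a check like this before publishing and alert consumers only when the list is non-empty.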

Faster deployment of data products

The increased scalability, increased trust in data, and greater reliability of data mean teams can bring new data products to market faster.

One of the largest obstacles to launching new data products is finding trustworthy data. In a 2022 study by Enterprise Strategy Group, 46% of respondents said that identifying the quality of source data was an impediment to using data effectively. By building increased trust in data, companies can empower data domain teams to ship innovative ideas in less time.

Data mesh uses new and existing data management technologies to create a distributed, federated approach to managing data at scale. Understanding what each layer contains and how each one works together gives you a roadmap for transitioning to the next evolution of modern data architecture.

The dbt approach to using a data mesh architecture

dbt transforms raw data into trusted analytics-ready models through code-first methods. dbt integrates with data mesh by enabling domain teams to own their data products. The platform supports decentralized ownership while maintaining governance standards.


