Data mesh concepts: What it is and how to get started
A data mesh is a decentralized data management architecture that organizes data by business domain. Instead of relying on a single centralized data platform, domain teams own their data and the processes around it.
Data mesh is built around four core principles:
- Data domains: Manage data according to its role in the business, not its technology.
- Data as a product: Treat data as a deliverable with testing, versioning, and contracts.
- Self-serve data infrastructure: Enable business teams to create new data products and reports without creating a backlog for the data engineering team.
- Federated computational governance: Improve data quality and monitor security and compliance uniformly across a distributed data network.
Grasping data mesh concepts and the mechanics of data mesh requires a new way of thinking. We’ve broken down the key concepts for you below. Along the way, we share our guidance, based on direct work with our own customers, on when—and how—you should shift to a data mesh architecture.
The benefits of data mesh
“Since shifting to data mesh with Snowflake, 5.5x more people are using data across the business on a regular basis.” - Abhi Sivasailam, Head of Growth and Analytics, Flexport
Moving to a data mesh architecture confers numerous benefits over traditional approaches to data management:
- Scalability. Traditional approaches create unnecessary roadblocks by funneling all data projects and requests through a single team. When ownership returns to the business domains that produce the data, domain data teams can create new data products without waiting on an overwhelmed data engineering team.
- Faster time-to-market. When data teams own their own data pipelines and business logic, they can deliver new solutions in less time than if they had to hand the full implementation over to a centralized data engineering team.
- Better data quality. Domain teams understand their own data better than anyone else. Returning data quality decisions to the data domain owners results in better decision-making and better data quality across the organization.
- Better governance. Federated computational governance helps prevent the creation of data silos and ensures that all data in the organization is properly secured and governed. Federation applies shared standards for handling sensitive information and maintaining data quality across all teams. Automated policy enforcement reduces the manual labor required to remain compliant with the growing, complex body of data regulations worldwide.
Data mesh concepts
Have you struggled to scale your data projects or keep up with data governance as your demand for data processing scales? If so, a data mesh may help you break through your current plateaus and ship new data products faster.
In a data mesh framework, teams own not only their data but also the data pipelines and processes associated with transforming it. A central data engineering team maintains both key data sets and a suite of self-service tools that enable individual data ownership. Domain-specific data teams then exchange data via well-defined and versioned contracts.
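To make the idea of a contract more concrete, here's a minimal Python sketch. It assumes a contract is nothing more than a name, a semantic version string, and a mapping of column names to expected Python types; the `DataContract` class and the `sales.orders` example are hypothetical illustrations, not any product's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class DataContract:
    """Hypothetical contract a producing domain publishes for its consumers."""
    name: str
    version: str  # semantic version string, e.g. "1.0.0"
    columns: dict = field(default_factory=dict)  # column name -> expected Python type

    def violations(self, record: dict) -> list:
        """Return a list of problems with the record; an empty list means it conforms."""
        problems = []
        for column, expected_type in self.columns.items():
            if column not in record:
                problems.append(f"missing column '{column}'")
            elif not isinstance(record[column], expected_type):
                problems.append(
                    f"column '{column}' should be {expected_type.__name__}, "
                    f"got {type(record[column]).__name__}"
                )
        return problems


# The Sales domain publishes version 1.0.0 of a hypothetical orders contract.
orders_v1 = DataContract(
    name="sales.orders",
    version="1.0.0",
    columns={"order_id": str, "amount": float, "region": str},
)

good = {"order_id": "A-100", "amount": 42.5, "region": "EMEA"}
bad = {"order_id": "A-101", "amount": "42.5"}

print(orders_v1.violations(good))  # []
print(orders_v1.violations(bad))
# ["column 'amount' should be float, got str", "missing column 'region'"]
```

Because both sides agree on the contract rather than on each other's internal pipelines, the producing team can change how it builds the data freely, as long as the published records keep conforming.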
In this article, we discuss the circumstances that led to the birth of data mesh - namely, siloed data, monolithic data solutions, and brittle data processing systems. We then discuss the basic principles of data mesh, the teams needed to support it, and the benefits you can reap from a successful implementation.
Data mesh architecture and implementation
With data mesh architecture, teams leverage a shared infrastructure consisting of core data and tools for creating data pipelines, managing security and access, and enforcing corporate data governance policies across all data products. This architecture enables decentralization while ensuring data consistency, quality, and compliance.
If this seems more complex than “just stick everything in the one big database”…well, it is. But there’s a good reason for the complexity.
We look in detail at how companies managed Big Data in the BC (Before Cloud) era, how Hadoop moved the needle, and the arrival of Redshift, Snowflake, and the modern data stack. Finally, we discuss how data mesh builds on the successes of these technologies while solving for their shortcomings.
In a monolithic data management approach, technology drives ownership. A single data engineering team typically owns all the data storage, pipelines, testing, and analytics for multiple teams, such as Finance and Sales.
In a data mesh architecture, business function drives ownership. The data engineering team still owns a centralized data platform that offers services such as storage, ingestion, analytics, security, and governance. But teams such as Finance and Sales would each own their data and its full lifecycle (e.g. making code changes and maintaining code in production).
Up until now, we’ve covered the what and the why of data mesh. In this article, we dig into the how. We show how the major pillars of a data mesh architecture - central services, data producers, and data consumers - fit together into a unique solution.
Should your company move to data mesh? If so, when? Can you use your existing tool stack? What about governance?
Data mesh architecture is a new approach to data management that combines data decentralization with federated computational governance. Done well, it improves data quality, reduces time to market, and saves money. But it can be difficult for everyone to wrap their heads around the required changes.
This article answers the most pressing questions our own customers have asked us about data mesh architectures.
The four principles of data mesh
Data mesh is defined by four principles: data domains, data products, self-serve data platform, and federated computational governance. These principles can be hard to grasp at first, especially since they interact and overlap. This article provides a high-level overview of each principle and how they interconnect.
The key idea driving data mesh is that data owners should own their data. (Crazy, right?) But how do you put that into practice?
In data mesh, this is done through data domains. A data domain is a logical grouping of data, often source-aligned or consumer-aligned, along with all of the operations that its objects support.
Data domains parcel out data schemas into self-contained definitions owned and maintained by the business team that owns the data. The team owns the data storage and all processes - generation, collection, data pipeline transformations, APIs, reporting, etc. - that accompany it. Its output is a data product - a data container or a unit of data that directly solves a customer or business problem.
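A data domain and its data products can be modeled as plain data structures. The Python sketch below is a hypothetical illustration: the `DataDomain` and `DataProduct` classes, the `output_port` field, and the rule that product names must start with the domain's name are assumed conventions chosen to show end-to-end ownership, not part of any standard.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A unit of data that a domain publishes to solve a business problem."""
    name: str
    description: str
    output_port: str  # hypothetical: where consumers read it (a table, view, or API)


@dataclass
class DataDomain:
    """A logical grouping of data owned end to end by one business team."""
    name: str
    owner_team: str
    pipelines: list = field(default_factory=list)  # transformations the team maintains
    products: list = field(default_factory=list)   # data products the team publishes

    def publish(self, product: DataProduct) -> None:
        # Assumed convention: a domain only publishes products in its own namespace.
        if not product.name.startswith(f"{self.name}."):
            raise ValueError(f"{product.name} is outside the '{self.name}' domain")
        self.products.append(product)


finance = DataDomain(
    name="finance",
    owner_team="Finance Analytics",
    pipelines=["ingest_invoices", "compute_monthly_revenue"],
)
finance.publish(DataProduct(
    name="finance.monthly_revenue",
    description="Recognized revenue per month, per region",
    output_port="finance.monthly_revenue_v1",
))
print([p.name for p in finance.products])  # ['finance.monthly_revenue']
```

The point of the sketch is that the pipelines and the published products live on the same object: the team that maintains `compute_monthly_revenue` is also the team answerable for `finance.monthly_revenue`.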
In this article, we discuss the three types of data domains, the benefits of a distributed data network, and how data domains work in practice.
As a deliverable, a data product can be as simple as a report or as complex as a new machine learning model. Data products also contain any metadata required for consumption, such as API contracts and documentation.
In this article, we see how to use technology such as contracts, versioning, data catalogs, and testing to manage data and prevent all-too-common problems such as downstream breakages.
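One common way to tie versioning to downstream safety, assumed here rather than prescribed by data mesh itself, is semantic versioning: removing or retyping a column is a breaking change that needs a major version bump, while adding a column needs at least a minor bump. The hypothetical Python check below rejects a release whose version number understates its schema change.

```python
def required_bump(old_columns: dict, new_columns: dict) -> str:
    """Classify a schema change as 'major' (can break consumers), 'minor' (additive), or 'none'."""
    removed = old_columns.keys() - new_columns.keys()
    retyped = {c for c in old_columns.keys() & new_columns.keys()
               if old_columns[c] != new_columns[c]}
    added = new_columns.keys() - old_columns.keys()
    if removed or retyped:
        return "major"
    if added:
        return "minor"
    return "none"


def parse_version(version: str) -> tuple:
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of ints so versions compare correctly."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def release_allowed(old_version: str, new_version: str,
                    old_columns: dict, new_columns: dict) -> bool:
    """Allow a release only if its version bump is at least as large as the schema change requires."""
    old_v, new_v = parse_version(old_version), parse_version(new_version)
    bump = required_bump(old_columns, new_columns)
    if bump == "major":
        return new_v[0] > old_v[0]
    if bump == "minor":
        return new_v[:2] > old_v[:2]
    return new_v > old_v


old = {"order_id": "string", "amount": "double"}
added = {"order_id": "string", "amount": "double", "region": "string"}
retyped = {"order_id": "string", "amount": "string"}

print(release_allowed("1.2.0", "1.3.0", old, added))    # True: additive change, minor bump
print(release_allowed("1.2.0", "1.3.0", old, retyped))  # False: breaking change hidden in a minor bump
print(release_allowed("1.2.0", "2.0.0", old, retyped))  # True: breaking change, major bump
```

Run as a test in the producing team's deployment pipeline, a check like this surfaces a breaking change before consumers discover it in production.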
In federated computational governance, each data domain team continues to own its data (plus associated assets and processes). Each team is also required to register its data products with one or more data governance platforms. The data governance platforms, in turn, run automated data governance policies that ensure all data products conform to organizational standards for quality and compliance.
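The "computational" part of federated computational governance means policies run as code against every registered data product. The Python sketch below is a hypothetical illustration: the `GovernanceRegistry` class, the metadata keys (`owner`, `columns`, `tags`, `masked`, `quality_checks`), and the three example policies are assumptions chosen to show the pattern, not the API of any real governance platform.

```python
from typing import Callable, Optional

# A policy inspects a data product's metadata and returns an error message, or None if it passes.
Policy = Callable[[dict], Optional[str]]


def has_owner(product: dict) -> Optional[str]:
    return None if product.get("owner") else "product must declare an owning team"


def pii_is_masked(product: dict) -> Optional[str]:
    unmasked = [c["name"] for c in product.get("columns", [])
                if "pii" in c.get("tags", []) and not c.get("masked", False)]
    return f"PII columns must be masked: {unmasked}" if unmasked else None


def has_quality_checks(product: dict) -> Optional[str]:
    return None if product.get("quality_checks") else "product must define at least one quality check"


class GovernanceRegistry:
    """Hypothetical central registry that applies the same policies to every domain's products."""

    def __init__(self, policies: list):
        self.policies = policies
        self.registered = {}

    def register(self, product: dict) -> list:
        """Run every policy; register the product only if all of them pass."""
        results = [policy(product) for policy in self.policies]
        errors = [msg for msg in results if msg is not None]
        if not errors:
            self.registered[product["name"]] = product
        return errors


registry = GovernanceRegistry([has_owner, pii_is_masked, has_quality_checks])
customers = {
    "name": "sales.customers",
    "owner": "Sales Ops",
    "columns": [{"name": "customer_id"}, {"name": "email", "tags": ["pii"], "masked": False}],
    "quality_checks": ["customer_id is unique"],
}
print(registry.register(customers))  # ["PII columns must be masked: ['email']"]

customers["columns"][1]["masked"] = True  # the Sales team fixes its own product
print(registry.register(customers))  # []
```

The domain team still owns the fix; the central registry only decides, uniformly and automatically, whether the product meets the organization's standards.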
Learn how data mesh can eliminate data silos while also improving security, compliance, and data quality.
Most data engineering teams are overwhelmed with requests. That leads to burnout among data engineers. It also creates frustration with data producers and consumers, who have to wait weeks - or even months - for action on their data requests.
A self-serve data platform is a data platform that supports creating new data products without the need for custom tooling or specialized knowledge. It’s the technical heart of data mesh, enabling data domain teams to take ownership of their data without unnecessary bottlenecks.
In this final installment, we discuss how a self-serve data platform eliminates these bottlenecks through centralized tooling that automates basic data tasks. We also enumerate the capabilities of a self-serve data platform and how you go about implementing one.
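One way to picture "centralized tooling that automates basic data tasks" is a platform function that expands a small, declarative spec from a domain team into the standard set of resources every data product needs. In this hypothetical Python sketch, `ProductSpec`, `provision`, and the action names are illustrative assumptions; the function only returns a plan of `(action, target)` steps, whereas a real platform would call its storage, orchestration, access-control, and catalog services.

```python
from dataclasses import dataclass


@dataclass
class ProductSpec:
    """What a domain team fills in; everything else comes from platform defaults."""
    domain: str
    name: str
    source: str            # hypothetical upstream table or event stream
    refresh: str = "daily"


def provision(spec: ProductSpec) -> list:
    """Expand a spec into the standard (action, target) steps the platform team maintains."""
    qualified = f"{spec.domain}.{spec.name}"
    return [
        ("create_schema", spec.domain),
        ("create_table", qualified),
        ("schedule_ingestion", f"{spec.source} -> {qualified} ({spec.refresh})"),
        ("grant_read", f"{qualified} to {spec.domain}_consumers"),
        ("register_in_catalog", qualified),
    ]


plan = provision(ProductSpec(domain="marketing", name="campaign_clicks", source="events.clicks"))
for action, target in plan:
    print(f"{action}: {target}")
```

The domain team supplies four fields and never touches the underlying templates, which is what lets it ship a new data product without filing a ticket with the data engineering team.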
Download our comprehensive guide to data mesh to gain a rich understanding of everything you need to begin your own data mesh journey. In this guide, you’ll learn:
- Problem-solving with data mesh: Learn about the challenges that data mesh solves, including data bottlenecks, inefficiency, and the loss of context in traditional centralized systems.
- The core principles of data mesh: Dive deep into the foundational elements of data mesh. Understand how domain-oriented decentralized ownership, data as a product, self-serve data infrastructure, and federated computational governance can transform your data ecosystem.
- Crafting your data mesh: Apply data mesh principles in practice to build a mesh architecture tailored to your organization’s unique needs and goals.
Last modified on: Jan 17, 2024