Implementing shift-left governance in your dbt stack: Practical application of data contracts and the role of metadata from Coalesce 2023
Shirshanka Das, CTO of Acryl Data, discusses implementing modern data governance using dbt, DataHub, and data contracts.
"Just like how we apply CI/CD to software engineering, we should be applying the same principles to data governance."
In the talk, he focuses on how to use these tools to ensure data quality and integrity while maintaining the speed of data production. He also explores the concept of "shifting left" on governance initiatives, meaning integrating governance practices into the data creation process itself.
Shifting left on data governance initiatives with dbt
Shirshanka describes how dbt fits into shifting left on data governance initiatives. He frames continuous governance as a feedback loop with three parts: establishing context, maintaining it, and acting on it through automations.
"We think about data, the data plane, as being composed of everything that you're assembling in your data stack...all of these tools are either producing data continuously, metadata continuously, or getting metadata pulled from them continuously, and all this metadata gets pulled into the central metadata graph," says Shirshanka.
He explains, "The Acryl engine essentially continuously runs these tests against the metadata as it's changing…you can set automations that include things like sending Slack alerts or marking this thing deprecated."
The importance of technical metadata in data governance
Shirshanka stresses the value of technical metadata in data governance. Technical metadata, he explains, is emitted continuously by the systems in the stack and serves as the source of truth for governance, even before it has been harmonized. He describes the control plane as a technical metadata graph that provides a unified view across all data sources and lets both people and programs take automated actions as the metadata changes.
"Technical metadata that's coming out of these data sources is truth, but it has not yet been harmonized...there's a continuous monitoring and refinement loop that the control plane actually enables," Shirshanka explains.
He further adds, "The control plane should be a unified view across all of your data sources so that even your humans, as well as your programs, can actually take automated actions."
The concept of data products and their benefits
Shirshanka introduces the concept of a data product, which combines individual data assets and tier one assets into a single unit. This gives data teams something concrete to discuss with business stakeholders and one place to see what was produced in a given period, such as a quarter.
"We have the concept of ‘data product.’ So that allows you to combine all of these individual data assets and tier one assets into a unified data product that you can then have a conversation with your business stakeholders about," says Shirshanka.
He goes on to highlight, "When you write up your quarterly report, and you're like, ‘What did I produce this quarter?’ You can basically point them to the data product page, and that includes all of this information in one go."
The integration of dbt and data governance
Shirshanka discusses how dbt can be integrated into modern data governance. By connecting dbt with the rest of the data stack, he proposes, data governance can be made continuous and federated while still being driven by central standards. This lets data teams collaborate while maintaining a unified view across all data sources.
"That’s what the whole loop looks like, all the way from your dbt ecosystem monitoring it, applying principles, policies, and then being able to reflect those back into the source systems where they belong," Shirshanka states.
He affirms, "Our goal really is to allow everyone to maintain velocity…and let's not ship low quality assets."
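To make the "don't ship low quality assets" idea concrete, here is a minimal sketch of a check that could run in CI before a dbt change merges. It is an illustration, not something shown in the talk: it assumes the parsed contents of dbt's `manifest.json` artifact, and the rule that every model needs a description and a `meta.owner` entry is a hypothetical team standard.

```python
from typing import Any

# Hypothetical team standard: every model must declare these keys under `meta`.
REQUIRED_META_KEYS = ("owner",)


def find_governance_violations(manifest: dict[str, Any]) -> list[str]:
    """Return one message per governance rule that a dbt model fails."""
    violations = []
    for unique_id, node in sorted(manifest.get("nodes", {}).items()):
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, and other node types
        if not (node.get("description") or "").strip():
            violations.append(f"{unique_id}: missing description")
        meta = node.get("meta") or {}
        for key in REQUIRED_META_KEYS:
            if not meta.get(key):
                violations.append(f"{unique_id}: missing meta.{key}")
    return violations


# Tiny stand-in for a parsed manifest; a real one would be loaded with
# json.load() from the target/ directory that dbt writes to.
example_manifest = {
    "nodes": {
        "model.shop.orders": {
            "resource_type": "model",
            "description": "One row per order.",
            "meta": {"owner": "analytics"},
        },
        "model.shop.customers": {
            "resource_type": "model",
            "description": "",
            "meta": {},
        },
    }
}

violations = find_governance_violations(example_manifest)
for message in violations:
    print(message)
```

A CI job could exit with a non-zero status whenever the returned list is non-empty, so the gap is caught while the model is being written rather than after it ships.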
The role of metadata in continuous, metadata-driven governance
Shirshanka delves into the role of metadata in continuous governance. He explains that the Acryl engine runs tests against the metadata as it changes and lets you attach automations to those tests. These automations can include sending Slack alerts or marking an asset as deprecated, which closes the continuous feedback loop.
"Metadata tests are essentially configurable workflows that you can define, that are continuously running and evaluating conditions on the metadata graph," he explains.
Shirshanka adds, "That's our vision for continuous governance. A lot of these features already exist in the product, so it'll be pretty cool to get your input and your feedback, because we’re building this as a collaborative standard with the rest of the community."
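The idea of a metadata test, a condition evaluated against the metadata graph with actions attached, can be sketched in a few lines. This is a toy model under stated assumptions, not Acryl's or DataHub's API: the `MetadataTest` class, the asset dictionaries, and both actions are hypothetical, and the "alert" simply appends to a list where a real system might post to Slack.

```python
from dataclasses import dataclass, field
from typing import Callable

Asset = dict  # one node in a toy, in-memory metadata graph


@dataclass
class MetadataTest:
    """Hypothetical metadata test: a selector, a condition, and failure actions."""
    name: str
    applies_to: Callable[[Asset], bool]  # which assets the test covers
    condition: Callable[[Asset], bool]   # what must hold for those assets
    on_failure: list[Callable[[Asset], None]] = field(default_factory=list)

    def evaluate(self, assets: list[Asset]) -> list[str]:
        """Run every failure action on each failing asset; return their ids."""
        failing = [a for a in assets if self.applies_to(a) and not self.condition(a)]
        for asset in failing:
            for action in self.on_failure:
                action(asset)
        return [a["id"] for a in failing]


alerts: list[str] = []  # stand-in for a Slack channel


def send_alert(asset: Asset) -> None:
    alerts.append(f"{asset['id']} has no owner")


def mark_deprecated(asset: Asset) -> None:
    asset["deprecated"] = True


assets = [
    {"id": "orders", "tier": 1, "owner": "analytics"},
    {"id": "payments", "tier": 1, "owner": None},
    {"id": "scratch_tmp", "tier": 3, "owner": None},
]

tier_one_has_owner = MetadataTest(
    name="tier one assets must have an owner",
    applies_to=lambda a: a["tier"] == 1,
    condition=lambda a: bool(a["owner"]),
    on_failure=[send_alert, mark_deprecated],
)

failed = tier_one_has_owner.evaluate(assets)
```

In a continuous setup, `evaluate` would be re-run whenever the underlying metadata changes instead of once against a fixed list.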
Insights surfaced
- Traditional data governance methods are broken due to the fragmented nature of modern data stacks
- Integrating governance practices into the data creation process, or "shifting left," can help maintain data quality while keeping up with the speed of data production
- Acryl, a commercial version of DataHub, provides a unified view across all data sources and enables automated actions based on metadata changes
- Acryl integrates with dbt and DataHub, allowing for continuous monitoring and refinement of data
- The concept of "continuous governance" involves establishing context, maintaining context, and acting on it, which can be achieved using Acryl