Demystifying Data Vault with dbt from Coalesce 2023
Alex Higgs of Datavault explains the value of Data Vault, a system of business intelligence and data warehousing.
"The business itself is growing, and they need a way for their data solution to be able to grow with them…or hopefully stay ahead."
Alex Higgs, Senior Consultant Data Engineer at Datavault, explains the value of implementing Data Vault, a system of business intelligence and data warehousing. He discusses the history, fundamentals, and practical applications of Data Vault, including its integration with dbt.
Data Vault is a scalable, pattern-based solution for businesses
Alex highlights the value of Data Vault, a scalable, pattern-based solution for businesses dealing with large volumes of data. He notes that the method has been around since the early 2000s and that its usage has grown steadily.
Alex explains that Data Vault is agile because businesses can start small and build it out as they grow. "As businesses grow, they have more data to integrate from new branches…new systems…even new services," he states, adding that without a scalable solution like Data Vault, companies could accrue significant technical debt.
Alex also expands on how Data Vault integrates business keys, forming a data model of core business concepts. He elaborates, "Hubs are your list of unique business keys... links are rightfully named because they help you relate those concepts that are contained in the hubs…"
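To make the hub and link idea concrete, here is a minimal Python sketch rather than anything shown in the talk. It assumes a hypothetical staging feed of orders, and it assumes keys are normalized (trimmed, uppercased) and hashed with MD5, which is one common convention but not something the talk specifies. The column names and helper functions are invented for illustration.

```python
import hashlib

def hash_key(*business_keys: str) -> str:
    """Hash one or more business keys into a fixed-length key (assumed convention)."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# Hypothetical staged rows: each order references a customer.
staged_orders = [
    {"customer_id": "C001", "order_id": "O100"},
    {"customer_id": "C001", "order_id": "O101"},
    {"customer_id": "C002", "order_id": "O102"},
    {"customer_id": "c001 ", "order_id": "O100"},  # same keys once normalized
]

def build_hub(rows, key_column):
    """A hub: one entry per unique business key, keyed by its hash."""
    hub = {}
    for row in rows:
        business_key = row[key_column].strip().upper()
        hub.setdefault(hash_key(business_key), business_key)
    return hub

def build_link(rows, key_columns):
    """A link: one entry per unique combination of hub keys."""
    link = {}
    for row in rows:
        keys = [row[column] for column in key_columns]
        link.setdefault(hash_key(*keys), tuple(hash_key(k) for k in keys))
    return link

hub_customer = build_hub(staged_orders, "customer_id")
hub_order = build_hub(staged_orders, "order_id")
link_customer_order = build_link(staged_orders, ["customer_id", "order_id"])
```

Each hub ends up holding only the distinct business keys, while the link stores the relationship between a customer and an order as a pair of hub hash keys.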
dbt and Data Vault are a match made in heaven
Alex believes that dbt and Data Vault work exceptionally well together, saying it's "a match made in heaven." He highlights how dbt's ability to run on multiple threads complements Data Vault's need to load data in parallel.
"The parallel loading feature of Data Vault fits perfectly with dbt. Because we've got multiple threads in dbt (if we're loading, say, 10 hubs, 10 satellites, a couple of links) we can do all of that in parallel." He adds that dbt macros can generate an entire Data Vault warehouse.
Data Vault needs proper implementation to be effective
"Start small, grow it out, [and] prove incremental value along the way."
Alex emphasizes the need for proper implementation of Data Vault to be effective. He warns against a common mistake: building a data model around the data rather than around the business, leading to a proliferation of unnecessary tables.
"Rather than modeling your Data Vault from the business's point of view, you've modeled it just using the data... and so you end up with this massive explosion of way too many tables," he explains. To avoid this and other pitfalls, he advises starting with a small, focused subset of a problem and proving incremental value along the way.
Alex’s key insights
- Data Vault was built for large-scale data and can handle petabytes and more
- Data Vault is agile, allowing businesses to start small and scale up as they grow
- The system is effective in dealing with technical debt, data silos, poor governance, and lack of access to data
- The implementation of Data Vault with dbt allows for automation and scalability