
Understanding Lakehouse architecture: A unified platform for analytics engineering with Databricks and dbt from Coalesce 2023

Ken Wong from Databricks and Samuel Garfield from Retoolis discuss the benefits of using Databricks and dbt together.

"It becomes very very clear that with the help of open source...AI is not going to be reserved for the elites."

- Ken Wong, Senior Director of Product Management at Databricks

Ken Wong, Senior Director of Product Management at Databricks, and Samuel Garfield, Analytics Engineer for Retoolis, discuss the benefits of using Databricks and dbt together. They note that many companies are trying to become data and AI companies, and argue that Databricks with dbt makes this transformation easier and more efficient. They also discuss the key attributes of a unified Lakehouse architecture and share a customer's perspective on dbt and Databricks.

The rapid development of AI and open-source technology democratizes access to AI

Companies today aim to be data and AI companies, driven by the rapid evolution and democratization of AI in the open-source community. Ken notes that the progress in AI technology is staggering and that it’s being democratized at an incredible speed. New announcements are made weekly that lower the barriers for organizations to leverage these technologies.

"'Every company is a data company' is a bit of an old trope at this point, but over the last six months, it's no longer enough for companies to be a data company. Everyone now wants to be a data and AI company," Ken says. "The amount of progress that we've seen in just the last year alone has been staggering. The velocity at which not only generative AI has advanced, but also more incredibly, the velocity at which this technology has been democratized, especially in the open-source world, is just mind-blowing."

Ken adds that AI itself won't be the barrier to becoming a data and AI company, since everyone will be able to use it. Instead, the barrier will be getting a handle on your data. He emphasizes that data engineering is messy and that frameworks like dbt are needed to manage it smoothly at scale.

The introduction of the Lakehouse architecture solves the bifurcated data stack issue

Ken explains the issue of bifurcated data stacks, where the data platform used to support business intelligence use cases is fundamentally different from the one used to support AI. Data is split between these two worlds, increasing the cost and complexity of running these systems. The solution to this problem was the introduction of Lakehouse architecture.

"The fundamental source of all this complexity is something that dbt can't quite help us with, and that's the fact that most organizations have what we call a bifurcated data stack. Typically, the data platform that you use to support your BI use case is a fundamentally different one than the one that you use to support AI," Ken explains.

He adds, "A big part of this is just being able to leverage the same compute and storage infrastructure and eliminate the need to replicate data."

Unifying AI and BI under a single roof is a key attribute of true Lakehouse architecture

Ken emphasizes that true Lakehouse architecture should be able to bring AI and BI under a single roof. This involves offering a unified user experience for different personas within an organization.

"The very first attribute is the ability to bring AI and BI under a single roof…But what it also means is offering a unified user experience that addresses the needs of different personas inside your organization to make it easier for them to collaborate together," explains Ken.

He also mentions the importance of having a unified governance layer, saying, "...there's one last key key key aspect to a unified Lakehouse architecture and that's a unified data governance layer. You really cannot unify your data stack if your solution for metadata management and governance is fragmented."

Ken and Samuel's key insights

  • The combination of Databricks and dbt allows companies to unify their data and approach to analytics engineering
  • The adoption of Lakehouse architecture can unify AI and BI under a single roof, deliver the scale and performance needed for AI and BI workflows, and provide a unified data governance layer
  • Databricks and dbt together can offer a unified platform for data and a unified way of approaching analytics engineering
  • Companies can reduce the complexity of their systems and create a single view of their customer with the help of Databricks and dbt
  • The combination of dbt and Databricks can help in optimizing spend and making the analytics process more productive