dbt

Hybrid query execution: Understanding database clients and using DuckDB for efficient query processing from Coalesce 2023

Jordan Tigani, Co-Founder at MotherDuck, explains the importance of database architecture in analytics.

"The lakehouse architecture was really an attempt to solve a lot of the problems that were in the data lake."

Jordan Tigani, Co-Founder & Chief Duck-Herder at MotherDuck, traces the evolution of database architecture: the rise and fall of Hadoop, the transition to cloud data architecture, and the future of hybrid execution in database systems.

The importance of architecture and abstractions in data analytics

Jordan emphasizes how much architecture and abstractions matter in data analytics. He points to the rise and fall of Hadoop and attributes its decline to the wrong architecture and abstraction rather than to any operational missteps.

"So, once upon a time, Hadoop was the number one skill you could have on your resume to get you the highest salary... Now, it's basically gone," says Jordan. He believes that cloud data architecture has taken over because of its advantages, which come from separating storage from compute, moving from row stores to column stores, and keeping data in an object store.

Jordan also states, "The lakehouse is kind of becoming a data warehouse." He argues that moving from dealing with files to dealing with tables simplifies the interaction with data, making it easier to handle local data, mix local and remote data, and do fast scans by farming out tasks to multiple machines.
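To make the row-store versus column-store shift concrete, here is a toy sketch in Python. It is not taken from the talk: the sample data, the function names, and the "values touched" counter are all illustrative assumptions, and real engines add compression, vectorization, and much more. It only shows why an analytic query over one field reads less data in a columnar layout.

```python
# Toy data; the field names and values are made up for illustration.
rows = [
    {"id": 1, "country": "US", "amount": 20.0},
    {"id": 2, "country": "DE", "amount": 5.5},
    {"id": 3, "country": "US", "amount": 7.0},
]

# Column layout: one list per field instead of one dict per row.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def sum_row_store(rows, field):
    """Sum one field in a row layout, counting every value read along the way."""
    total, touched = 0.0, 0
    for r in rows:
        touched += len(r)  # simplified model: the whole row is read to reach one field
        total += r[field]
    return total, touched


def sum_column_store(columns, field):
    """Sum one field in a column layout; only that column is read."""
    values = columns[field]
    return sum(values), len(values)


row_total, row_touched = sum_row_store(rows, "amount")
col_total, col_touched = sum_column_store(columns, "amount")
print(row_total, row_touched)
print(col_total, col_touched)
```

Both functions return the same total, but in this simplified model the row layout touches every field of every row while the column layout touches only the `amount` values.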

The evolution and potential of hybrid execution

Jordan explores the potential of hybrid execution in data analytics. He explains how hybrid execution works and suggests that it could lead to faster scans, cross-region joins, and even the reversal of the normal query direction. He also proposes several potential uses for hybrid execution, including local edge caching and disaggregating data warehouses.

According to Jordan, hybrid execution can also be used for "cross-region joins." This would mean no longer worrying about where data lives, because the system could move data as needed.
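The idea of a system deciding where to run a join and which data to move can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how MotherDuck or DuckDB plan queries: the `Table` class, the `"local"` and `"remote"` labels, and the rule "ship the smaller table to the side that holds the larger one" are all hypothetical, and row count stands in for a real cost estimate.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    location: str  # hypothetical label: "local" or "remote"
    rows: list     # list of dict rows


def plan_join(left: Table, right: Table) -> str:
    """Choose the execution site: stay put if co-located, else go to the larger table."""
    if left.location == right.location:
        return left.location
    larger = left if len(left.rows) >= len(right.rows) else right
    return larger.location


def hybrid_join(left: Table, right: Table, key: str):
    """Return (execution site, names of tables that had to move, joined rows)."""
    site = plan_join(left, right)
    moved = [t.name for t in (left, right) if t.location != site]
    # Simple hash join: index the right side by key, then probe with the left side.
    index = {}
    for row in right.rows:
        index.setdefault(row[key], []).append(row)
    joined = [{**l, **r} for l in left.rows for r in index.get(l[key], [])]
    return site, moved, joined


orders = Table("orders", "local", [
    {"user_id": 1, "amount": 10},
    {"user_id": 2, "amount": 5},
])
users = Table("users", "remote", [
    {"user_id": 1, "region": "us"},
    {"user_id": 2, "region": "eu"},
    {"user_id": 3, "region": "us"},
])

site, moved, rows = hybrid_join(orders, users, "user_id")
print(site, moved, rows)
```

Here the smaller local `orders` table is the one that moves, and the join runs where `users` lives. The caller never states where either table is stored when writing the join, which is the point Jordan makes about not worrying about data location.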

The future of data analytics

Jordan looks ahead to the future of data analytics, considering both how the technology is likely to evolve and where new methods could be applied. He suggests that careful attention to architecture and abstractions, together with new methods such as hybrid execution, is crucial for the development of the field.

"Architecture matters. Abstractions matter," says Jordan. He expresses his hope that data practitioners think about the architecture that will lead to the next phase in data analytics.

Jordan also touches upon applications: "If somebody is building an application [and] they need to expose data to their end users, being able to do this hybrid execution thing, where you can actually run and push work down into the browser, is really super powerful." He believes that the future of data analytics could also involve greater integration with end-user applications.

Insights surfaced

  • Hadoop skills, once highly valued, have fallen out of favor because of the system's slow speed and difficulty of use. Jordan argues that the real issue was the wrong architecture and abstraction
  • Cloud data architecture has largely taken over, with changes in how storage systems work and transitions from row stores to column stores
  • Data lakes, while beneficial in many ways, have challenges including difficulty in management and lack of governance
  • Lakehouse architecture attempts to solve many issues found in data lakes
  • Hybrid execution, where data operations can be executed both locally and remotely, presents a future direction for database systems