Optimizing your dbt project: Proven methods and strategies for efficiency and performance from Coalesce 2023

Co-founders of SELECT, Ian Whitestone and Niall Woodward, explain how to optimize your dbt projects.

"We don't want people to be spending hours optimizing a model that's only going to save you five bucks a month."

- Niall Woodward, Co-founder of SELECT

Ian and Niall discuss the importance of understanding your data warehouse's architecture and billing model, tackling low-hanging fruit, measuring performance and costs, and implementing optimization strategies.

Optimizing dbt projects can improve the developer experience, reduce latencies, and lower cloud computing costs

Ian and Niall emphasize the importance of optimizing dbt projects. They point out that an efficient dbt setup improves the developer experience and significantly reduces latencies, leading to faster insights. Furthermore, an optimized dbt project can result in lower cloud computing costs.

"An efficient dbt setup improves the developer experience, which is a great thing," says Niall. He explains that "for the consumers of our projects, it also means lower latencies and faster insights." When it comes to cost-saving, he adds, "Finally, an optimized dbt project can lower cloud computing costs, which is quite a hot topic."

Understanding warehouse architecture and billing is fundamental for effective dbt optimization

Niall and Ian point out the importance of understanding warehouse architecture and billing for effective dbt optimization. Because dbt runs SQL queries, optimizing a dbt project means optimizing three things: dbt's own configuration, the SQL in the models, and the configuration of the warehouse where that SQL runs.

"dbt is ultimately executing SQL in your data warehouse and creating a bunch of different objects in it. So, the first thing you need to do is understand how that data warehouse works," Ian states.

Ian also mentions that "if you're focused on cost optimization, you have to understand how you're actually billed for that." He elaborates, "...with Snowflake in particular, you pay for the amount of time that the virtual warehouses that your queries are running on [are] up. What can cause a virtual warehouse to stay up is your model takes a long time to run, or it's running really frequently. The frequency one is something that people often overlook."
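
To make the frequency point concrete, here is a rough back-of-the-envelope sketch in Python. It is not from the talk. It assumes the model is the only workload on its warehouse, uses Snowflake's standard credits-per-hour by warehouse size and its 60-second minimum charge each time a warehouse resumes, and treats the price per credit and the `monthly_cost` helper as hypothetical placeholders.

```python
# Rough estimate of what one scheduled dbt model costs on Snowflake.
# Assumptions (not from the talk): the model is the only workload on its
# warehouse, and PRICE_PER_CREDIT is a placeholder for your contract rate.

CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8}  # standard warehouse sizes
PRICE_PER_CREDIT = 3.0  # hypothetical USD rate; substitute your own
MINIMUM_BILLED_SECONDS = 60  # minimum charge each time a warehouse resumes


def monthly_cost(runtime_s, runs_per_day, size="XS", auto_suspend_s=60, days=30):
    """Estimate monthly spend for a model that runs on a fixed schedule."""
    interval_s = 86400 / runs_per_day
    # Each run keeps the warehouse up for the query plus the idle auto-suspend window.
    up_s = max(runtime_s + auto_suspend_s, MINIMUM_BILLED_SECONDS)
    # The warehouse cannot be up for longer than the gap between runs.
    up_s = min(up_s, interval_s)
    billed_hours = up_s * runs_per_day * days / 3600
    return billed_hours * CREDITS_PER_HOUR[size] * PRICE_PER_CREDIT


hourly_heavy = monthly_cost(runtime_s=600, runs_per_day=24)
frequent_light = monthly_cost(runtime_s=30, runs_per_day=288)
print(f"10-minute model, run hourly: ${hourly_heavy:,.0f}/month")
print(f"30-second model, run every 5 minutes: ${frequent_light:,.0f}/month")
```

Under these assumptions, the 30-second model that runs every five minutes costs more per month than the 10-minute model that runs hourly, because every run also pays for the idle time before the warehouse suspends. Real bills depend on what else shares the warehouse, so treat the numbers as illustrative only.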

Using appropriate materialization strategies can significantly enhance dbt project performance and reduce costs

Ian and Niall discuss the importance of using the right materialization strategies to enhance the performance of dbt projects and reduce costs. They highlight the three most used materialization strategies: view, table, and incremental.

"dbt provides a bunch of different options out of the box. We're going to limit today's focus to the big three that we see out in the wild the most: view, table, and incremental," Ian mentions. He elaborates, "View should be used for very lightweight queries... Table materialization is quicker to develop and easier to maintain... Incremental models are harder to develop, but they are much faster and more efficient as you only process new data."

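The difference between table and incremental materializations can be illustrated with a small plain-Python sketch. This is not dbt code and not the speakers' implementation: `transform`, `build_table`, and `build_incremental` are hypothetical names, and the `loaded_at` column is an assumed timestamp. In a dbt project, the equivalent filter is usually written in SQL inside an `is_incremental()` block.

```python
# Plain-Python illustration of table vs. incremental materialization.
# Not dbt code: all function and column names here are hypothetical.

def transform(row):
    """Stand-in for the model's SQL logic."""
    return {
        "id": row["id"],
        "loaded_at": row["loaded_at"],
        "amount_usd": row["amount_cents"] / 100,
    }


def build_table(source):
    """Table: reprocess every source row on every run."""
    return [transform(r) for r in source], len(source)


def build_incremental(source, existing):
    """Incremental: process only rows newer than what the target already holds."""
    if not existing:  # the first run behaves like a full table build
        return build_table(source)
    high_water_mark = max(r["loaded_at"] for r in existing)
    new_rows = [r for r in source if r["loaded_at"] > high_water_mark]
    return existing + [transform(r) for r in new_rows], len(new_rows)


# First run: 1,000 source rows, empty target.
source = [{"id": i, "loaded_at": i, "amount_cents": 100 * i} for i in range(1, 1001)]
target, processed_first = build_incremental(source, existing=[])

# Second run: one new row has arrived.
source.append({"id": 1001, "loaded_at": 1001, "amount_cents": 100100})
_, processed_full = build_table(source)
target, processed_incr = build_incremental(source, target)
print(f"table rebuild processed {processed_full} rows, incremental processed {processed_incr}")
```

On the second run the table build reprocesses all 1,001 rows while the incremental build processes only the new one. The sketch also hints at why incremental models are harder to develop: a simple high-water mark like this one would miss late-arriving or updated rows, which a real model has to handle explicitly.
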
Insights surfaced

  • Understanding your data warehouse's architecture and billing model is crucial in optimizing your dbt project
  • It is important to tackle low-hanging fruit such as warehouse configuration, dbt configuration, and deleting unused models
  • Regularly measuring your performance and costs can help identify areas for optimization
  • Implementing optimization strategies such as query pruning and simplifying complex views can significantly improve performance and reduce costs
  • Products like SELECT can help monitor and optimize dbt usage
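
As a starting point for the measurement step above, one option is to rank models by execution time using the `run_results.json` artifact that dbt writes to the `target/` directory after a run. The sketch below is a minimal example, not the speakers' tooling: `slowest_models` is a hypothetical helper, the sample data is invented, and execution time is only a proxy for cost.

```python
import json
from pathlib import Path


def slowest_models(run_results, top_n=5):
    """Return (unique_id, seconds) pairs for the slowest nodes in a dbt run."""
    timings = [
        (result["unique_id"], result.get("execution_time", 0.0))
        for result in run_results.get("results", [])
    ]
    return sorted(timings, key=lambda pair: pair[1], reverse=True)[:top_n]


# Invented sample that mirrors the shape of the artifact; model names are hypothetical.
sample = {
    "results": [
        {"unique_id": "model.shop.stg_orders", "status": "success", "execution_time": 4.2},
        {"unique_id": "model.shop.fct_orders", "status": "success", "execution_time": 312.7},
        {"unique_id": "model.shop.dim_customers", "status": "success", "execution_time": 58.1},
    ]
}

# Use the real artifact when it exists, otherwise fall back to the sample.
artifact = Path("target/run_results.json")
run_results = json.loads(artifact.read_text()) if artifact.exists() else sample

for unique_id, seconds in slowest_models(run_results, top_n=3):
    print(f"{seconds:8.1f}s  {unique_id}")
```

Multiplying each model's runtime by how often it is scheduled gives a first approximation of where warehouse time goes, which helps avoid spending hours on a model that saves only a few dollars a month.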