
Managing dynamic and expanding self-service dbt projects at Whatnot from Coalesce 2023

Alice Leach, data engineer from Whatnot, discusses managing a rapidly growing dbt project and how her team overcame various challenges.

"Give people the power to do what they need to do, trust them to make the right choices, and know that because you don't touch the source data, you can always rebuild it."

In her Coalesce 2023 talk, Alice covers the challenges her team faced, the solutions they implemented, and their future plans for the project.

Fast growth and effective management in dbt projects

A dbt project becomes harder to manage as it grows. Alice describes her experience with a dbt project that grew rapidly in a short period and the challenges that came with it: long-running queries, complex model dependencies, rapidly evolving source data, and project bloat caused by many developers contributing.

"We've hit against a lot of long-running queries… not only due to inefficient SQL but also due to the fact that our data sets have expanded significantly," Alice explains. She also notes that frequent changes to the source data by Whatnot's software engineering team were a significant factor, which made guidelines and guardrails necessary to keep the project on track.

The importance of guidelines and guardrails in dbt projects

Establishing guidelines and guardrails is essential when managing a rapidly growing dbt project. Guardrails stop contributors from making harmful changes, while guidelines steer them toward best practices, so the project stays organized and efficient.

"We break our CI into three critical sections at Whatnot: Does the bundle run, are project requirements met, and is the model readable?" Alice explains. These checks, she notes, help prevent contributors from making mistakes that could significantly affect the project.
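One way to picture the "are project requirements met?" stage is a script that reads the manifest.json artifact dbt writes to its target/ directory and fails the CI job when a model breaks a rule. The sketch below is a minimal illustration under assumed rules: the two requirements it checks (a non-empty description and an owner key in the model's meta) and the function names are hypothetical, not Whatnot's actual CI.

```python
"""Sketch of a CI "are project requirements met?" check.

Assumes dbt's target/manifest.json is the input. The rules below
(a description plus meta.owner) are hypothetical examples only.
"""
from __future__ import annotations

import json
from typing import Any


def find_violations(manifest: dict[str, Any]) -> list[str]:
    """Return one message per requirement that a model fails."""
    violations: list[str] = []
    for unique_id, node in sorted(manifest.get("nodes", {}).items()):
        # Only models are checked; tests, seeds and snapshots are skipped.
        if node.get("resource_type") != "model":
            continue
        name = node.get("name", unique_id)
        if not (node.get("description") or "").strip():
            violations.append(f"{name}: missing description")
        if "owner" not in (node.get("meta") or {}):
            violations.append(f"{name}: missing meta.owner")
    return violations


def check_manifest_file(path: str = "target/manifest.json") -> int:
    """Load the manifest, print violations, and return a CI exit code."""
    with open(path, encoding="utf-8") as fh:
        problems = find_violations(json.load(fh))
    for problem in problems:
        print(problem)
    return 1 if problems else 0  # a non-zero code fails the CI job
```

In a CI step this would run after dbt has compiled the project, for example as `raise SystemExit(check_manifest_file())`. Keeping the rule logic in a pure function that takes a dict makes the rules easy to unit-test without a real project.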

"The final guardrail that I wanted to mention was our cleanup processes...We have that in place, it runs weekly, and it will post a message to Slack pointing out the models that it intends to drop," she adds. This process effectively controls project bloat by regularly removing unused models.
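The talk does not say how the cleanup job decides a model is unused. The sketch below assumes one plausible rule: a model qualifies when it has not been queried for 90 days and no downstream model depends on it. The threshold, the function names, and the source of the last-queried times (the warehouse's query history) are all hypothetical. The job only builds the announcement text, mirroring the step where the models it intends to drop are posted to Slack before anything is removed.

```python
"""Sketch of a weekly cleanup job that proposes unused models for removal.

Hypothetical assumptions: a model counts as unused when it has not been
queried for 90 days and nothing downstream depends on it; last-queried
times would come from the warehouse's query history.
"""
from __future__ import annotations

from datetime import datetime, timedelta


def drop_candidates(
    last_queried: dict[str, datetime | None],
    children: dict[str, list[str]],
    now: datetime,
    max_idle: timedelta = timedelta(days=90),
) -> list[str]:
    """Return idle models with no downstream dependents, in name order."""
    candidates = []
    for model, queried_at in sorted(last_queried.items()):
        # A model that was never queried (None) is treated as idle.
        idle = queried_at is None or now - queried_at > max_idle
        if idle and not children.get(model):
            candidates.append(model)
    return candidates


def cleanup_message(candidates: list[str]) -> str:
    """Build the text a Slack bot would post before anything is dropped."""
    if not candidates:
        return "Weekly cleanup: no unused models found."
    bullets = "\n".join(f"- {name}" for name in candidates)
    return "Weekly cleanup: these models will be dropped next run:\n" + bullets
```

Posting the message (for example through a Slack incoming webhook) and issuing the actual drop statements are left out. Announcing first and dropping on a later run gives owners a window to object.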

Automation and reusable code are key in managing dbt projects

Automation and reusable code emerged as essential tools in managing growing dbt projects. They save time, maintain consistency, and reduce the risk of errors.

"We have three groupings of macros: DRY SQL, SQL functions, and helper functions," Alice mentions, explaining that these macros help keep the code clean, uniform, and easy to handle.

"Equipping developers with reusable code and making it easier for them to use the macro than it is to write the code themselves...is going to save you a lot of time in the long run," she elaborates. This approach not only streamlines development but also ensures that all contributors follow established best practices.
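In dbt, macros are written in Jinja and render to SQL at compile time. To keep examples in one language, the sketch below uses plain Python functions as an analogue of a "DRY SQL" macro: one helper emits a null-safe division expression so developers call it instead of hand-writing the pattern each time. The helper names and the generated SQL are hypothetical illustrations, not Whatnot's macros.

```python
"""Python analogue of a "DRY SQL" macro.

In dbt this helper would be a Jinja macro; a plain function is used here to
show the same idea. The helper names and the SQL produced are hypothetical.
"""
from __future__ import annotations


def safe_divide(numerator: str, denominator: str) -> str:
    """Return a SQL expression that yields NULL instead of a divide-by-zero error."""
    return f"{numerator} / nullif({denominator}, 0)"


def render_ratios(ratios: dict[str, tuple[str, str]], source: str) -> str:
    """Build a SELECT that applies safe_divide to every (numerator, denominator) pair."""
    columns = ",\n    ".join(
        f"{safe_divide(num, den)} as {alias}" for alias, (num, den) in ratios.items()
    )
    return f"select\n    {columns}\nfrom {source}"


print(render_ratios({"avg_order_value": ("revenue", "order_count")}, "daily_sales"))
```

Because the division logic lives in one place, changing how division by zero is handled means editing one helper rather than every model that computes a ratio.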

Consideration for future enhancements in dbt projects

Alice also outlines planned enhancements to Whatnot's dbt project: simplifying the path to incrementalization, improving documentation and discovery, and enhancing the developer experience.
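Incrementalization presumably refers to dbt's incremental materialization, where each run processes only new or changed rows instead of rebuilding the whole table. A common way to select those rows is a high-water mark on a timestamp column. The sketch below shows that logic in memory; in a real project it would be SQL inside an incremental model, and the column names id and updated_at are assumptions.

```python
"""Sketch of the high-water-mark logic behind an incremental model.

Rather than rebuilding the whole table, only source rows at or after the
latest timestamp already in the target are processed and then upserted on a
key. The column names "id" and "updated_at" are assumptions.
"""
from __future__ import annotations

from typing import Any, Dict, List

Row = Dict[str, Any]


def incremental_merge(
    target: List[Row], source: List[Row], key: str = "id", ts: str = "updated_at"
) -> List[Row]:
    """Return target with newer source rows upserted, ordered by key."""
    # High-water mark: None on the first run, which means "take everything".
    high_water = max((row[ts] for row in target), default=None)
    # ">=" re-reads rows sharing the boundary timestamp; the upsert makes that harmless.
    new_rows = [row for row in source if high_water is None or row[ts] >= high_water]
    merged = {row[key]: row for row in target}
    for row in sorted(new_rows, key=lambda r: r[ts]):
        merged[row[key]] = row  # the latest version of each key wins
    return sorted(merged.values(), key=lambda r: r[key])
```

The cost of each run now scales with the volume of new data rather than the full history, which is the property that helps with the long-running queries described earlier.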

She closes by stressing that the project should be enjoyable for all contributors: "Finally, I will leave you with one of Whatnot's core values, which I think is very important in a self-service project, and that is to always have fun and be nice." This emphasis on enjoyment and camaraderie, she suggests, can contribute significantly to the success and growth of a self-service project.

Alice's key insights

  • Managing a rapidly growing dbt project can present challenges such as long-running queries, complex model dependencies, rapidly evolving source data, and project bloat
  • Implementing guardrails, or systems that prevent incorrect actions, can help manage these challenges. These can include continuous integration (CI) checks, cleanup processes, and allowing for operations on tables
  • Guidelines, or suggestions for best practices, are also important. This includes maintaining good documentation, keeping a modular workspace, and using macros for reusable code