dbt

Transitioning from a monolithic dbt project to a multi-project collaboration: A case study from Cityblock (Coalesce 2023)

Cityblock Health team members, Katie Claiborne and Nathaniel Burren, discuss the transition from a monolithic dbt project to a multi-project collaboration.

"Rather than trying to save that monolithic legacy project, we decided to try again and build a collection of smaller projects that could work together more nimbly."

- Katie Claiborne, Staff Analytics Engineer at Cityblock Health

Cityblock Health team members Katie Claiborne, Staff Analytics Engineer, and Nathaniel Burren, Analytics Engineer, discuss Cityblock's transition from a monolithic dbt project to a multi-project collaboration. They describe how they designed a prototype dbt project and built the infrastructure to replicate it. This work let analytics engineering teams move efficiently from a single monolithic project to multiple collaborating projects.

The shift from a monolithic dbt project to a multi-project collaboration

Katie and Nathaniel explain Cityblock's journey from a monolithic dbt project to a multi-project collaboration. The team found that their monolithic project had become unsustainable because of varying SQL styles, inconsistent documentation, and inefficient modeling, so they decided to break it into smaller, more manageable projects.

Katie also describes the difficulties the team faced as the monolithic project grew: "We felt like we were spending more of our time simply maintaining the project, as opposed to creating and sharing new knowledge for the organization."

To find a solution, they decided to "build a collection of smaller projects that could work together more nimbly." Katie emphasizes, "We made a bet that a short-term investment in prototyping multi-project collaboration would yield a higher long-term value to the organization and help us turn this curve back in the right direction."

The role of scalable infrastructure and standardization in project efficiency

Katie and Nathaniel highlight the importance of building scalable infrastructure and enforcing standards to make their projects more efficient. Their stack uses GitHub for version control, dbt for data transformation, Terraform for cloud infrastructure, and Google Cloud as the cloud platform. They also point out the value of having dedicated DevOps support.

"Prototype project is a dbt project that can be used as a template for future ones at an organization. These can contain any group of models that you want, but it has to start with a team that's willing to commit to new coding standards for quality," notes Nathaniel. He also stresses the need for consistency and reliability in setting up new dbt projects, advising to "make friends with your DevOps folks."

Katie expands on this, stating, "Dedicated DevOps support can elevate analytics engineering. Establishing standards for code quality and designing a multi-repo strategy can allow teams to collaborate nimbly across projects."

The process of moving from a single project to multiple projects

Katie and Nathaniel demonstrate how they moved from a single project to multiple projects using their "project in a box" strategy. The "project in a box" is a prototype dbt project that serves as a template for future projects. They also developed a GitHub app for the initial setup of new GitHub repositories.

Katie explains their strategy: "We use the template repository as a home for our prototype. By putting the prototype dbt project into the template repository, we ensure that all future projects are created consistently in a way that's not subject to human error." She also notes that GitHub made the process easier.
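GitHub exposes a REST endpoint for creating a repository from a template repository (POST /repos/{template_owner}/{template_repo}/generate). The talk does not say how Cityblock's GitHub app performs this step, so what follows is only a minimal sketch of one way to automate it, assuming that endpoint. It builds the request with Python's standard library without sending it. The organization example-org, the template dbt-project-template, the new repository dbt-claims, and the token are hypothetical placeholders.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_generate_request(
    template_owner: str,
    template_repo: str,
    new_owner: str,
    new_name: str,
    token: str,
    private: bool = True,
) -> urllib.request.Request:
    """Build (but do not send) a request that creates a new repository
    from a template repository via GitHub's REST API."""
    url = f"{GITHUB_API}/repos/{template_owner}/{template_repo}/generate"
    body = {
        "owner": new_owner,             # org or user that will own the new repo
        "name": new_name,               # name of the new dbt project repository
        "private": private,
        "include_all_branches": False,  # copy only the template's default branch
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
        method="POST",
    )


# Hypothetical names: "example-org", "dbt-project-template", and "dbt-claims"
# are placeholders, and "<TOKEN>" stands in for a real access token.
req = build_generate_request(
    "example-org", "dbt-project-template", "example-org", "dbt-claims", token="<TOKEN>"
)
print(req.get_method(), req.full_url)
# → POST https://api.github.com/repos/example-org/dbt-project-template/generate
```

Keeping request construction separate from sending it makes the sketch easy to test. A real setup tool would pass the request to urllib.request.urlopen with a token that is allowed to create repositories in the target organization.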

Katie details their process further, adding, "We created two new GitHub repositories. The first is our template repository, which contains the files needed to get a new project off the ground. The second is our workflows repository, and it contains shared files that a project, or really any number of projects, can call over time." She emphasizes how this helped ensure that all dbt repositories were testing and deploying their code in the same way.
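GitHub Actions supports this kind of sharing through reusable workflows. A job in one repository can call a workflow stored in another with uses: {owner}/{repo}/.github/workflows/{filename}@{ref}. The talk does not specify the mechanism, but assuming the workflows repository is consumed that way, a small check like the hypothetical one below could flag dbt repositories whose workflow files call anything other than the shared repository. example-org/dbt-workflows and dbt_ci.yml are placeholder names. The check scans lines with a regular expression rather than fully parsing YAML.

```python
import re

# Hypothetical shared workflows repository; replace with your own.
SHARED_REPO = "example-org/dbt-workflows"

# Matches a `uses:` key (optionally as a list item) and captures its value,
# with surrounding quotes optional.
USES_PATTERN = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^'\"\s#]+)")


def check_workflow(text: str, shared_repo: str = SHARED_REPO) -> list[str]:
    """Return problems found in one GitHub Actions workflow file.

    Reusable-workflow calls (targets containing '/.github/workflows/')
    must point at the shared repository and be pinned to a ref with '@'.
    Step-level actions such as actions/checkout@v4 are ignored.
    """
    problems = []
    calls = 0
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = USES_PATTERN.match(line)
        if not match:
            continue
        target = match.group(1)
        if "/.github/workflows/" not in target:
            continue  # a regular action, not a reusable workflow
        calls += 1
        if not target.startswith(f"{shared_repo}/.github/workflows/"):
            problems.append(f"line {lineno}: calls {target}, not {shared_repo}")
        elif "@" not in target:
            problems.append(f"line {lineno}: {target} is not pinned to a ref")
    if calls == 0:
        problems.append("no shared workflow is called")
    return problems


# A project's caller workflow that delegates CI to the shared repository.
caller = """\
name: ci
on: pull_request
jobs:
  dbt-ci:
    uses: example-org/dbt-workflows/.github/workflows/dbt_ci.yml@main
    secrets: inherit
"""
print(check_workflow(caller))  # → []
```

A line-based scan keeps the sketch dependency-free. A production check would more likely parse the YAML (for example with PyYAML) and inspect each job's uses key.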

Katie and Nathaniel's key insights

  • Cityblock noticed its dbt project slowing down as it accumulated more models and sources. This led to maintenance challenges and left less time for creating and sharing new knowledge
  • The company decided to build a collection of smaller projects that could work together more nimbly, betting that a short-term investment in prototyping multi-project collaboration would yield higher long-term value
  • The template repository was used for the one-time setup of each new GitHub repository containing a dbt project, ensuring all future projects are created consistently
  • The workflows repository provided ongoing support for all repositories containing a dbt project, ensuring all dbt repositories test and deploy their code in the same way
  • Dedicated DevOps support can elevate analytics engineering and allow teams to collaborate nimbly across projects