Maximizing dbt Cloud efficiency: Using Slim CI for error detection and table optimization in large-scale data organizations from Coalesce 2023
Leo Folsom, Solutions Engineer at Datafold, explains how to use dbt Cloud and set up Slim CI.
“I wanted my data code to be reliable, version-controlled, efficient, understandable. I wanted to do a good job. I wanted the numbers to not be changing for reasons I didn't understand… dbt really helped me get there.”
- Leo Folsom, Solutions Engineer at Datafold
In his talk, Leo explains how to use dbt Cloud and set up Slim CI. He also explains why teams that don't take advantage of features like state comparison and deferral during implementation end up missing errors and building unnecessary tables.
Continuous integration (CI) is critical to running dbt Cloud successfully
Leo highlights the importance of CI when working in dbt Cloud. CI is the set of automated checks that run when a pull request is opened in GitHub, and its purpose is to confirm that the proposed code is stable and reliable before it is merged.
"The stuff that happens automatically when you open a pull request in GitHub...That's CI," he explains. He adds that CI is a balance between speed and stability, with the goal of avoiding damage that can't be undone. "There's not a simple answer, and it's often a challenging trade-off to figure out how to make sure your dbt project on your next production run does not go kaput," he says.
Leo mentions that one of his favorite dbt Cloud features is the ability to set up a CI job that runs automatically when a pull request is opened. His goals for that job: CI runs should be fast, warehouse costs should stay low, and no models should be built unnecessarily.
dbt Cloud makes CI scalable
Leo outlines how dbt Cloud scales CI with Slim CI. Instead of building the entire project, a Slim CI run builds only the models modified in the pull request, plus the models downstream of them.
"Slim CI makes this really scalable," he explains. He notes that this reduces both the time and the warehouse cost of each CI run.
When it comes to scaling, Slim CI can be set up for any branch that merges into any other branch. Leo states, "It does add a lot of complexity to manage multiple jobs... but it is kind of infinitely scalable in that respect."
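To make the selection step concrete, here is a minimal Python sketch of what "modified models plus their downstreams" means. It is not dbt's implementation: it assumes each model has a single checksum in both the current and the production manifest (represented here as plain dictionaries; dbt's real state comparison looks at more than one checksum), treats any model whose checksum is new or different as modified, and then walks the DAG to add every descendant, which is roughly what the `state:modified+` selector does. The model names are hypothetical.

```python
from __future__ import annotations

from collections import deque


def modified_models(current: dict[str, str], prod: dict[str, str]) -> set[str]:
    """Models that are new or whose checksum differs from production (state comparison)."""
    return {model for model, checksum in current.items() if prod.get(model) != checksum}


def with_downstreams(seeds: set[str], children: dict[str, list[str]]) -> set[str]:
    """Seeds plus every model downstream of them (the '+' in state:modified+)."""
    selected = set(seeds)
    queue = deque(seeds)
    while queue:
        model = queue.popleft()
        for child in children.get(model, []):
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


# Hypothetical project: stg_orders -> orders -> revenue, stg_customers -> customers -> revenue
children = {
    "stg_orders": ["orders"],
    "orders": ["revenue"],
    "stg_customers": ["customers"],
    "customers": ["revenue"],
}
prod = {"stg_orders": "a1", "orders": "b1", "revenue": "c1", "stg_customers": "d1", "customers": "e1"}
current = dict(prod, orders="b2")  # the pull request edits only `orders`

to_build = with_downstreams(modified_models(current, prod), children)
print(sorted(to_build))  # ['orders', 'revenue']
```

Three of the five models are skipped; in a large project, that gap is where the time and warehouse savings come from.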
Deferral in dbt Cloud avoids rebuilding upstream models
Leo discusses deferral in dbt Cloud, which lets a CI run select from production data directly. When a modified model references an upstream model that isn't being built in the CI run, dbt resolves that reference to the existing production version instead of rebuilding it, which saves time and avoids wasted warehouse resources.
"The exciting thing here is that you're selecting from the upstreams that already exist," he points out. He also briefly explains how to set up the deferral process in dbt Cloud and emphasizes its simplicity.
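The deferral idea can be sketched the same way. In this simplified Python model, a reference to a model that is being built in the CI run points at the CI schema, and a reference to anything else points at production, so unchanged upstreams are read rather than rebuilt. The schema names `dbt_cloud_pr_42` and `analytics` are hypothetical, and the sketch leaves out details of dbt's real behavior, such as checking whether a relation already exists in the CI schema.

```python
from __future__ import annotations


def resolve_ref(model: str, selected: set[str], ci_schema: str, prod_schema: str) -> str:
    """Return the relation a ref() to `model` should select from during a CI run."""
    if model in selected:
        # Built in this CI run, so read the fresh version from the CI schema.
        return f"{ci_schema}.{model}"
    # Not built in this run: defer to the existing production relation.
    return f"{prod_schema}.{model}"


# Hypothetical upstream references for the models a Slim CI run selected
parents = {"orders": ["stg_orders"], "revenue": ["orders", "customers"]}
selected = {"orders", "revenue"}

for model in sorted(selected):
    refs = [resolve_ref(p, selected, "dbt_cloud_pr_42", "analytics") for p in parents[model]]
    print(model, "reads from", refs)
# orders reads from ['analytics.stg_orders']
# revenue reads from ['dbt_cloud_pr_42.orders', 'analytics.customers']
```

Only `orders` and `revenue` are built; `stg_orders` and `customers` are read straight from production.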
Leo's key insights
- Slim CI allows for faster CI runs and lower warehouse costs. It's scalable and can be set up for any branch that merges into any other persistent branch
- Deferral in Slim CI selects from existing production models, so upstream models don't need to be rebuilt in CI
- Setting up a CI job in dbt Cloud is simple. It involves selecting an environment to defer to and toggling on the option for the job to be triggered by pull requests
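In dbt Cloud, both pieces are configured on the CI job itself: the job's command typically selects `state:modified+`, and the environment chosen to defer to supplies the production state. For teams running dbt Core outside dbt Cloud, a roughly equivalent invocation uses the `--select`, `--defer`, and `--state` flags, where `--state` points at a directory holding the production run's artifacts (such as `manifest.json`). The Python helper below only assembles that command; the `prod-artifacts` directory name is a hypothetical placeholder.

```python
import shlex


def slim_ci_command(prod_artifacts_dir: str) -> list[str]:
    """Build a dbt Core command that builds modified models and their downstreams,
    deferring references to unselected models to the production state."""
    return [
        "dbt", "build",
        "--select", "state:modified+",  # modified models plus everything downstream
        "--defer",                      # resolve refs to unselected models against production
        "--state", prod_artifacts_dir,  # directory containing production's manifest.json
    ]


cmd = slim_ci_command("prod-artifacts")
print(shlex.join(cmd))  # dbt build --select state:modified+ --defer --state prod-artifacts
```

In a CI script, the list could be passed to `subprocess.run(cmd, check=True)`, assuming dbt Core is installed and the production artifacts have already been downloaded.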