Extending dbt's Reach
What is dbt Live?
The dbt Live: Expert Series consists of 60-minute interactive sessions where dbt Labs Solution Architects share practical advice on how to tackle common problems that they see over and over again in the field — like how to safely split environments with CI/CD, or how to implement blue/green deployments.
Event agendas are shaped by audience requests, so whether you’re new to dbt, or just want to sharpen your skills, we’d love to see you there! Register to take part in the next live session.
Session Recap: Extending dbt’s reach
The APAC-friendly edition of dbt Live featured Doug Guthrie, Solutions Architect, and Joel Labes, Senior Developer Experience Advocate, both from dbt Labs.
Doug started us off with a few slides setting the scene, covering situations where teams need to extend dbt to:
- Enforce coding standards by codifying them and checking them in the pull request process
- Provide richer context to data stakeholders by visualizing the relationships between the data they’re consuming
- See the lineage from dbt source to the business intelligence consumption layer by automating exposures creation
- Create more enhanced notifications and reporting by connecting dbt Cloud’s metadata API with external tools
You can watch the full replay here or read on for a summary of the session.
How do we enforce coding standards?
Before Doug jumped into the setup for enforcing coding standards, he talked briefly about dbt Cloud’s Slim CI jobs. Slim CI jobs are similar to scheduled jobs, but differ in three main ways:
- The Slim CI job must defer to a production job.
- The Slim CI job commands need a state:modified+ selector to build only new or changed models and their downstream dependents. Importantly, state comparison is only possible when a deferred job has been selected to compare state against.
- The Slim CI job must be triggered by a pull request.
Source: Configuring a dbt Cloud CI job documentation
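The three requirements above boil down to a very small job definition. A minimal sketch of the commands involved (the ./prod-artifacts path is an illustrative placeholder, not from Doug’s setup):

```shell
# dbt Cloud Slim CI job command — deferral to the production job
# is selected in the job's settings, so only the selector is needed:
dbt build --select state:modified+

# Rough dbt Core equivalent: point --state at saved production artifacts
# so dbt can compute which models changed.
dbt build --select state:modified+ --defer --state ./prod-artifacts
```

With this selector in place, a pull request that touches one model rebuilds only that model and its downstream dependents, rather than the whole project.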
After he set the foundation, he changed a dbt model, committed his changes to his version control repository, and opened a pull request. The pull request on GitHub triggered dbt Cloud via webhook to start the pull request job, along with a series of GitHub Actions. One of those Actions ran SQLFluff, configured with rules for how the SQL in his models should be formatted. To see the configuration for this Action, check out the lint_sql.yml file in Doug’s GitHub repository.
The SQLFluff run highlighted some SQL formatting errors in Doug’s code, which he fixed before his code was reviewed. This keeps the review focused on the logic changes rather than the format of the code. It also makes code review faster, because the code is always in a common format no matter who the contributor is within your organization.
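A workflow in the spirit of Doug’s lint_sql.yml might look like the sketch below. The action versions and the snowflake dialect are illustrative assumptions, not a copy of Doug’s actual file:

```yaml
# .github/workflows/lint_sql.yml (illustrative sketch, not Doug's exact config)
name: Lint SQL
on: pull_request

jobs:
  sqlfluff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install SQLFluff
        run: pip install sqlfluff
      - name: Lint dbt models
        # --dialect is an assumption; match it to your warehouse
        run: sqlfluff lint models/ --dialect snowflake
```

A non-zero exit code from sqlfluff lint fails the check on the pull request, which is what surfaces the formatting errors before review.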
How do we visualize relationships between the data being consumed?
The next GitHub Action Doug showed creates an entity relationship diagram (ERD) using Graphviz that can be added automatically to your dbt Docs. This enhances the value of the Docs and gives your users another way to visualize the relationships among the data they use in their business intelligence tool of choice.
Doug created an erd.yml file that contains the configuration for his Create ERD GitHub Action. This Action runs a Python file called run_and_erd.py to gather metadata, populate an ERD, and add the diagram to dbt Docs hosted on dbt Cloud.
Hosted in dbt Cloud, this ERD enriches your documentation and helps analysts who work in SQL-based tools identify the keys for the tables they use more quickly, improving the quality of their analysis.
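The core idea behind a script like run_and_erd.py is to read model relationships out of dbt’s manifest.json and emit Graphviz DOT text. A minimal stdlib-only sketch, assuming a small hypothetical slice of a manifest’s parent_map (Doug’s actual script differs):

```python
# Sketch: turn dbt manifest relationships into Graphviz DOT text.
# The manifest below is a tiny hypothetical excerpt, not real project metadata.
manifest = {
    "parent_map": {
        "model.jaffle_shop.fct_orders": ["model.jaffle_shop.stg_orders"],
        "model.jaffle_shop.fct_order_items": ["model.jaffle_shop.stg_orders"],
    }
}

def manifest_to_dot(parent_map):
    """Render parent -> child model edges as a Graphviz digraph."""
    lines = ["digraph erd {"]
    for child, parents in parent_map.items():
        for parent in parents:
            # Strip the "model.<project>." prefix for readable node names.
            lines.append(f'  "{parent.split(".")[-1]}" -> "{child.split(".")[-1]}";')
    lines.append("}")
    return "\n".join(lines)

dot = manifest_to_dot(manifest["parent_map"])
print(dot)
```

The resulting DOT text can be rendered to an image with the Graphviz CLI (for example, dot -Tpng) and the image dropped into your dbt Docs assets.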
How do we better integrate dbt with our BI tools?
dbt exposures can be added to your projects to help your users traverse from source to dashboard to understand what models are in use in the lineage graph.
In the image above, tableau_sales_056 and orders_dashboard are dbt exposures for Tableau dashboards. The dashboards use data from the fct_order_items and fct_orders models. Normally, setting up exposures is a manual process, but Doug showed us how to automate their creation using GitHub Actions. To view the configuration, check out Doug’s tableau.yml file in his project, and see this Tableau Exposures Action to learn more about the automation.
This workflow helps keep exposures up to date and relevant for your data stakeholders, and builds more trust in the dashboards and reports they use regularly because they can drill into the lineage.
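Whether written by hand or generated by an Action, an exposure is just a small block of YAML in your dbt project. A sketch of what the generated orders_dashboard exposure might look like (the URL and owner details are hypothetical placeholders):

```yaml
version: 2

exposures:
  - name: orders_dashboard
    type: dashboard
    maturity: high
    url: https://tableau.example.com/views/orders   # hypothetical dashboard URL
    description: Orders overview dashboard in Tableau
    depends_on:
      - ref('fct_orders')
      - ref('fct_order_items')
    owner:
      name: Analytics Team            # hypothetical owner
      email: analytics@example.com
```

The depends_on refs are what stitch the dashboard into the lineage graph, so dbt Docs can show the path from source models to the Tableau dashboard.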
How can we push dbt Cloud metadata to other tools?
The last use case Doug walked through dealt with the need to analyze dbt job runs over time rather than on a run-by-run basis. Run-by-run analysis is already available in dbt Cloud’s model timing tab.
To analyze job runs over time, Doug wanted to push dbt job run metadata to a tool like Datadog to see whether changes to the data or models were affecting run times. Here is the run_and_log.yml file used to configure a Python environment that runs a dbt job and sends the results to the Datadog API, and the Actions page where you can see the log of runs.
This analysis could enable faster delivery of data to stakeholders, or let you verify that data delivery times are consistently met by setting up time-based alerts in Datadog or similar tools.
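The shape of such a push is straightforward: build a custom metric payload from a run’s metadata and POST it to Datadog’s v1 metrics endpoint. A stdlib-only sketch, assuming a hypothetical metric name and run values (Doug’s run_and_log.py may be structured quite differently):

```python
import json
import urllib.request

# Datadog's v1 custom metrics endpoint.
DATADOG_API_URL = "https://api.datadoghq.com/api/v1/series"

def build_metric_payload(run_id, duration_seconds, timestamp):
    """Shape one dbt Cloud run's duration as a Datadog custom metric."""
    return {
        "series": [{
            "metric": "dbt.run.duration",        # hypothetical metric name
            "points": [[timestamp, duration_seconds]],
            "tags": [f"run_id:{run_id}"],
        }]
    }

def post_metric(payload, api_key):
    """POST the payload to Datadog; requires a valid API key."""
    req = urllib.request.Request(
        DATADOG_API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
    )
    return urllib.request.urlopen(req)  # network call, not executed here

payload = build_metric_payload(run_id=12345, duration_seconds=87.4,
                               timestamp=1_700_000_000)
```

Graphing dbt.run.duration over time in Datadog, with a monitor on the tag, is what makes the time-based alerting described above possible.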
You can check out Doug’s full walk through of his use cases below.
Following his presentation, Doug answered Community member questions, both submitted in advance of the session and asked live by attendees.
Here are some of the questions:
- Are there other BI tools supported with the automated exposure crawler other than Tableau?
- Is there any way to call a dbt pipeline from Snowflake Snowpark?
- Currently we are doing all of our data transformation in BigQuery, where we have a large number of views and tables, so the flow is very confusing for any new developer. How can dbt help here?
- What are the best practices on sharing models within the organization between independent dbt projects? What are the best approaches to share these common sources and models across projects?
- Can we do a block in a block while documenting?
- Can we reference an upstream field further downstream?
You can hear Doug and Joel’s responses to these questions and more questions from dbt Community members on the replay.
Thank you all for joining our dbt Live: Expert Series with Doug and Joel!
Want to see more of these sessions? You’re in luck as we have more of them in store. Go register for future dbt Live: Expert Series sessions with more members of the dbt Labs Solution Architects team. And if you’re as excited as we are for Coalesce 2022, please go here to learn more and to register to attend.
Until next time, keep on sharing your questions and thoughts on this session in the dbt Community Slack!
Last modified on: Sep 21, 2023