Standardizing for Success!

What is dbt Live?
The dbt Live: Expert Series consists of 60-minute interactive sessions where dbt Labs Solution Architects share practical advice on how to tackle common problems that they see over and over again in the field — like how to safely split environments with CI/CD, or how to implement blue/green deployments.
Event agendas are shaped by audience requests, so whether you’re new to dbt, or just want to sharpen your skills, we’d love to see you there! Register to take part in the next live session.
Can we make data governance fun?
Data governance - what does that make you think of? If it makes you want to stop reading this post or watching the replay, please stay with me for the next few minutes.
I used to think data governance was about committees and slowing down processes, but I’ve come to see it differently after watching Randy Pitcher’s dbt Live: Expert Series session. He and Luis Leon are members of the dbt Labs Solutions Architect team and hosted the North America-friendly August session. From the get-go, it challenged how I thought of data governance, and I hope it does the same for you.
Let’s reframe data governance for this post as a set of standards you can define and then codify in your dbt project. Some standards checks you can code in dbt are:
- Following naming convention rules
- Adding tags to models
- Enforcing testing requirements
- Requiring descriptions for models
- Validating database and schema selection
- Distributing approval of model changes
Codifying and automating these checks helps your developers comply with standards without extra effort, and helps satisfy your organization’s data standards, creating a better data experience for your internal and external stakeholders.
Imagine the data experience your team will be able to deliver if:
- Your models follow consistent naming conventions, making it easier to navigate your project
- Your models have documentation, making it easier for your users to learn about a model
- Your users have a question about a model and can look at its tags to see which team to go to for more information
- One of your jobs fails (gasp!) and there is a test that makes it easy to troubleshoot the issue and get your jobs back up and running faster
Are you seeing how data governance can be important to delivering a better data user experience and, dare I say, fun? If so, you can watch the full replay of Randy’s session here or read on for a summary of tips from Randy on getting started.
How do we enforce model naming conventions?
Let’s say you want your models in your staging folder of your dbt project to always be prefixed with stg_. How could we check this each time a developer clicks save in dbt?
Randy added a “governance” folder under the “macros” folder in his dbt project to start off. You can see the project repository he worked with here.
Then he added a new file and called it “governance_check.sql”. This file will house the Jinja macro that checks the file names during each save. Below is the code that Randy used in this file. Let’s walk through some of the building blocks so you can extend this to apply in your workflows.
{% macro governance_check() %}
    {{ log('Validating object: ' ~ this.identifier, info=True) }}
    {% if 'staging' in model.path %}
        {% if not model.name.startswith('stg_') %}
            {{ exceptions.raise_compiler_error('Invalid naming convention. Staging object must start with "stg_": ' ~ model.path) }}
        {% endif %}
    {% endif %}
{% endmacro %}
Line two writes the identifier (the name of the model) to the logs when the project is built. The this variable in dbt Jinja is the database representation of the current model. Here it is paired with the identifier property, which returns the table name as stored in the database. The dbt Docs cover more of these properties and how to use them for setting permissions, dynamically changing schemas, or dynamically referencing models.
Line three begins a logic block that checks the model path using model.path to see if it contains “staging”. If the model is stored within the “staging” path, the macro goes on to check the name of the model; if it is not, the check ends.
Line four is another logic block, for models within the “staging” folder, that checks whether the model name starts with stg_. If the model name has the “stg_” prefix, the check stops. If it does not, line five raises a compilation error that the user will see in dbt Cloud and can click to reveal the issue. Below you can see an example of this compilation error message from a sample project:
Lines six and seven close out the logic blocks. After saving the macro, you need to go to the dbt_project.yml file and add a pre-hook configuration. Here is a screenshot of the code to add to your models configuration:
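As a rough sketch of what that configuration can look like, assuming a project named my_project (a placeholder; use the name from your own dbt_project.yml) and the governance_check macro defined above:

```yaml
# dbt_project.yml (fragment)
# Assumption: "my_project" is a hypothetical project name, not taken from Randy's repository.
models:
  my_project:
    # Call the governance macro before each model in the project is built.
    +pre-hook: "{{ governance_check() }}"
```

Placing the hook at the project level applies the check to every model, and the macro itself decides which models the naming rule covers by looking at model.path.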
Once the pre-hook configuration is in place, you can test out your new governance check by renaming one of the models in your staging folder so that it does not start with stg_, then look for the compilation error.
Now, you can make sure your models comply with your standard naming conventions and report to your data governance committee or team that you’ve automated one of their requirements. From here you can move on to create a check for descriptions or tests to line up with your data governance standards.
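A description check could follow the same pattern. The macro below is an untested sketch, not code from Randy’s session: the name description_check is hypothetical, and it assumes that model.description is empty when a model has no description in its .yml file. It also checks the execute variable, on the assumption that descriptions may not be loaded yet while dbt is still parsing the project.

```jinja
{% macro description_check() %}
    {# Hypothetical companion to governance_check(); not from the original session. #}
    {# Assumption: only check when execute is true, so descriptions from .yml files are loaded. #}
    {% if execute and not model.description %}
        {{ exceptions.raise_compiler_error('Missing description for model: ' ~ model.name) }}
    {% endif %}
{% endmacro %}
```

It would be wired up the same way as the naming check, by calling it from a pre-hook.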
Want to watch Randy code this solution live? Check out the video replay below.
dbt Live: Expert Series with Randy (August 12th NA)
Participant questions
During the session, Randy answered questions that dbt Community members submitted in advance as well as questions that came up live from attendees.
Here are some of the questions:
- Can we use this object to target column names and standardize naming conventions?
- How do you decide how to use incremental models?
- Is it possible to have a macro that would take table information and load that into definitions for doc files?
- Is there a way to bring all of the metadata created in dbt into standard metadata tools?
You can hear Randy’s responses to these questions and more questions from dbt Community members on the replay.
Wrapping up
Thank you all for joining our dbt Live: Expert Series with Randy and Luis!
Please join the dbt Community Slack and the #events-dbt-live-expert-series channel to see more question responses and to ask your questions for upcoming sessions.
Want to see more of these sessions? You’re in luck as we have more of them in store. Go register for future dbt Live: Expert Series sessions with more members of the dbt Labs Solution Architects team. And if you’re as excited as we are for Coalesce 2022, please go here to learn more and to register to attend.
Until next time, keep on sharing your questions and thoughts on this session in the dbt Community Slack!
Last modified on: Nov 29, 2023