class: title, center, middle
# Polish your dbt project

???
[Teacher Notes]: A list of hot tips that will take your project from good to great! The aim here is to pause on fundamentals and talk about best practices, plus the do's and don'ts of using the functionality we've learned about so far.

---
Polish your dbt project | Focus
Develop a mental checklist for polishing your project before sharing your code
???
"" Our focus for this session is to develop a mental checklist that keeps our project clean and makes onboarding easier. ""

---
class: subtitle
# dbt code

???
"" First, we'll talk about polishing our dbt code. ""

--

i.e. the code in your `.sql` and `.yml` files

???
"" When we say 'code', we mean the contents of our `.sql` and `.yml` files. ""

---
## Require a dbt version to prevent conflicts

_(Also, you can remove the comments that were generated by the `init` command)_

.dense-text[
```yml
# dbt_project.yml
require-dbt-version: [">=0.17.1", "<0.19.0"]
```
]

.caption[
[Source](https://github.com/dbt-labs/dbt-init/blob/master/starter-project/dbt_project.yml)
]

Use this to regularly move your team onto newer dbt versions.

Note: you can follow dbt releases on GitHub: [github.com/dbt-labs/dbt-core/releases](https://github.com/dbt-labs/dbt-core/releases)

???
"" It's best practice to require a dbt version to prevent conflicts. We do this by setting a configuration called `require-dbt-version`. In our example, the project works with versions greater than or equal to 0.17.1 and less than 0.19.0. By requiring this, you ensure other developers run the project with a version that supports the functionality it uses. This matters most for CLI users, because in dbt Cloud the development environment sets the version developers use when they open the `develop` tab to make changes. ""

[Demo:] Show what happens when you add this config. Hit these points:
- How does `dbt run` behave when your version is within the requirements?
- How does `dbt run` behave when your version is outside the requirements?
  - You get an error. Click on the status (with the "light") in the bottom right corner.
  - Scroll to the bottom of the error message when it pops up.
  - Show the error that explains the project version requirements.

---
## Ensure you don't have any direct table references

???
"" Ensure you don't have any direct table references. ""

--

🙅‍♀️
```sql
select * from dbt_claire.stg_customers
```

???
"" Don't reference tables in a schema directly. As we learned in earlier lessons, a direct reference:
- won't dynamically change where the data comes from, so the model points at a location you have less control over
- won't build dependencies in the order they need to run in
- won't show up in the DAG as a dependency for other nodes ""

--

🙅‍♀️
```sql
select * from {{ target.schema }}.stg_customers
```

???
"" We also don't want this, which uses the `target` context to identify the schema. It will run, because it resolves to the schema defined in your profile, but dbt won't know the order the models need to run in, and the dependency won't show up in our DAG. ""

--

🙆‍♀️
```sql
select * from {{ ref('stg_customers') }}
```

???
"" This is what we want! Always use the `ref()` function to refer to other models in your project. It's the atomic unit of dbt: it determines the correct order for your models to run in, and lets you see dependencies so you can make informed modeling decisions. ""

---
## Ensure you don't have any direct table references

???
[This just clears the section for the next example]

--

🙅‍♀️
```sql
select * from raw.jaffle_shop.customers
```

???
"" The same goes for sources of data: you shouldn't hardcode the raw data reference. When you hardcode it, the configuration for your data sources isn't centralized. If the data moves (it happens!), you'll need to go through all of your code and change every reference. ""

--

🙆‍♀️
```sql
select * from {{ source('jaffle_shop', 'customers') }}
```

???
"" We always want to configure our sources and use the `source()` function to refer to raw data!
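For `{{ source('jaffle_shop', 'customers') }}` to compile, the source has to be declared in a `.yml` file. A minimal sketch, assuming the raw table lives at `raw.jaffle_shop.customers` as in the hardcoded example; the file path is illustrative, not taken from the starter project:

```yml
# models/staging/jaffle_shop/src_jaffle_shop.yml  (illustrative path)
version: 2

sources:
  - name: jaffle_shop      # first argument to source()
    database: raw          # assumed: the database in raw.jaffle_shop.customers
    schema: jaffle_shop    # assumed: the schema in raw.jaffle_shop.customers
    tables:
      - name: customers    # second argument to source()
```
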
This will ensure:
- the source shows up in our DAG
- we can use selection syntax to rebuild models downstream of this source
- we can reconfigure the source in one place and have the changes flow downstream ""

---
## Apply configurations at the group level

???
"" Apply configurations at the group level. ""

--

🙅‍♀️
```sql
-- models/staging/jaffle_shop/stg_jaffle_shop__customers.sql
{{ config(materialized='view') }}
```

```sql
-- models/staging/jaffle_shop/stg_jaffle_shop__orders.sql
{{ config(materialized='view') }}
```

???
"" Here we have two staging models: customers and orders. Each one repeats the same configuration to materialize it as a view. ""

---
## Apply configurations at the group level

🙆‍♀️
```yml
models:
  jaffle_shop:
    staging:
      +materialized: view
```

???
"" To apply that configuration at the group level, we instead refer to the folder in our `dbt_project.yml`. Here, all models in the `staging` folder of the `jaffle_shop` project will be materialized as views. This keeps configurations concise and targeted to specific scenarios. When a model doesn't fit a group-level configuration, you can override it with a config in the model itself. ""

[Teacher Notes:] In our experience, when configurations aren't applied at the group level first, we tend to see a lot of configurations that don't (or shouldn't) apply. When every model has a configuration at the top, onboarding developers tend to copy and paste it simply because they see it everywhere. Applying configurations at the group level limits one-off configurations to specific use cases, and prompts developers to ask the right questions (e.g. "is there a reason this model is configured as a table when the rest are views?").

---
# non-dbt code

???
"" What about things that aren't in a `.sql` or `.yml` file: our non-dbt code? ""

--

i.e. the code in your repo that dbt doesn't use (but humans do)!

???
"" That means the things in our repository that dbt doesn't use, but humans do. ""

---
## Improve your README

A README can help answer:
- How do I get started with this project?
.dense-text[
* Links to dbt Getting Started instructions
* Who to contact to get database access
* Snippets of SQL for a superuser to set up a new user
* A rough orientation of the project (e.g. which folder does what)
]
- How can I contribute to this project?
.dense-text[
* Code conventions to follow, e.g. [dbt conventions](https://github.com/dbt-labs/corp/blob/main/dbt_style_guide.md) and [git conventions](https://github.com/dbt-labs/corp/blob/main/git-guide.md)
]

???
"" One of those things is the README in your repository, which helps someone get familiar with the project. A best practice is to include useful information like:
- how to get started with the project
- who to contact for access to various things
- snippets that help admins add a new user
- if using the CLI, how to set up the `profiles.yml` file
- a rough orientation of the project
- and finally, how to contribute, including coding conventions and style guides ""

[Teacher Notes:] Many of these are links to our starter project script, so they include extra Jinja. You can use cmd+click to open the links in a new tab if you want to show them.

---
.center[
]

.caption[
[Example](https://github.com/dbt-labs/rapid-onboarding-exemplar), [Source](https://github.com/dbt-labs/dbt-init/tree/master/starter-project)
]

???
[Teacher Notes:] Pause here to show the slide's example of the README (use cmd+click to open it in a new tab). Another good example is the [Rapid Onboarding README](https://github.com/dbt-labs/rapid-onboarding-exemplar), which includes dropdowns for different ways to run the project.

---
## Add a PR template

PR templates help you write better PRs (and are great tools for reviewers!)
* Use headers to prompt someone to fill in the right information
* Checklists are also useful

???
"" Add a PR template. A PR template prefills the pull request description with headers that prompt the developer for the right information. Templates are written in Markdown, so you can use helpful features such as checklists to ensure developers polish their code before requesting a review. ""

---
.center[
]

.caption[
[Example PR template](https://github.com/dbt-labs/dbt-init/blob/master/starter-project/.github/pull_request_template.md)
]

.caption[
[Guide for adding a PR template in GitHub](https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository)
]

???
[Teacher Notes:] Use cmd+click to open the links on the slide and show the example of our PR template.

(Optional) How to walk through each section of the PR template:
- cmd+click the link, then click `raw` to view the markdown code of the PR template
- Between the `
Polish your dbt project
--
Intro to Jinja
dbtonic Jinja
Packages and projects
dbt Materializations
Incremental Models
Techniques of the trade
Closing ceremony
---