class: title, center, middle # dbt Materializations --- # What is a materialization? -- 1. A special kind of **Jinja macro** that wraps your model SQL - Includes all the boilerplate DDL/DML: `create`, `alter`, `drop`, ... -- 2. An abstraction on top of the **database** - Does this model already exist in the database? - What if it exists as a table, but I want it to be a view? - What if there are differences between data platform syntax (e.g. Snowflake vs Databricks)? ---
Standard Materializations | Focus
Name and describe the three standard materializations in dbt
Explain the differences between views, tables, and ephemeral materializations
--- # Types of Materializations -- .left-column[ ## Standard - Table - View - Ephemeral ] -- .right-column[ ## Advanced - Incremental - *Snapshot - not _technically_ a materialization ] -- ### You can write your own! - Just like you can write your own macros, schema tests, etc. - E.g. ["insert by period"](https://github.com/dbt-labs/dbt-utils#materializations) ??? - Materializations are some of the trickiest macros we write --- # Materializations Defined ###- View: -- - a select statement that is stored in your data platform -- ###- Table: -- - a data set that is stored in your data platform -- ###- Ephemeral: -- - a select statement that exists solely in dbt (not in your data platform) - imported to downstream models as a CTE --- # Configuring Materializations -- * `configurations` are model settings * They can be set in your `dbt_project.yml` file, _and_ in your model file. * Configurations are applied _hierarchically_ * specific > general * Example configurations: * `materialized: view` * `{{ config (materialized='view') }}` --
demo
??? Demo Time: - "Let's see configurations in action!" - show dbt_project.yml materialization - run `dbt run` - show staging model DDL stating view Now let's do an example that highlights hierarchical nature, showing that more specific configurations take precedence - add a config blocks in a file with a different materialization (e.g. staging file materialized as a table) - run `dbt run` - show staging model DDL stating table --- class: subtitle # Checkpoint * What is the default materialization in a dbt project? * How does dbt prioritize materializations (config block vs `dbt_project.yml`)? --- class: subtitle #Hands-On (10 min)
Find:
`snowplow__page_views.sql` file in your project
Materialize:
All marts models as tables
All staging models as views
Build:
Use the dbt command that builds your models
Find
`snowplow__page_views` table in Snowflake
`stg_events` view in Snowflake
--- # Imagine: some people | id | name | cohort | is_enrolled | |----|---------|----------------|-------------| | 1 | Rumi | Earth | false | | 2 | Angaline | Wind | true | --- # Imagine: a model `models/students.sql` ```sql select id as person_id, name, cohort from raw_people where is_enrolled = true ``` --- # View In the database: `analytics.students` ```sql select id as person_id, name, cohort from raw_people where is_enrolled = true ``` -- If we run: ```sql select * from analytics.students ``` We will see: | person_id | name | cohort | |----|---------|----------------| | 2 | Angaline | Wind | --- # Table In the database: `analytics.students` | person_id | name | cohort | |----|---------|----------------| | 2 | Angaline | Wind | -- If we run: ```sql select * from analytics.students ``` We will see: | person_id | name | cohort | |----|---------|----------------| | 2 | Angaline | Wind | --- # Ephemeral - Does not exist in the database - Reusable code snippet - Interpolated as CTE in a model that `ref`s this model -- If we run: ```sql select * from analytics.students ``` We will see: ```sql Database Error ``` --- # What if... | id | name | cohort | is_enrolled | |----|---------|----------------|-------------| | 1 | Rumi | Earth | false | | 2 | Angaline | Wind | true | -- | 3 | Ursula | Fire | true | -- ### What would you do? * **View:** Recalculate every time I'm queried. Always up to date, often slow. * **Table:** Rebuild whole thing from scratch! Brute force always works. * **Ephemeral:** I don't exist in your data platform. No way to query me, even if you wanted to. --- layout: true # Checkpoint
--- .left-column[ If... - Both models are materialized as views ] -- .right-column[ Then... - How will these models be stored in your data platform? - Which will be faster to query, `stg_customers` or `customers`? ] --- .left-column[ If... - `stg_customers` is an **ephemeral model** - `customers` is a **table** ] -- .right-column[ Then... - How will these models be stored in your data platform? - Which will be faster to query? ] --- layout: false class: subtitle # Knowledge check You should be able to: - Name the three standard materializatons - Explain how dbt processes the tables, views, and ephemeral models differently ## Questions? --- # Resources * [Materializations](https://docs.getdbt.com/docs/build/materializations) * [Materializaitons - Best Practices](https://docs.getdbt.com/guides/best-practices/materializations/1-guide-overview)