class: title, center, middle # dbt Materializations --- # What is a materialization? -- 1. A special kind of **Jinja macro** that wraps your model SQL - Includes all the boilerplate DDL/DML: `create`, `alter`, `drop`, ... -- 2. An abstraction on top of the **database** - Does this model already exist in the database? - What if it exists as a table, but I want it to be a view? - What if Redshift uses `alter table` for views, but Snowflake uses `alter view`? (It's true!) ---
dbt Materializations | Focus
Name and describe the four standard materializations in dbt
Explain the differences between views, tables, and ephemeral materializations
--- # Standard materializations .left-column-66[ - Table - View - Ephemeral - Incremental - Snapshot* ### You can write your own! - Just like you can write your own macros, schema tests, etc. - E.g. ["insert by period"](https://github.com/dbt-labs/dbt-utils#materializations) ] .right-column-33[
] ??? - Materializations are some of the trickiest macros we write --- # Imagine: some people | id | name | cohort | is_enrolled | |----|---------|----------------|-------------| | 1 | Rumi | Earth | false | | 2 | Angaline | Wind | true | --- # Imagine: a model `models/students.sql` ```sql select id as person_id, name, cohort from raw_people where is_enrolled = true ``` --- # Table In the database: `analytics.students` | person_id | name | cohort | |----|---------|----------------| | 2 | Angaline | Wind | -- If we run: ```sql select * from analytics.students ``` We will see: | person_id | name | cohort | |----|---------|----------------| | 2 | Angaline | Wind | --- # View In the database: `analytics.students` ```sql select id as person_id, name, cohort from raw_people where is_enrolled = true ``` -- If we run: ```sql select * from analytics.students ``` We will see: | person_id | name | cohort | |----|---------|----------------| | 2 | Angaline | Wind | --- # Ephemeral - Does not exist in the database - Reusable code snippet - Interpolated as CTE in a model that `ref`s this model -- If we run: ```sql select * from analytics.students ``` We will see: ```sql Database Error ``` --- # What if... | id | name | cohort | is_enrolled | |----|---------|----------------|-------------| | 1 | Rumi | Earth | false | | 2 | Angaline | Wind | true | -- | 3 | Ursula | Fire | true | -- ### What would you do? * **Table:** Rebuild whole thing from scratch! Brute force always works. * **View:** Recalculate every time I'm queried. Always up to date, often slow. * **Ephemeral:** I don't exist in your data warehouse. No way to query me, even if you wanted to. -- * **Incremental:** I've got an idea! --- .left-column[ # The Big Idea * Let's keep the old table, and we'll just _add_ the _new_ records * Less wasteful and probably faster * We just need to know: * What we mean by "new" * What we mean by "add" * As long as we have good answers, we're in good shape ] .right-column[
] --- layout: true # Who is new? And how do you know? --- | id | name | cohort | is_enrolled | |----|---------|----------------|-------------| | 1 | Rumi | Earth | false | | 2 | Angaline | Wind | true | | .highlight[3] | Ursula | Fire | true | -- - The auto-increment ID from our source data system is greater than the previous greatest auto-increment ID - Assuming we can put unquestioning trust in our source data systems --- | id | name | cohort | is_enrolled | created_date | |----|---------|----------------|-------------|--------------| | 1 | Rumi | Earth | false | 2020-03-01 | | 2 | Angaline | Wind | true | 2020-03-02 | | 3 | Ursula | Fire | true | .highlight[2020-03-03] | - A date when the record was initially created - _Much_ more reliable than an auto-increment ID - Assuming records can only be created once and never updated later --- | id | name | cohort | is_enrolled | created_date | updated_date | |----|---------|--------------------|-------------|--------------|--------------| | 1 | Rumi | Earth | false | 2020-03-01 | 2020-03-01 | | 2 | Angaline | .highlight[Air] | true | 2020-03-02 | .highlight[2020-03-04] | | 3 | Ursula | Fire | true | 2020-03-03 | 2020-03-03 | -- | 4 | Christoph | Metal | true | .highlight[2020-03-02] | .highlight[2020-03-02] | -- - Identifying what is "new" requires a comprehension of how your database functions, and can vary across organizations --- layout: true # Checkpoint
--- .left-column[ If... - Both models are materialized as views ] -- .right-column[ Then... - What will appear in the database? - Which will be faster to query, `stg_customers` or `customers`? ] --- .left-column[ If... - `stg_customers` is an **ephemeral model** - `customers` is a **table** ] -- .right-column[ Then... - What will appear in the database? - Which will be faster to query? ] --- layout: false class: subtitle # Knowledge check You should be able to: - Name the four materializatons - Explain how dbt processes the tables, views, and ephemeral models differently ## Questions? --- layout: false class: subtitle # Choose your path .left-column[ ### Learning - Content-based - Learn about data warehouses ] .right-column[ ### Doing - Practice-based - Tackle an open-ended assignment involving ticket sales data ] --- class: subtitle # Zoom Out
Polish your dbt project
Intro to Jinja
dbtonic Jinja
Packages and projects
dbt Materializations
--
Incremental Models
Techniques of the trade
Closing ceremony
---