class: title, center, middle # Snapshots --- # dbt Materializations * **view**: stored query in database * **table**: stored data in database * **ephemeral**: exists exclusively in dbt * **incremental**: adds _new_ data to an existing table -- ### What if I want to track changes in my data? --- # Snapshots Defined -- - Often times, we need to look back in time at previous states of **mutable tables** -- - **mutable tables**: contain records that can be changed/updated after creation -- - Snapshots utilize **type-2 slowly changing dimensions** to identify how a record changes over time -- - **type-2 slowly changing dimensions**: method for tracking historical data by creating multiple records for a given natural key ??? [instructor notes] """ * An example of a changing record in a mutable table would be roles in a company. * Let’s say we have a table with all the employees at dbt Labs. I am on the Training Team, but I am transferring to the Professional Services team. * We want to update the table with the current team I’m on, but we don’t want to lose the historical data of which team I was on previously. * Type-2 slowly changing dimensions adds new rows with the current data without overwriting the previous data. """ ---
Snapshots | Focus
Explain the use-cases for snapshots
Describe Snapshot strategies
Implement snapshots in a dbt project
--- # Working Without Snapshots - You have an `orders` table where the `status` field can be overwritten as the order is processed | id | status | updated_at | |----|--------|------------| | 1 | pending | 2019-01-01 | -- - Your order updates from "pending" to "shipped" | id | status | updated_at | |----|--------|------------| | 1 | shipped | 2019-01-07 | -- - We've lost the information about when the order was "pending" -- ### What complications could this cause? --- # Enter dbt Snapshots -- - dbt can "snapshot" changes in a table to help capture how values change over time -- - Here's what our order changes would look like after implementing snapshots | id | status | updated_at | dbt_valid_from | dbt_valid_to | |----|--------|------------|----------------|--------------| | 1 | pending | 2019-01-01 | 2019-01-01 | 2019-01-07 | | 1 | shipped | 2019-01-07 | 2019-01-07 | `null` | -- - snapshot tables are created as a clone of the dataset, with some additional meta-fields --- # Snapshot Anatomy - Snapshots, like most everything in dbt, are `select` statements, which are defined in a `.sql` file - Snapshot blocks are traditionally saved within your `snapshots` directory .dense-text[ ```sql -- snapshots/products_snapshot.sql {% snapshot products_snapshot %} {{ config( target_database='analytics', target_schema='snapshots', unique_key='id', strategy='timestamp', updated_at='updated_at', invalidate_hard_deletes=True, ) }} select * from {{ source('jaffle_shop', 'products') }} {% endsnapshot %}``` ] ??? [instructor notes] * Let's breakdown the anatomy of a snapshot --- #Building Your Snapshot - Snapshot configurations should be wrapped in a `snapshot block` -- ```sql {% snapshot name_of_snapshot %} {% endsnapshot %}``` --- #Building Your Snapshot - Snapshot configurations should be wrapped in a `snapshot block` - Define the snapshot results by writing a `select` statement inside the block -- ```sql {% snapshot name_of_snapshot %} select * from {{ source('source', 'table_name') }} {% endsnapshot %}``` -- * ideally your results contain a reliable timestamp column for record changes --- #Building Your Snapshot - Snapshot configurations should be wrapped in a `snapshot block` - Define the snapshot results by writing a `select` statment inside the block - Add configurations to your snapshot using a `config` block -- ```sql {% snapshot name_of_snapshot %} {{ config( target_database='analytics', target_schema='snapshots', unique_key='id', strategy='timestamp', updated_at='updated_at', invalidate_hard_deletes=True, ) }} select * from {{ source('source', 'table_name') }} {% endsnapshot %}``` ??? [instructor notes] * Some of these configs are required, some are optional * target_schema, unique_key, strategy, and strategy field are required * target_database and invalidate_hard_deletes are optional * We recommend putting your snapshots in a designated schema so you can limit the permissions on it. * For the unique_key, you can have one field or a combination of fields * There a two different types of snapshot strategies. Let's look at both. --- #Snapshot Strategies -- .left-column[ ###timestamp - **the recommended strategy** - requires an `updated_at` field - if `updated_at` field has changed, new record added - meta-fields updated ] -- .right-column[ ###check - used when no reliable `updated_at` column - must specify which columns _can_ change using `check_cols` - can specify `all` columns - compares current & historical values of specified columns - if `check_cols` changed, new record added - meta-fields updated ] --- #Configuring your Snapshot -- - `target_database`: indicates into which database your snapshot should render (**not required**) -- - `target_schema`: indicates into which schema your snapshot should render (**required**) - this should be _different_ from your current analytics schema (e.g. `snapshots`) -- - `unique_key`: the `primary key` column in the record (**required**) -- - `strategy`: the strategy (`timestamp` or `check`) used for building snapshots (**required**) -- - `updated_at`: the timestamp column to compare (**required w/ `timestamp` strategy**) -- - `check_cols`: the columns to check for changes (**required w/ `check` strategy**) -- - `invalidate_hard_deletes`: find hard deleted records and set `dbt_valid_to` to current time if deleted (**not required**) --- #Now What? -- - Now that my snapshot has been configured, I want to build it -- - Execute the `dbt snapshot` command --- #dbt snapshot .left-column[ **On first run**: - create initial snapshot table - all columns from `select` statement - dbt meta-fields added - `dbt_valid_to` = `null` ] -- .right-column[ **On subsequent runs**: - check for changed records - update `dbt_valid_to` column on changed records - insert updated record(s) into snapshot table - `dbt_valid_to` = `null` on new records ] --- class: subtitle #Hands-On (15 min)
Create:
`products_snapshot.sql` file in the `snapshots` folder
Add the necessary sql for the snapshot
Snapshot:
Use the command that snapshots data
View:
Check out the results of your snapshot (scratchpad or Snowflake)
Snapshot
Use the command that snapshots data
View:
Check out the results of your snapshot (scratchpad or Snowflake)
--- # Resources * [Snapshots](https://docs.getdbt.com/docs/build/snapshots)