class: title, center, middle # Refactoring ??? [Instructor notes] Start the lesson by connecting to previous knowledge that was either just taught or we can safely assume learners know. Prerequisite knowledge: - models, ref function, materializations - sources - dbt project design ---
Refactoring | Focus
Build a mental model for refactoring existing transformation code into dbt
Apply refactoring principles to refactor transformation code
??? [Instructor notes] Set the focus for the lesson with the slides here. --- # Refactoring transformation code into dbt 1. Migrating legacy code 2. Translate hard coded table references 3. Choose a refactoring strategy 4. CTE Grouping and Cosmetic Cleanups 5. Centralizing Transofrmations & Splitting up models a. Staging models b. CTEs or intermediate models c. Final model 6. Auditing ??? [Instructor notes] We will go deep on each of these shortly, so do not spend too much time here. --- # 1. Migrating Legacy code - Find the SQL code that you want to refactor from your legacy system - Add a folder to your models directory titled `legacy` - Copy over the snippet of SQL to a new SQL file - Make light adjustments to the query to make sure that you can successfully build the model with `dbt run` --
demo
??? [Instructor note]: The legacy query is already in the legacy folder in the model directory. Briefly show learners that file and ensure that the command `dbt run` executes and we can actually build this particular model. --- # 2. Translate hard coded table references - For all source tables, we want to leverage the source function in dbt - Build a sources.yml file to configure a source table for each raw table - Update raw table references using the source function .left-column[ **Original SQL:** ```sql select * from raw.stripe.payment``` ] .right-column[ **Updated SQL:** ```sql select * {{ source('stripe','payment') }}``` ] --
demo
??? [Instructor notes] The source.yml file is already in the model directory, so jump to updaing the raw table references. Learners already know the source function at this point, so leverage that knowledge and get them involved in replacing the hard coded table references with the source macro. --- # 3. Choose a refactoring strategy Two options: - In place refactoring: Work directly on the original query - Pro: No old models to clean up at the end - Cons: Pressure to get it right the first time, harder to audit, must leave a good git trail to review progess - Side-by-side refactoring: - Pros: less impact on end users, less pressure to get it right the first time, easier to audit - Con: Old files need to be cleaned up at the end. - We generally recommend the latter, this really becomes clear in step 6: auditing **Note: for this lesson, we are going to use side-by-side refactoring** ??? [Instructor notes] **Do not spend too much time here**. While knowing the different refactoring strategies is important, we want to get to the actual refactoring. Show, don't tell! --- # 4. CTE grouping and cosmetic cleanups - Reorganize distinct concepts into CTEs - Rewrite subqueries as CTEs - Ensure that the cleanup align with your style guide --
demo
??? [Instructor notes] - Separate each of the concepts brought together in the query into separate CTEs and then start to refactor the code. - The first few CTEs should represent only the raw tables that are brought in and then they should start to be joined. --- # 5. Centralizing transformation & splitting up models After refactoring your code into CTEs, it becomes much clearer for how to break up the model into **staging models, intermediate models, and final models.** - **Staging models:** 1:1 with the source, light clean up of field names and casting data types. - **Intermediate models:** Best of pushing code upstream that is NOT 1:1 with source but may be resuable in other future models. - **Final models:** Remaining logic that you want to expose to end users. --
demo
??? [Instructor notes] - Leverage the CTEs from the last step to migrate code into separate staging models. - There won't be time to create the intermediate and final models, so we can skip that part. --- # 6. Auditing - Leverage the audit helper package to be able to compare the original query to the refactored final model. - [Audit helper package](https://hub.getdbt.com/dbt-labs/audit_helper/latest/) ??? [Instructor notes] - Show the audit helper package on the hub site and *briefly* discuss the functionality. We won't have time to implement it, however.