Coalesce 2024: How Amplify optimized their incremental models with dbt on Snowflake
Like many other dbt users, Amplify has some very large data sets. (Their largest model needs to be updated every two hours and would cost $2.6 million annually to build if they fully refreshed it every time.) Turning it into an incremental model was a natural choice and helped a lot. However, they found that simply adding `materialized='incremental'` didn't solve all of their problems.
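For context, this is roughly what an incremental materialization looks like in dbt; the source, model, and column names below are hypothetical placeholders, not Amplify's actual code:

```sql
-- Configure the model to build incrementally, merging on a unique key
{{ config(
    materialized='incremental',
    unique_key='event_id'
) }}

select
    event_id,
    event_time,
    payload
from {{ source('raw', 'events') }}

{% if is_incremental() %}
  -- On incremental runs, only process rows newer than what's already built
  where event_time > (select max(event_time) from {{ this }})
{% endif %}
```

With this config, dbt only scans and merges new rows on each scheduled run instead of rebuilding the whole table, which is where the bulk of the cost savings over a full refresh comes from.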
Specifically, they still had issues running `not_null` and `unique` tests against such a large model, and issues sizing their Snowflake warehouse to accommodate both incremental builds and full refreshes. Perhaps most importantly, the model still cost $50,000 annually to build, which adds up quickly when you have dozens of similarly sized models. In this talk, they discuss several innovative solutions they implemented to address these issues, including how they ultimately brought the cost of building this particular model down to just $600 annually!