As an analytics engineer on the professional services team at dbt Labs, I've helped a dozen or so data teams refactor from a legacy SQL transformation workflow to modular data modeling in dbt.
I've noticed a clear pattern: rare is the team that starts with a completely clean slate. More often than not, teams have accumulated years of SQL code powering business-critical, often complex data transformations.
These SQL transformations may run in a stored procedure, in an Airflow SQL operator, or in some other scripting environment---but regardless of where they run, that legacy SQL code must be migrated!
How can that migration be done with a) minimal headache and b) maximum quality?
Announcing a new (free!) on-demand course
Over the last 18 months, the dbt Labs professional services team has evolved a process to help teams migrate legacy SQL queries to dbt SQL.
And as of today, you can learn + implement that same refactoring process with a new on-demand course.
If you're completely new to dbt, you may want to start with the dbt Fundamentals course before diving into this one.
All the links
Everyone learns a bit differently, so you can pick up the course materials in a few different formats:
- The video course, Refactoring SQL for Modularity, which includes a practice query + dataset to test your refactoring skills.
- A pull request in a GitHub repo, where you can follow commit-by-commit how a monolithic query (as from a stored procedure) breaks down into modular dbt code (note: these commits are covered in-depth in the video course).
- A written tutorial on legacy SQL -> dbt SQL refactoring, in case you prefer reading or bookmarking the short version.
Our SQL refactoring process, in short
For a bit of background, the migration workflow that we follow internally is:

1. Migrate your legacy SQL code, unchanged---make sure it runs and produces an identical output.
2. Implement dbt sources (as opposed to raw table references).
3. Choose a refactoring strategy.
4. Implement clean CTEs (we ❤️ CTEs).
5. Separate transformation logic into standardized data model layers (often the longest step).
6. Audit the output (often using the audit_helper dbt package).
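To make steps 2 and 4 concrete, here's a minimal sketch of what a refactored staging model might look like---the source name, table, and column names are hypothetical, but the pattern (import CTEs pulling from `{{ source() }}`, logical CTEs doing the work, a final `select`) is the shape we aim for:

```sql
-- models/staging/stg_shop__orders.sql (hypothetical model + source names)

with orders_source as (

    -- import CTE: replaces a hard-coded raw table reference
    select * from {{ source('shop', 'orders') }}

),

renamed as (

    -- logical CTE: light renaming and type casting only
    select
        id as order_id,
        user_id as customer_id,
        order_date,
        status
    from orders_source

)

select * from renamed
```

For the final audit step, the audit_helper package provides macros for comparing the refactored model's output against the legacy query's output row by row, so you can verify the refactor before retiring the old code.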
There is, of course, no one true way to refactor legacy SQL into dbt models!
I'd love to hear from you (on Twitter or in #learn-on-demand in dbt Community Slack) how your approach varies from the one we lay out in the course.
Last modified on: Jun 11, 2024