« Back to blog

Refactoring legacy SQL to dbt

Where to begin when migrating legacy SQL transformation code to dbt? In a new (free!) on-demand course, analytics engineers from dbt Labs unpack the refactoring process they've employed across dozens of projects.

Refactoring legacy SQL to dbt

As an analytics engineer on the professional services team at dbt Labs, I’ve helped a dozen or so data teams refactor from a legacy SQL transformation workflow to modular data modeling in dbt.

I’ve noticed a clear pattern: rare is the team that starts with a completely clean slate. More often than not, teams have accumulated years of SQL code powering business-critical, often complex data transformations.

These SQL transformations may run in a stored procedure, in an Airflow SQL operator, or some other scripting environment—but regardless of where it runs, that legacy SQL code must be migrated!

How can that migration be done with a) minimal headache and b) maximum quality?

Announcing a new (free!) on-demand course

Over the last 18 months, the dbt Labs professional services team has evolved a process to help teams migrate legacy SQL queries to dbt SQL.

And as of today, you can now learn + implement that same refactoring process, with a new on-demand course.

If you’re completely new to dbt, you may want to start with the dbt Fundamentals course before diving into this one.

Everyone learns a bit differently, so you can pick up the course materials in a few different formats:

Our SQL refactoring process, in short

For a bit of background, the migration workflow that we follow internally is:

  1. Migrate your legacy SQL code, unchanged—make sure it runs and produces an identical output.
  2. Implement dbt sources (as opposed to raw table references).
  3. Choose a refactoring strategy.
  4. Implement clean CTEs (we ❤️ CTEs).
  5. Separate transformation logic into standardized data model layers (often the longest step).
  6. Audit the output (often using the audit_helper dbt package).

There is, of course, no one true way to refactor legacy SQL into dbt models!

I’d love to hear from you (on Twitter or in #learn-on-demand in dbt Community Slack) how your approach varies from the one we lay out in the course.

Last modified on: Oct 13, 2022

dbt Learn on-demand

A free intro course to transforming data with dbt