Start fresh, don't lift and shift: a dbt migration guide

Daniel Poppy

Last edited on Jun 16, 2026

We've seen this pattern often enough to name it. A team migrates to dbt, spends six months, and ends up with a dbt project that looks exactly like their old workflow, just with Jinja templating instead of drag-and-drop. The data model has the same problems as before. It's newer, not better.

The migration project gets checked off as complete. The downstream problems show up six months later: reports that contradict each other, engineers who can't explain what a model does without reading the SQL, a semantic layer that makes data trust worse instead of better.

This is the lift-and-shift problem, and dbt isn't the cause of it.

Signs your dbt project is a legacy migration

The symptoms are recognizable once you know what to look for. Here are the six most common.

Every stored procedure has a 1:1 dbt model equivalent. The migration team mapped source code to dbt models one-for-one. The logic lives in SQL now instead of PL/SQL, but the structure is identical, and no one asked whether the original structure was worth keeping.

Models are named after source systems, not business entities. You see salesforce_accounts_cleaned and hubspot_contacts_deduped instead of customers and leads. The names describe where data came from rather than what it means to the business.
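One quick way to find these is to scan model names for source-system tokens. The sketch below is a heuristic lint, not a dbt feature. It assumes staging models use an stg_ prefix (a common dbt convention) and may legitimately carry the source name. SOURCE_SYSTEMS is a hypothetical list you would replace with your own upstream systems.

```python
# Heuristic lint (not a dbt feature): flag downstream models whose names
# contain a source-system token instead of a business entity.
# Assumptions: staging models start with "stg_" and may keep the source name;
# SOURCE_SYSTEMS is a hypothetical list to replace with your own systems.
SOURCE_SYSTEMS = {"salesforce", "hubspot", "mdm"}


def source_named_models(model_names: list[str]) -> list[str]:
    """Return non-staging model names that contain a source-system token."""
    flagged = []
    for name in model_names:
        if name.startswith("stg_"):
            continue  # staging models are expected to mirror one source
        tokens = set(name.split("_"))
        if tokens & SOURCE_SYSTEMS:
            flagged.append(name)
    return flagged


models = [
    "stg_salesforce__accounts",
    "salesforce_accounts_cleaned",
    "hubspot_contacts_deduped",
    "customers",
    "leads",
]
print(source_named_models(models))
# ['salesforce_accounts_cleaned', 'hubspot_contacts_deduped']
```

A hit isn't automatically wrong, but each one is worth asking about: which business entity is this, and should the name say so?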

No staging models exist. Everything jumps directly from raw source to reporting table. The dbt project structure guide is explicit: staging models are the atomic building blocks of a dbt project, and skipping them skips the part that makes the architecture maintainable.
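Once the project compiles, dbt writes a manifest.json to the target/ directory that records each model's direct parents, so you can check this symptom mechanically. Here is a minimal sketch, assuming staging models are prefixed stg_ and using a small hand-written dict in place of a real manifest. It lists every non-staging model that reads from a source() directly.

```python
# Sketch: find non-staging models that select straight from a source().
# Assumes manifest-shaped input: nodes keyed by unique_id, each with
# resource_type, name, and depends_on.nodes (source parents start "source.").
# In a real project you might load it with:
#   with open("target/manifest.json") as f: manifest = json.load(f)


def models_reading_raw_sources(manifest: dict, staging_prefix: str = "stg_") -> list[str]:
    """Return non-staging models whose direct parents include a source."""
    flagged = []
    for node in manifest["nodes"].values():
        if node.get("resource_type") != "model":
            continue
        parents = node.get("depends_on", {}).get("nodes", [])
        reads_source = any(parent.startswith("source.") for parent in parents)
        if reads_source and not node["name"].startswith(staging_prefix):
            flagged.append(node["name"])
    return sorted(flagged)


manifest = {
    "nodes": {
        "model.shop.stg_salesforce__accounts": {
            "resource_type": "model",
            "name": "stg_salesforce__accounts",
            "depends_on": {"nodes": ["source.shop.salesforce.accounts"]},
        },
        "model.shop.customers": {
            "resource_type": "model",
            "name": "customers",
            "depends_on": {"nodes": ["model.shop.stg_salesforce__accounts"]},
        },
        "model.shop.revenue_report": {
            "resource_type": "model",
            "name": "revenue_report",
            "depends_on": {"nodes": ["source.shop.erp.invoices", "model.shop.customers"]},
        },
    }
}
print(models_reading_raw_sources(manifest))  # ['revenue_report']
```

This only sees dependencies declared through source(). A model that hardcodes a raw table name won't show up here at all.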

Tests are bolt-ons rather than design principles. The migration got done, then someone added not_null tests to the most important columns as an afterthought.
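You can also read primary-key test coverage from the manifest. Each generic test appears there as a node with resource_type "test", a test_metadata name, a column_name, and the model it depends on. This sketch flags models where no column carries both unique and not_null. The sample dict is simplified: real test IDs carry a hash suffix. Treating "a column with both tests" as the primary key is an assumption.

```python
# Sketch: list models with no column covered by both unique and not_null.
# Assumes manifest-shaped input where generic tests are nodes with
# resource_type "test", test_metadata.name, column_name, depends_on.nodes.


def models_missing_pk_tests(manifest: dict) -> list[str]:
    """Return models where no single column has both unique and not_null tests."""
    tests_by_model: dict[str, dict[str, set[str]]] = {}
    for node in manifest["nodes"].values():
        if node.get("resource_type") != "test":
            continue
        test_name = (node.get("test_metadata") or {}).get("name")
        column = node.get("column_name")
        if test_name not in {"unique", "not_null"} or not column:
            continue
        for parent in node.get("depends_on", {}).get("nodes", []):
            tests_by_model.setdefault(parent, {}).setdefault(column, set()).add(test_name)

    missing = []
    for unique_id, node in manifest["nodes"].items():
        if node.get("resource_type") != "model":
            continue
        columns = tests_by_model.get(unique_id, {})
        if not any({"unique", "not_null"} <= tests for tests in columns.values()):
            missing.append(node["name"])
    return sorted(missing)


def generic_test(name: str, column: str, model_id: str) -> dict:
    """Build a simplified test node for the sample below."""
    return {
        "resource_type": "test",
        "test_metadata": {"name": name},
        "column_name": column,
        "depends_on": {"nodes": [model_id]},
    }


manifest = {
    "nodes": {
        "model.shop.customers": {"resource_type": "model", "name": "customers"},
        "model.shop.orders": {"resource_type": "model", "name": "orders"},
        "model.shop.revenue_report": {"resource_type": "model", "name": "revenue_report"},
        "test.shop.unique_customers_customer_id": generic_test("unique", "customer_id", "model.shop.customers"),
        "test.shop.not_null_customers_customer_id": generic_test("not_null", "customer_id", "model.shop.customers"),
        "test.shop.not_null_orders_order_id": generic_test("not_null", "order_id", "model.shop.orders"),
    }
}
print(models_missing_pk_tests(manifest))  # ['orders', 'revenue_report']
```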

Documentation is empty or copied from source-system field descriptions. Column descriptions say things like "from MDM system." That records provenance, which is useful, but it leaves the actual documentation undone.
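A rough way to size this gap is to flag two things: models with no description, and columns whose description is either empty or only states where the value came from. The sketch below assumes descriptions live in the manifest's description and columns fields. The "starts with 'from'" rule is a hypothetical heuristic that will produce some false positives. Also note that the manifest only lists columns you have declared in YAML.

```python
import re

# Hypothetical heuristic: a description that only records provenance,
# e.g. "from MDM system.", counts as undocumented.
PROVENANCE_ONLY = re.compile(r"^\s*(?:from|copied from)\b", re.IGNORECASE)


def documentation_gaps(manifest: dict) -> list[str]:
    """List models without a description and columns with empty or provenance-only text."""
    gaps = []
    for node in manifest["nodes"].values():
        if node.get("resource_type") != "model":
            continue
        name = node["name"]
        if not (node.get("description") or "").strip():
            gaps.append(f"{name} (model description)")
        for col_name, col in node.get("columns", {}).items():
            desc = (col.get("description") or "").strip()
            if not desc or PROVENANCE_ONLY.match(desc):
                gaps.append(f"{name}.{col_name}")
    return sorted(gaps)


manifest = {
    "nodes": {
        "model.shop.customers": {
            "resource_type": "model",
            "name": "customers",
            "description": "One row per customer, deduplicated across CRM and billing.",
            "columns": {
                "customer_id": {"description": "Stable key for a customer across all source systems."},
                "region": {"description": "from MDM system."},
            },
        },
        "model.shop.orders": {
            "resource_type": "model",
            "name": "orders",
            "description": "",
            "columns": {"order_id": {"description": ""}},
        },
    }
}
print(documentation_gaps(manifest))
# ['customers.region', 'orders (model description)', 'orders.order_id']
```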

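No ref() lineage. Models depend on hardcoded schema names instead of ref(), so dbt can't see how models depend on each other. It can't build them in the right order or show what a change will affect downstream. When something breaks, teams find out from a failed report rather than a CI check.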
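Hardcoded relations are usually easy to spot in the raw SQL. Look for a FROM or JOIN followed by a dotted schema.table (or database.schema.table) name instead of a ref() or source() call. This regex sketch is a hypothetical lint, not a dbt feature. It ignores comments and will also flag expressions such as extract(year from o.created_at).

```python
import re

# Heuristic: FROM/JOIN followed by a 2- or 3-part dotted identifier.
# Jinja calls like {{ ref('x') }} start with "{" and never match.
# Known false positive: EXTRACT(... FROM alias.column).
HARDCODED_RELATION = re.compile(
    r"\b(?:from|join)\s+((?:[A-Za-z_]\w*\.){1,2}[A-Za-z_]\w*)",
    re.IGNORECASE,
)


def hardcoded_relations(sql: str) -> list[str]:
    """Return dotted relation names referenced directly after FROM or JOIN."""
    return HARDCODED_RELATION.findall(sql)


model_sql = """
select c.customer_id, sum(i.amount) as revenue
from analytics_prod.customers c
join {{ ref('stg_erp__invoices') }} i on i.customer_id = c.customer_id
left join raw.erp.credit_notes n on n.invoice_id = i.invoice_id
group by 1
"""
print(hardcoded_relations(model_sql))
# ['analytics_prod.customers', 'raw.erp.credit_notes']
```

Run it over the contents of each .sql file under models/. Each hit is a candidate for ref() (another model) or source() (a raw table).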

None of these is hard to fix individually. The problem is what they signal together: the team migrated the code and left the thinking behind.

The rewrite feels risky and the migration feels safe. Look closer.

Here's how teams end up here. Migration projects get measured by completion, not quality. There's a deadline, there's a checklist, and the checklist says "migrate 200 stored procedures." Whether those procedures represent good data modeling never makes the checklist.

The path of least resistance is to replicate what exists. Translate the stored procedures, get to green on the migration tracker, and ship. Quality gets deferred, and deferred quality hardens into permanent technical debt.

A recent consulting engagement shows the pattern. An insurance company undertook a major migration from a legacy ETL platform to dbt. The migration ran on schedule and on budget. A year later, the data team was spending more time debugging model failures than shipping new analytics. The architecture had carried all of the original fragility into a different tool.

The symptoms: 300-plus models with no staging layer, no tests on roughly 40% of models, reports built directly on raw source joins, and a semantic layer that three different teams maintained independently. The migration was complete. The architecture was broken.

Here's the framing that helps. A lift-and-shift migration is really a replatform: the tool changed, and the system stayed the same. If the data model was wrong before dbt, it's still wrong after.

The triage decision: which legacy models deserve to exist

Before writing a line of dbt SQL, teams should ask a harder question than "how do we migrate this?" The better question: "Does this deserve to exist?"

Here's a practical triage framework.

Eliminate. Reporting tables built for a deprecated BI tool, summary tables that exist only because the old database couldn't handle the underlying query, and "just in case" tables nobody queries. Leave these behind.

Rewrite. Any logic that joins raw source tables with no staging layer. Any model that does transformations and aggregations in the same step. Any model named after source systems rather than business entities. Redesign these rather than translating them.

Translate with care. Validated, tested business logic the organization depends on. Bring it over with explicit tests, document every column, and have someone who understands the business domain review the logic, not just the SQL.

Build fresh. The semantic layer. This should always be built from business requirements: what is a customer, how do we define revenue? Migrating legacy SQL into the semantic layer inherits every ambiguity and compromise baked into the original definitions.
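The four buckets can be encoded as a first-pass triage over a legacy inventory. This is a sketch only. The LegacyObject fields are hypothetical flags someone would fill in from query logs and code review. The precedence order (semantic definitions first, then elimination, then rewrite, then translation) is one reasonable reading of the framework, not a rule stated by it. Anything that matches no bucket is left for a person to decide.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ELIMINATE = "eliminate"
    REWRITE = "rewrite"
    TRANSLATE_WITH_CARE = "translate with care"
    BUILD_FRESH = "build fresh"
    REVIEW = "needs review"


@dataclass
class LegacyObject:
    """Hypothetical inventory record for one legacy table or procedure."""
    name: str
    is_semantic_definition: bool = False   # e.g. how the business defines revenue
    serves_deprecated_tool: bool = False   # reporting table for a retired BI tool
    perf_workaround_only: bool = False     # summary table the old database needed
    queried_recently: bool = True
    joins_raw_sources: bool = False        # no staging layer in between
    transforms_and_aggregates: bool = False
    named_after_source: bool = False
    validated_business_logic: bool = False


def triage(obj: LegacyObject) -> Decision:
    """Assign one bucket; earlier checks take precedence over later ones."""
    if obj.is_semantic_definition:
        return Decision.BUILD_FRESH
    if obj.serves_deprecated_tool or obj.perf_workaround_only or not obj.queried_recently:
        return Decision.ELIMINATE
    if obj.joins_raw_sources or obj.transforms_and_aggregates or obj.named_after_source:
        return Decision.REWRITE
    if obj.validated_business_logic:
        return Decision.TRANSLATE_WITH_CARE
    return Decision.REVIEW


inventory = [
    LegacyObject("rpt_legacy_bi_monthly", serves_deprecated_tool=True),
    LegacyObject("sp_build_sf_accounts", joins_raw_sources=True, named_after_source=True),
    LegacyObject("sp_claims_reserve_calc", validated_business_logic=True),
    LegacyObject("revenue_definition", is_semantic_definition=True),
]
for obj in inventory:
    print(f"{obj.name}: {triage(obj).value}")
# rpt_legacy_bi_monthly: eliminate
# sp_build_sf_accounts: rewrite
# sp_claims_reserve_calc: translate with care
# revenue_definition: build fresh
```

Checking elimination before rewrite means nobody spends time redesigning a table that shouldn't survive the migration.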

The migration health conversation with stakeholders usually centers on timeline and scope. The more useful conversation is about triage: what are we keeping, what are we rebuilding, and what are we cutting?

Seven signs of a healthy dbt migration

Use this as a diagnostic. If you're mid-migration, run through it this week.

  1. Every source model has a staging model. The staging model cleans and standardizes the data: type casting, column renaming, basic validation. Nothing downstream touches raw source tables directly.
  2. Business entities are represented as marts. Final models are named for business concepts: a customers mart, an orders mart, a revenue mart. They represent what the business cares about, not what the source systems happen to contain.
  3. Every model has at least not_null and unique tests on primary keys. These are the minimum, and they catch the most common failure modes: duplicate rows and unexpected nulls. Without them, you have data hope, not data quality.
  4. Documentation coverage is tracked and improving. Not every column needs a long description, but every model should have one that tells a new team member what it is and why it exists.
  5. ref() is used everywhere. No hardcoded schema names. If the project can't run in a fresh schema without breaking, it isn't production-ready.
  6. CI runs on every PR. Every pull request runs tests before merge, so nobody merges broken SQL.
  7. A team member who didn't write a model can explain what it does from the documentation. If someone can't understand a model without reading the underlying SQL, the documentation has failed, and eventually the model will too.
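Signs 4 and 6 combine naturally: compute documentation coverage from the manifest and have CI fail a pull request if coverage drops. Here is a minimal sketch. It assumes coverage means "the model has a non-empty description" and uses a hand-written sample in place of target/manifest.json. The baseline is one you store yourself; 0.60 here is an arbitrary example.

```python
import json
from pathlib import Path


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Read the manifest dbt writes to the target/ directory."""
    return json.loads(Path(path).read_text())


def model_doc_coverage(manifest: dict) -> float:
    """Share of models (0.0 to 1.0) with a non-empty model-level description."""
    models = [n for n in manifest["nodes"].values() if n.get("resource_type") == "model"]
    if not models:
        return 1.0
    documented = sum(1 for m in models if (m.get("description") or "").strip())
    return documented / len(models)


def coverage_gate(current: float, baseline: float) -> bool:
    """Pass only if coverage has not dropped below the recorded baseline."""
    return current >= baseline


sample = {"nodes": {
    "model.shop.customers": {"resource_type": "model", "description": "One row per customer."},
    "model.shop.orders": {"resource_type": "model", "description": "One row per order."},
    "model.shop.revenue_report": {"resource_type": "model", "description": ""},
    "test.shop.unique_customers_customer_id": {"resource_type": "test", "description": ""},
}}
coverage = model_doc_coverage(sample)
print(f"model doc coverage: {coverage:.0%}")          # model doc coverage: 67%
print("gate passes:", coverage_gate(coverage, 0.60))  # gate passes: True
```

In CI, you could run this after a dbt command such as dbt compile has written target/manifest.json. Load the file with load_manifest() and exit non-zero when coverage_gate() returns False. Raising the baseline as coverage improves keeps the trend moving in one direction.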

Why migration quality matters even more in 2026

The pressure to migrate from legacy systems to dbt is real, and so is the pressure to do it fast. In 2026, a third pressure has arrived: the AI systems being built on top of data infrastructure are only as good as that infrastructure.

A lift-and-shift migration produces exactly the kind of foundation that makes AI unreliable. Models without semantics, tests, or documentation mean AI systems inherit every gap. The semantic layer doesn't know what "customer" means. No tests exist to catch it when something breaks. And there's no documentation to help an AI system trace an answer back to its source.

The data shows how wide the gap is. Gartner predicted in early 2025 that through 2026, organizations would abandon 60% of AI projects built on data that isn't AI-ready. And according to a March 2026 report from Cloudera and Harvard Business Review Analytic Services, only 7% of enterprises say their data is completely ready for AI, while 73% say their organization should prioritize AI data quality more than it currently does.

The teams building reliable AI systems are doing it on data infrastructure they actually trust. That starts with the migration, well before the AI project.

Closing

Two practical moves from here.

If you're mid-migration right now, run the seven-sign checklist as a diagnostic this week. Don't wait until the migration is "done." The checklist will tell you how far the project has drifted from good architecture and how much rework is piling up.

If you're planning a migration, the triage framework is the architecture conversation to have before writing a line of code. Get the team in a room, look at the models in scope, and answer one question: does this deserve to exist in the new system?
