Data engineering best practices: What's new?

Last updated on Nov 18, 2025
Data engineering is at a turning point.
We live in an era where data underpins everything from customer experiences to strategic decisions. AI-driven analytics, real-time demands, and distributed data ownership are reshaping how teams build, manage, and deliver data.
Data engineering has always been about building reliable, scalable pipelines — but the landscape is changing. In recent years, three significant developments have reshaped how teams design and deliver modern data systems:
- The explosion of data volume and diversity
- The near-ubiquity of cloud-native infrastructure
- The advent and adoption of AI across the enterprise
These changes have created new challenges — and opportunities — in data engineering. This article addresses these challenges and discusses how teams can adopt best practices that improve collaboration, increase agility, and deliver trusted data at scale.
Today's data engineering challenges (and solutions)
Data engineering must continuously adapt to unprecedented growth in scale, speed, and complexity while delivering reliable, self-service analytics to drive business value. Four critical challenges define today’s landscape:
- Too much data to process
- Data engineering teams are doing too much
- Development and debugging are too slow
- Data governance can’t be manual
Challenge 1: Too much data to process
In 2024, industry analysts estimated that the amount of data created, captured, and consumed globally was 149 zettabytes, with projections surpassing 394 zettabytes by 2028. Businesses are drowning in data: 64% of organizations already manage at least one petabyte, and 41% manage at least 500 petabytes.
That data can be structured, semi-structured, or unstructured, and all of it needs to be analyzed to turn raw information into business value.
Traditional, manual approaches to data discovery and pipeline creation simply can’t keep pace. This yields backlogs, bottlenecks, and missed opportunities. The scale of today’s data landscape demands a new layer of acceleration.
The solution: AI-powered acceleration with dbt Copilot
dbt has long been the industry leader in data transformation platforms. With dbt as your data control plane, teams can deliver trustworthy data outputs faster, through workflows that are flexible, cross-vendor, and collaborative.
Now, dbt Copilot brings generative AI directly into your dbt environment, giving you a way to move faster without sacrificing quality or control. Copilot lets you harness your data's full context, including relationships, metadata, and lineage, to automate routine tasks using natural language prompts.
With Copilot’s AI capabilities, you can:
- Accelerate model creation: Generate SQL models from natural‑language prompts, using your project’s metadata (relationships, lineage, and context) to ensure relevance and accuracy.
- Auto‑suggest transformations, joins, and model structures: Leverage warehouse context to recommend the right building blocks for your models.
- Speed up data discovery: Use context‑aware AI that understands your dbt models’ metadata, lineage, and relationships to recommend the most relevant assets and transformations.
- Accelerate data testing: dbt Copilot uses the context of your dbt models to suggest relevant validation tests. With one click, it adds the corresponding test code directly to your project, ready to run during your builds (a sketch of the resulting test file appears below).
- Embed governance from the start: Every Copilot‑generated asset is version‑controlled, testable, and documented like any other dbt model, ensuring speed never comes at the expense of trust.
Copilot is like a dedicated data intern who standardizes legacy documentation, improves query optimization, checks for SQL syntax errors, enhances metadata compliance, and speeds up migrations — all within your dbt workflows!
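To make this concrete, here's a minimal sketch of the kind of test file an accepted Copilot suggestion commits to your project. The model name, columns, and accepted values are hypothetical, not output from a real Copilot session:

```yaml
# models/staging/stg_orders.yml
# Illustrative test block; model, columns, and values are hypothetical.
# (On dbt versions before 1.8, the key is `tests` rather than `data_tests`.)
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        data_tests:
          - unique
          - not_null
      - name: status
        data_tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
```

Because the suggestion lands as ordinary dbt test code, `dbt build` runs these checks alongside the models they protect.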
Challenge 2: Data engineering teams are doing too much
Today’s engineers build pipelines that feed self-service analytics, power AI models, and enable real-time decision making. They also enforce governance, maintain data quality, and scale across an expanding ecosystem of tools and data sources. As a result, engineers can become bottlenecks for basic data access, documentation, and model creation.
The solution: Empower stakeholders with self-service tools
Data self-service is now commonplace in data analytics, with modern BI platforms that enable business users to explore and analyze information without relying on IT. Data democratization has been shown to shorten time‑to‑insight, boost productivity, and drive innovation as teams can act on trusted data in minutes instead of days.
dbt has recently introduced self‑service tools for analysts (and engineers!) to accelerate model development, streamline transformations, and boost productivity without sacrificing trust.
The first is dbt Canvas, a visual, drag-and-drop interface that lets teams create and edit dbt models without starting from a blank SQL file.
In the dbt Canvas no-/low-code modeling environment, your analysts can:
- Build and edit models without hand-coding every step. Simply drag and drop operators, such as input, join, select, aggregate, and formula, onto a canvas, then connect them visually.
- Generate valid SQL code. Canvas-generated code is version‑controlled, testable, and deployable like any other model (see the example after this list).
- Find and explore data without SQL knowledge. Canvas includes robust search and discovery features and always-on data profiling capabilities to help analysts (and engineers) gain a deeper understanding of the data.
- Facilitate iterative collaboration. Each transformation is visually represented as a node, making it easy to trace logic, understand relationships, and work with teammates. Step‑by‑step previews at every stage build confidence, reduce errors, and accelerate iteration.
- Harness AI‑assisted code generation. Canvas integrates with dbt Copilot to suggest transformations and generate SQL, accelerating model development.
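To give a sense of what Canvas emits, here's a minimal sketch of the SQL a simple input → join → aggregate flow might generate. The staging models and columns (`stg_customers`, `stg_orders`, `amount`) are hypothetical:

```sql
-- models/marts/customer_orders.sql
-- Illustrative output of a Canvas flow with two input nodes, a join,
-- and an aggregate. Model and column names are hypothetical.

with customers as (
    select * from {{ ref('stg_customers') }}
),

orders as (
    select * from {{ ref('stg_orders') }}
)

select
    customers.customer_id,
    count(orders.order_id) as order_count,
    sum(orders.amount) as lifetime_value
from customers
left join orders
    on customers.customer_id = orders.customer_id
group by customers.customer_id
```

Because the output is plain dbt SQL, it drops into version control, CI, and testing like any hand-written model.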
We’ve also introduced the self-service dbt Catalog tool to help engineers and analysts alike quickly understand lineage across your entire data estate, from data source to reporting layer. Using Catalog, data teams can troubleshoot, improve, and optimize data workflows faster.
In the dbt Catalog self‑service data discovery environment, your team can:
- Search and explore data assets without writing SQL. Analysts can instantly find models, sources, and metrics with rich metadata, descriptions, and tags to speed up analysis.
- Understand lineage and dependencies. Interactive, column‑level lineage maps show how data flows from source to dashboard, helping teams assess impact before making changes.
- Assess data quality at a glance. View test coverage, documentation completeness, and performance insights to ensure trusted, production‑ready datasets.
- Troubleshoot and optimize faster. Identify slow‑running models, missing documentation, or failing tests directly from the Catalog interface.
- Enable governed self‑service. All assets are tied to the dbt project, so definitions, lineage, and quality checks stay in sync with production, empowering analysts without sacrificing control.
We believe the future of analytics engineering is collaborative, governed, and accessible to every data practitioner. Canvas, Catalog, and Copilot let analysts work with well‑governed data earlier in the pipeline, in a self-service manner.
That frees your engineers to focus on what matters most: designing scalable, high‑performance data models and enforcing rigorous quality and testing standards. Instead of answering simple questions about data, they can return to solving complex transformation challenges that unlock faster, more reliable insights for the business.
Challenge 3: Development and debugging are too slow
In many data engineering environments, development remains slow. Writing and editing transformation code can mean waiting minutes for parsing and compilation just to validate changes.
Errors often surface only after code runs against the warehouse, and once changes are made, long feedback loops, redundant runs, and hard-to-trace root causes stall delivery further. The result is longer development cycles, slower debugging, and inflated compute costs.
The solution: High-velocity development with dbt Fusion
dbt Fusion attacks these bottlenecks across your entire pipeline. With lightning-fast parse times, instant error detection as you code, and state-aware orchestration that runs only what’s changed, teams can iterate, debug, and deliver at top speed.
With dbt Fusion, teams can:
- Develop at record speed. 30x faster parse times mean instant feedback and near‑real‑time iteration.
- Catch errors early. Live error detection flags issues before running code against the warehouse.
- Run only what’s changed. State‑aware orchestration pinpoints modified models and their dependencies, avoiding costly full‑project rebuilds (see the sketch after this list).
- Iterate faster. Shorter feedback loops enable engineers to test, refine, and ship changes in minutes, rather than hours.
- Debug with context. Built‑in dependency awareness makes it easier to isolate issues and understand downstream impacts before deploying.
- Control costs. Early adopters report significant savings, with visibility into spend via the cost management dashboard.
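Fusion detects changes automatically, but the underlying idea maps onto dbt's long-standing state selectors. Here's a minimal sketch of the manual equivalent, assuming you keep a copy of production run artifacts to compare against:

```bash
# Build only the models that changed relative to production, plus
# everything downstream of them. The ./prod-artifacts directory (a copy
# of the production target/ folder) is an assumption for this sketch.
dbt build --select state:modified+ --state ./prod-artifacts
```

State-aware orchestration applies this selection logic on every run, without the manual artifact bookkeeping.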
Challenge 4: Data governance can’t be manual
In fast-moving analytics environments, governance processes that rely on manual checks, ad-hoc documentation, or after-the-fact reviews can’t keep pace with the speed of modern data delivery. Fragmented documentation and inconsistent metadata standards undermine trust and self-service, leaving analysts to hunt for definitions and lineage.
The solution: Automated governance inside the pipeline
dbt embeds governance directly into the transformation layer. It enforces quality, consistency, and compliance as part of the workflow, not as a bolted-on afterthought.
- dbt Catalog provides interactive, column‑level lineage across your entire data estate, making it easy to audit changes, trace data flows from source to dashboard, and assess downstream impacts before deploying.
- dbt’s Semantic Layer centralizes metric definitions using MetricFlow. Every stakeholder — whether they’re working in BI tools, spreadsheets, or embedded analytics — starts from the same, governed business logic.
- dbt Fusion’s governance‑aware orchestration automatically enforces policies in‑flight, running only the models that have changed and ensuring compliance rules are applied consistently at build time.
With dbt’s built-in governance capabilities, teams can:
- Automate lineage tracking. Capture end‑to‑end data flow for complete auditability and faster impact analysis.
- Standardize metrics. Use the dbt Semantic Layer to define and enforce consistent business logic across teams and tools (a sketch of a metric definition follows this list).
- Enforce policies in‑flight. Apply governance rules automatically at build time with Fusion’s governance‑aware orchestration, not after the fact.
- Accelerate compliant delivery. Ship governed, high‑quality data at the same speed as ungoverned pipelines.
- Build trust at scale. Ensure every dataset meets quality, compliance, and consistency standards before it’s consumed.
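As an illustration of what a single governed definition looks like, here's a minimal sketch of a Semantic Layer metric defined with MetricFlow. The metric and measure names are hypothetical and assume a semantic model that already exposes an `order_total` measure:

```yaml
# models/marts/metrics.yml
# Illustrative metric definition; names are hypothetical.
metrics:
  - name: total_revenue
    label: Total Revenue
    description: One governed revenue definition shared by every tool.
    type: simple
    type_params:
      measure: order_total
```

Every BI tool, spreadsheet, or embedded app that queries the Semantic Layer resolves `total_revenue` to this one definition.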
dbt and the next era of data engineering
Modern data teams can’t afford to ignore these four challenges facing data engineering. If they do, they risk higher costs, lower data quality, missed opportunities, and a growing gap between strategic business needs and the insights delivered.
dbt brings together the solutions to all four challenges — accelerating development, empowering stakeholders, speeding up debugging, and automating governance — in a single, integrated workflow:
- Copilot speeds development with AI-assisted code generation.
- Fusion eliminates bottlenecks in parsing, debugging, and orchestration.
- Catalog provides complete lineage for auditability.
- Semantic Layer enforces consistent, governed metrics across every downstream tool.
Together, they enable teams to move faster, catch issues earlier, enforce governance automatically, and control costs — all within the same platform they already use to build and manage transformations. dbt makes this possible by enabling true collaboration, giving every role shared context, consistent definitions, and built‑in guardrails inside the pipeline.
dbt equips data engineering teams to move faster and with more confidence. To learn more about how dbt can help you future-proof your data engineering practices, request a demo.