Modeling for success: Building data structures that last

Kathryn Chubb

last updated on Nov 18, 2025

Once upon a time, data transformation was hard. The good news is that, technically, it's now largely a solved problem. Tools like dbt have made it easy for data engineers, analytics engineers, and others to build high-quality data pipelines, no matter the size of their organization.

The problem (if you can call it that) is that dbt might have made things too easy.

dbt's ease of use is its greatest strength - and sometimes its biggest pitfall. Teams jump in and start building, delivering value quickly. Eventually, however, they hit a wall where their approach no longer scales.

The temptation is understandable. Someone asks you to whip up a quick report. You say, "I have a simple tool for that!" And for a while, this one-off approach works…until it doesn't.

To build a successful data transformation architecture, you need to step back and consider the bigger picture. In this article, we'll look at the difference between dbt models and data models, how to choose the correct data model for a given use case, and best practices for data modeling with dbt.

Understanding data models vs. dbt models

First, let's resolve a basic confusion: some teams conflate a dbt model with their data model. The two are distinctly different.

A dbt model is a file that defines transformation logic for a given set of data tables: a SQL or Python file that lays out how source tables are converted into destination tables, typically accompanied by YAML files that configure, document, and test it.
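
For example, a minimal dbt model might look like the sketch below (the model, source, and column names here are hypothetical):

```sql
-- models/staging/stg_orders.sql (hypothetical model and source names)
-- A dbt model is just a SELECT; dbt materializes it as a view or table.
with source as (

    select * from {{ source('shop', 'orders') }}

),

renamed as (

    select
        id          as order_id,
        customer_id,
        order_date,
        amount      as order_amount
    from source

)

select * from renamed
```

A companion YAML file would typically add column descriptions and tests (for example, not_null and unique on order_id) on top of this SQL.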

A data model, by contrast, is the complete blueprint of how data is organized, stored, and connected across an entire system. It can span many dbt models and defines the overall data structure and relationships within the data.

Think of the data model as the crafting recipe itself, with dbt models as the individual ingredients that come together to create the final product. This distinction matters. While dbt makes it easy to create individual transformation files, the real challenge lies in architecting how those pieces fit together into a coherent, scalable whole.

The spectrum of data modeling approaches

That distinction makes it important to talk about the different types of data modeling. At a high level, data modeling approaches fall on a spectrum between normalized and denormalized models.

Think of these approaches as different vault designs in Fallout, each engineered for a different purpose. More normalized models like third normal form excel at write optimization, making them ideal for operational systems that process transactions and handle frequent updates. Denormalized models, on the other hand, optimize for read performance, which is exactly what analytics workloads need.

Four common data modeling methodologies illustrate this spectrum:

Third Normal Form minimizes redundancy and dependency through highly structured, isolated tables. While it offers efficient storage, it requires complex queries for reporting.

Data Vault tracks every decision with full audit history, making it ideal for high-governance use cases. It's highly scalable and flexible for data integration, but complex to implement upfront.

Dimensional Modeling pre-aggregates data and optimizes for specific analytical use cases. It's designed for business users to understand and works exceptionally well with BI tools, though it can be challenging to change once established.

One Big Table preserves everything in one place for simple access, but suffers from high redundancy and scaling challenges.

It's important to emphasize that none of these data models is universally the "right" one; the right choice depends on your use case. For example, a data vault delivers high scalability along with comprehensive auditing and efficient incremental updates, but it's harder to engineer and query. At the other extreme, One Big Table delivers architectural simplicity and fast read performance, but doesn't scale as your business grows.

The table below gives a good overview of the pros and cons of each model:

| Model | Pros | Cons |
| --- | --- | --- |
| Third Normal Form | Minimal redundancy, efficient storage | Complex queries for reporting |
| Data Vault | Full audit history, highly scalable, flexible integration | Complex to implement upfront, harder to query |
| Dimensional Modeling | Business-friendly, fast reads, works well with BI tools | Challenging to change once established |
| One Big Table | Simple access, fast read performance | High redundancy, scaling challenges |

It's possible to have all four of these approaches in your data architecture. Some companies, for example, adopt what's called a "medallion architecture," with raw data (Bronze) as the base, cleaned and standardized data (Silver) in the middle, and business-ready models (Gold) built on top.

Why dimensional modeling wins for analytics with dbt

That said, dimensional modeling has become nearly synonymous with dbt, and for good reason.

dbt is predominantly a batch processing tool, and dimensional modeling is typically the best choice for batch-processed analytics. Originally developed in the 1990s by the Kimball Group, dimensional modeling was designed to solve two core problems: deliver data that's understandable to business users, and deliver it fast.

While some argue that modern infrastructure has made dimensional modeling obsolete, this perspective misses the approach's most valuable benefit: making data understandable to business users. This alignment with business needs enables self-service analytics in a way that normalized transactional structures cannot.

The benefits of dimensional modeling extend well beyond historical performance concerns:

Alignment with business needs: Building data structures according to how the business thinks and operates ensures that stakeholders can actually use the data being produced.

Simplified data: Column naming, values, and structures become user-friendly. A business user could theoretically work with a dimensional model in Excel and make sense of it, unlike transactional systems filled with cryptic IDs and bit columns.

Single source of truth: Building each business process once eliminates redundant values replicated across multiple dbt models—a common problem in poorly architected projects.

Fast query performance: Denormalized structures enable faster reads, crucial for analytics workloads that consume large datasets.

Modular and scalable: Fewer dbt models need to be built, run, and maintained when following dimensional modeling principles.

Optimal for BI tools: Industry-leading platforms like Tableau, Microsoft Power BI, and other BI tools are built to sit on top of well-constructed dimensional models and perform best with them.

Facts and dimensions: the building blocks

At its core, dimensional modeling separates measurable facts from descriptive attributes:

  • Facts represent the verbs—the events and actions happening within a system
  • Dimensions represent the nouns—the people, places, and things that describe those events

Facts are tables that store quantifiable data about business processes: sales transactions, shipments, website visits, payments. They contain measurable columns that can be aggregated—summed, counted, or averaged—along with foreign keys that link to dimensions.

The grain of a fact table determines its level of detail. An orders fact table could exist at the total order level or drill down to the order line item level. Starting at the lowest level of granularity that might be needed makes sense, since aggregating up is always easier than disaggregating.
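
As an illustration, a fact table built at the order line grain might look like this sketch (model and column names are hypothetical):

```sql
-- models/marts/fct_order_lines.sql (hypothetical)
-- Grain: one row per order line, the lowest level we expect to need.
-- Rolling up to one row per order later is a simple group by;
-- going the other way would require a rebuild.
select
    order_line_id,
    order_id,
    customer_id,                          -- foreign key to dim_customers
    product_id,                           -- foreign key to dim_products
    order_date,                           -- foreign key to dim_date
    quantity,
    unit_price,
    quantity * unit_price as line_amount  -- additive measure
from {{ ref('stg_order_lines') }}
```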

Dimensions provide the descriptive context that enables slicing, filtering, and grouping of facts in meaningful ways. They represent business entities: customers, products, employees, locations, and time.

Dimensions contain attributes that provide context. A product dimension might include category, brand, and SKU; a customer dimension might include segment, region, and contact details.
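
A corresponding dimension might look something like this sketch (again, names are hypothetical):

```sql
-- models/marts/dim_customers.sql (hypothetical)
-- One row per customer, carrying the descriptive attributes used to
-- slice, filter, and group the facts.
select
    customer_id,        -- key referenced by the fact tables
    customer_name,
    customer_type,
    segment,
    region,
    country
from {{ ref('stg_customers') }}
```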

The relationship between facts and dimensions creates either a star schema or a snowflake schema. Star schemas feature clear separation between facts and dimensions with relatively simple joins, making them ideal for analytics dashboards. Snowflake schemas normalize dimensions further, which can reduce duplication but adds join complexity and moves away from the benefits that make dimensional modeling valuable for analytics in the first place.

Star schemas remain the preferred approach for most use cases.
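
To see why, here's the kind of query a star schema makes trivial, using the hypothetical fact and dimension tables sketched above: the fact table supplies the measure, and each dimension supplies context for grouping.

```sql
-- Monthly revenue by customer segment and product category.
select
    d.calendar_year,
    d.calendar_month,
    c.segment,
    p.category,
    sum(f.line_amount) as revenue
from fct_order_lines f
join dim_customers   c on f.customer_id = c.customer_id
join dim_products    p on f.product_id  = p.product_id
join dim_date        d on f.order_date  = d.date_day
group by 1, 2, 3, 4
```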

Building scalable, analytics-ready data models with dbt

Building on this, facts are your verbs: the events happening within your system. Dimensions are your nouns: the people, places, and things you filter by.

Starting from there, we can build out a star schema with a centralized fact table. That fact table should be built around the business process you're trying to model, which brings everything together in an intuitive, easy-to-use, and performant way.

What does this have to do with dbt? dbt enables us to modularize our transformations into reusable models, building them out with the proper facts and dimensions. This workflow mirrors software engineering best practices, where code is structured for maintainability and reuse. Along the way, we can make sure these models are documented, well-tested, and reliable for the whole team.
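
As a sketch of what that modularity looks like in practice, a mart model can compose upstream staging models through ref() (model names here are hypothetical):

```sql
-- models/marts/fct_orders.sql (hypothetical)
-- ref() wires this model to its upstream staging models, so dbt can build
-- the dependency graph and run everything in the right order.
with orders as (

    select * from {{ ref('stg_orders') }}

),

payments as (

    select order_id, sum(amount) as amount_paid
    from {{ ref('stg_payments') }}
    group by order_id

)

select
    orders.order_id,
    orders.customer_id,
    orders.order_date,
    payments.amount_paid
from orders
left join payments on orders.order_id = payments.order_id
```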

You can use dbt to gradually build out the medallion layers we talked about above - starting with raw data sources and then building up your staging, intermediate, and mart layers through an iterative development process. You can use tests to enforce key relationships, uniqueness, and referential integrity, preventing model drift. dbt also generates dependency graphs that visualize the data flow across your entire project, making it easier to understand how models connect.
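
For instance, a referential-integrity check between a fact and a dimension can be written as a singular test: a SQL file under tests/ that passes when it returns zero rows (names hypothetical). The same kinds of checks are often declared as generic not_null, unique, and relationships tests in YAML instead.

```sql
-- tests/assert_orders_have_valid_customers.sql (hypothetical singular test)
-- The test passes when this query returns zero rows; any rows returned are
-- orders whose customer_id has no match in dim_customers.
select f.order_id
from {{ ref('fct_orders') }} f
left join {{ ref('dim_customers') }} c
    on f.customer_id = c.customer_id
where c.customer_id is null
```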

Building with the bus matrix

How do we start to put all these things together?

The Kimball bus matrix provides a practical tool for visualizing dimensional models during the planning phase. The matrix maps business processes (which become fact tables) against the dimensions they need to connect to: business processes appear as rows, dimensions as columns, and the intersections show which dimensions each business process requires.

Creating a bus matrix involves gathering information from stakeholder conversations and organizing it into a format that both technical teams and business users can understand. The grain of each business process should be documented—the lowest level of detail needed to serve current and anticipated use cases. This becomes a living document that grows as new dimensions and business processes are added.
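
As a hypothetical illustration, a small bus matrix for a retail business might look like this:

| Business process (grain) | Date | Customer | Product | Employee |
| --- | --- | --- | --- | --- |
| Sales (one row per order line) | X | X | X | |
| Shipments (one row per shipment) | X | X | X | X |
| Website visits (one row per session) | X | X | | |

Reading across a row tells you which dimensions a fact table needs; reading down a column shows which dimensions must be conformed and shared across business processes.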

Translating business needs into dimensional models

Understanding how to translate business conversations into facts and dimensions is a critical skill for data engineering teams. Business users don't speak in technical terms about tables and joins—they describe their needs in terms of processes they want to track and attributes they need for analysis.

Consider a sales analytics use case. When a business stakeholder says, "We need to track every sale—who bought what, when, how much they spent, and if they got any discounts," they've essentially outlined both the fact table and several dimensions:

The business process (sales) becomes the fact table. Measurable elements (amount spent, discounts) become numeric columns that can be aggregated. The descriptive elements (who, what, when) point to dimensions that need to be created or linked: customer, product, and date dimensions.
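
One way to make that mapping concrete is a sketch like the following, where each phrase of the request becomes a column or a key (all names hypothetical):

```sql
-- models/marts/fct_sales.sql (hypothetical)
select
    order_id,
    customer_id,      -- "who bought"          -> key to a customer dimension
    product_id,       -- "what they bought"    -> key to a product dimension
    order_date,       -- "when"                -> key to a date dimension
    gross_amount,     -- "how much they spent" -> additive measure
    discount_amount   -- "any discounts"       -> additive measure
from {{ ref('stg_sales') }}
```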

When the same stakeholder adds, "It would be great to know more about our customer base—what type of customers they are and where they're from," they've identified attributes for the customer dimension: customer type and geographic details. These attributes will enable filtering and grouping of sales data by customer characteristics.

A request to "see details about our products—category, brand, SKU—so we know what's selling well" similarly outlines a product dimension with specific attributes. Each additional requirement builds out more of the dimensional model, creating a comprehensive structure for analysis.

Time dimensions deserve special attention. While it might seem like a date is just a single value, a well-constructed date dimension can contain hundreds of variations: weekday, month, year, quarter, fiscal values, day of week, and more. This allows users to slice data in countless ways without adding complex calculations to individual queries. Time dimensions standardize how the entire organization thinks about dates, whether using fiscal calendars or standard calendar years.
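
As a sketch, a date dimension in dbt can be seeded from a date spine and enriched with calendar attributes. This assumes the dbt_utils package is installed, and the exact date-part names accepted by extract() vary by warehouse:

```sql
-- models/marts/dim_date.sql (hypothetical)
with date_spine as (

    {{ dbt_utils.date_spine(
        datepart="day",
        start_date="cast('2015-01-01' as date)",
        end_date="cast('2035-01-01' as date)"
    ) }}

)

select
    date_day,
    extract(year    from date_day) as calendar_year,
    extract(quarter from date_day) as calendar_quarter,
    extract(month   from date_day) as calendar_month,
    extract(dow     from date_day) as day_of_week
    -- fiscal periods, holiday flags, etc. can be layered on here
from date_spine
```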

Best practices for long-term success

The impact of poor data model design often isn't realized until it's too late, and there's no easy button to fix a fundamentally flawed model. The cost to rebuild can be substantial.

The following best practices help avoid that pain:

Start with business needs in mind: Every data model should solve for a business process, enabling business users to answer their own questions and explore new ones. This business-first approach ensures your models deliver real value.

Be clear about grain: Building at the wrong grain creates significant technical debt. Determine the lowest level of detail needed before development begins.

Create conformed dimensions: Master data elements like customer and product dimensions should be built once for the entire project and shared across all subject areas. This ensures data quality and consistency and makes numbers tie out across business processes.

Keep it simple: Don't over-engineer. Build only what makes sense for the business need. Focus on clarity and reusability.

Manage slowly changing dimensions: Understand which types of data changes need to be tracked historically versus simply overwritten. Different types of data require different handling strategies (see the snapshot sketch after this list).

Govern evolution incrementally: Establish processes for how models will evolve and how breaking changes will be handled to avoid downstream outages in your data pipelines.

Leverage SQL as your standard: SQL remains the universal language for data transformation. Writing transformations in SQL ensures portability across platforms and enables broader team participation in the development workflow.
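
On the slowly changing dimensions point above, dbt's snapshot feature is one common way to capture Type 2 history: when a tracked column changes, the old row is end-dated and a new row is inserted. A minimal sketch, with hypothetical source and column names:

```sql
-- snapshots/customers_snapshot.sql (hypothetical)
{% snapshot customers_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

-- When updated_at changes for a customer_id, dbt end-dates the old row
-- and inserts a new one, preserving the full history of that customer.
select * from {{ source('crm', 'customers') }}

{% endsnapshot %}
```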

Addressing data modeling from the outset

dbt's accessibility makes it easy to deliver quick wins. But scalable, long-term success comes from being intentional.

The lesson is clear: even with tools as user-friendly as dbt, a well-thought-out data model should be the first thing addressed in any new implementation, not an afterthought. Whether you're building data warehouses for analytics or operational pipelines for real-time decision-making, proper data modeling lays the foundation. Teams that invest in thoughtful data modeling upfront avoid the costly rebuilds that plague organizations taking shortcuts.

You can either have pain now or pain later. The pain of thoughtful upfront design is significantly less than the pain of rebuilding a poorly architected system. In the analytics wasteland, survival depends on building a vault—a data model—that can withstand whatever comes next.
