How to build a semantic layer

Even if you have an amazing data analytics story, your users may still tell you it’s challenging to find the data they need. Some may not be comfortable writing SQL code or well versed in creating accurate aggregations. Others may not know where to find the correct data.

A semantic layer can help by providing a single source of truth for an organization’s key metrics. It brings consistency, discoverability, and democratization across teams, giving business users the data they need to self-serve answers to their own data questions.

Building a semantic layer from scratch isn't easy. That’s where dbt comes in. We'll look at the value of building a semantic layer and show how you can build one easily if you're already using dbt to model your data transformations.

Why build a semantic layer

No matter what business you're in, there are certain metrics that cut across teams. These might include things like estimated sales per quarter, quota attainment, net versus gross profit margins, and others.

The problem is that teams within a division often derive these metrics individually for their own reports, using their own tools. A Forrester survey in 2021 found that over 61% of organizations use four or more BI tools. A staggering 25% use 10 or more. This results in inconsistency across teams, which undermines trust in data.

A semantic layer acts as a translation layer between data and human language. It combines metrics with context, such as documentation and data lineage. This tells users how a given metric was calculated, who calculated it, and what data it is based on. It creates a “hub and spoke” model for analytics so that data stakeholders can be sure they’re working off the same metric everywhere, every time.

A semantic layer also provides both discovery and reuse. It enables business users who might not have the confidence to write their own queries and aggregations to find and use the metrics they need. And it enables other business users and analytics engineers to incorporate the work of others easily into their own reports and applications, rather than reinventing the wheel every time they need to leverage a standard calculation.

The dbt Semantic Layer

For years, dbt has worked to bring increasing standardization to data analytics code. Using dbt, you can model all of your data transformations as code that can be version-controlled, tested, and deployed automatically with every change.

Over the years, however, we’ve noticed many customers struggling with common gaps in their analytics workflows. One has been the lack of a consistent approach to developing, deploying, and operationalizing analytics code. That's why we've championed the Analytics Development Lifecycle (ADLC) as a method for standardizing and streamlining your analytics code development process.

The other issue has been metrics consistency. Once companies exceed a certain size, they struggle with providing reliable metrics while also giving business users and data application developers the freedom to choose their own BI tools and create their own data workflows.

This is why we've built the dbt Semantic Layer as part of dbt Cloud. Using your dbt models, you can define metrics that are common to the organization. You can then grant access to this metrics layer using role-based access control (RBAC), enabling data stakeholders to access metrics via API calls and a wide variety of BI tools.

How to build a semantic layer with dbt

Once your data models are defined in dbt Cloud, it’s easy to add metric definitions to the dbt Semantic Layer. The process consists of five steps:

  • Define the metrics you need
  • Set up the environment
  • Create verified data sources
  • Create and deploy metrics
  • Integrate metrics into data products

Let’s look at each step in detail.

Define the metrics you need

As always, the first step in measuring is deciding what you need to measure. As part of the Planning phase of the ADLC, you should plan whatever metrics you want to push as part of a new analytics code deployment or change. Work with each project’s stakeholders to hammer out agreed-upon definitions for metrics, resolving any inconsistencies across teams.

If you’re just getting started with building a semantic layer, don’t try to create dozens of new metrics all at once. Instead, identify three to five key metrics for the business across a couple of dbt projects.

The ADLC is all about making small, right-sized improvements to your analytics codebase, rigorously testing and deploying the smallest unit of work possible with each push. Once you’ve defined and released one metric and worked out any kinks in the process, you can apply the lessons you learned when deploying the others.

Set up the environment

If you’re a dbt Cloud Team or Enterprise user, you’re ready to set up your environment to start building out your dbt Semantic Layer. If you’re only using dbt Core, you can use our step-by-step guide to transition the projects containing your metrics to dbt Cloud. (You can transition as many or as few of your dbt projects as you want to dbt Cloud, migrating at your own pace.)

Setting up the dbt Semantic Layer requires a previously successful dbt run. Once that completes, you can set up a connection to a Snowflake, BigQuery, Databricks, or Redshift data warehouse from one of your environments (development, staging, production, etc.) to power your metric definitions. Here, you’ll supply the credentials you use to connect to each service that contains the data required to drive your metrics.


Create verified data sources

To build metrics, you need a dbt model. If you don’t have one for the data that drives your metrics, define, test, and deploy those models before continuing.

Next, you should familiarize yourself with the key concepts of the dbt Semantic Layer, which is built on our own MetricFlow project. In particular, you should understand semantic models. These correspond to models in your dbt project and consist of three core pieces of metadata:

  • Entities (nouns) - The objects your data tables describe, such as orders or customers, identified by key columns that define how those tables relate to one another
  • Measures (verbs) - The aggregations you calculate over your data (e.g., total sales). A measure can back a single metric directly, or several measures can be combined into a new metric, such as a ratio
  • Dimensions (adjectives/adverbs) - Aspects of the entities you can use to slice and dice your data, such as location or time period
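The three pieces can be illustrated with a small, self-contained Python sketch. It is not MetricFlow's API or query engine; the classes, the `orders` rows, and the metric names are hypothetical. An entity identifies rows by a key, measures aggregate a column, a dimension groups the results, and a ratio metric combines two measures into a new one.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Entity:
    name: str  # the noun, e.g. "order"
    key: str   # column that uniquely identifies it


@dataclass(frozen=True)
class Measure:
    name: str
    column: str
    agg: str  # "sum" or "count" in this sketch


@dataclass(frozen=True)
class Dimension:
    name: str
    column: str


AGGREGATIONS: Dict[str, Callable[[list], float]] = {"sum": sum, "count": len}


def check_primary_key(rows: List[Row], entity: Entity) -> None:
    # An entity's key must be unique, or joins and counts become unreliable.
    keys = [row[entity.key] for row in rows]
    if len(keys) != len(set(keys)):
        raise ValueError(f"{entity.key} is not unique for entity {entity.name}")


def evaluate(rows: List[Row], measure: Measure, dimension: Dimension) -> Dict[object, float]:
    # Group the measure's column by the dimension, then aggregate each group.
    groups: Dict[object, list] = defaultdict(list)
    for row in rows:
        groups[row[dimension.column]].append(row[measure.column])
    agg = AGGREGATIONS[measure.agg]
    return {value: agg(items) for value, items in groups.items()}


order = Entity("order", key="order_id")
total_sales = Measure("total_sales", column="amount", agg="sum")
order_count = Measure("order_count", column="order_id", agg="count")
region = Dimension("region", column="region")

orders: List[Row] = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "region": "AMER", "amount": 80.0},
    {"order_id": 3, "region": "EMEA", "amount": 30.0},
]

check_primary_key(orders, order)
sales = evaluate(orders, total_sales, region)   # {'EMEA': 150.0, 'AMER': 80.0}
counts = evaluate(orders, order_count, region)  # {'EMEA': 2, 'AMER': 1}

# A ratio metric built from two measures: average order value per region.
avg_order_value = {r: sales[r] / counts[r] for r in sales}
print(avg_order_value)  # {'EMEA': 75.0, 'AMER': 80.0}
```

In the actual dbt Semantic Layer, MetricFlow compiles metric requests into SQL that runs in your warehouse; the sketch only mirrors the vocabulary.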

You can then define your semantic model in YAML, alongside the YAML properties for your dbt model. If you have a dbt Cloud Enterprise account, you can make this even easier by using dbt Copilot to generate your semantic model for you.
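For orientation, the Python sketch below writes out one semantic model and one simple metric as a nested dictionary whose keys mirror the YAML structure described in the dbt documentation at the time of writing (`semantic_models`, `entities`, `dimensions`, `measures`, `metrics`, `type_params`). Treat the key names as an approximation to check against the current spec; the `orders` model, its columns, and the `check_references` helper are hypothetical. The helper shows how the pieces refer to one another: each simple metric names a measure, and the default time dimension must be one of the model's dimensions. dbt validates semantic models itself, so this is illustration only.

```python
from typing import Dict, List

# Mirrors the shape of a semantic model YAML file (key names are assumptions to verify).
project = {
    "semantic_models": [
        {
            "name": "orders",
            "model": "ref('orders')",
            "defaults": {"agg_time_dimension": "order_date"},
            "entities": [
                {"name": "order_id", "type": "primary"},
                {"name": "customer_id", "type": "foreign"},
            ],
            "dimensions": [
                {"name": "order_date", "type": "time", "type_params": {"time_granularity": "day"}},
                {"name": "region", "type": "categorical"},
            ],
            "measures": [
                {"name": "order_total", "agg": "sum", "expr": "amount"},
                {"name": "order_count", "agg": "count", "expr": "order_id"},
            ],
        }
    ],
    "metrics": [
        {"name": "order_total", "label": "Order total", "type": "simple",
         "type_params": {"measure": "order_total"}},
    ],
}


def check_references(spec: Dict) -> List[str]:
    """Return problems found in a semantic-model-shaped dict (hypothetical helper)."""
    problems: List[str] = []
    measures = set()
    for model in spec["semantic_models"]:
        dims = {d["name"] for d in model["dimensions"]}
        default_time = model.get("defaults", {}).get("agg_time_dimension")
        if default_time is not None and default_time not in dims:
            problems.append(f"{model['name']}: unknown agg_time_dimension {default_time!r}")
        measures.update(m["name"] for m in model["measures"])
    for metric in spec["metrics"]:
        measure = metric.get("type_params", {}).get("measure")
        if metric["type"] == "simple" and measure not in measures:
            problems.append(f"metric {metric['name']}: unknown measure {measure!r}")
    return problems


print(check_references(project))  # []
```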

As with dbt models, you can—and should—write thorough documentation for your semantic models. This should highlight everything data consumers need to understand, use, and have trust in your metrics.

Create and deploy metrics

With your semantic models defined, you’re ready to commit, build, and deploy your first metrics. Once you’ve committed your changes and another team member has signed off on the pull request, create a deploy job and run it to publish the new semantic model to your environment. dbt Cloud will create the new metrics along with their documentation.

Integrate metrics into data products

Your data consumers can now find your business metrics using dbt Explorer and consume them in their own BI tools. Consumers can see not just the metric but also its associated documentation and a map of its data lineage. This gives users the knowledge they need to use the data, and confidence that it’s sourced and derived correctly.


dbt Cloud supports a number of out-of-the-box integrations for popular tools such as Tableau, Microsoft Excel, Google Sheets, and others. For tools not directly supported as of this writing (such as Power BI), you can use exports to create custom integrations.

You can use our Java, Python, and R clients to consume metrics in your applications (check out these examples). For other languages, you can leverage the Semantic Layer REST API.
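One way to keep application code independent of which client it uses (a language client or a direct API call) is to code against a small interface and inject the concrete client. The Python sketch below shows that pattern; `MetricsClient`, `InMemoryMetricsClient`, and `revenue_report` are hypothetical names, not part of any dbt client, and the in-memory client simply returns canned rows so the report logic can be tested without a live connection.

```python
from typing import Dict, List, Protocol, Sequence

Row = Dict[str, object]


class MetricsClient(Protocol):
    """The only surface the application depends on (hypothetical interface)."""

    def query(self, metrics: Sequence[str], group_by: Sequence[str]) -> List[Row]:
        ...


class InMemoryMetricsClient:
    """Test double that serves canned rows instead of calling the Semantic Layer."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def query(self, metrics: Sequence[str], group_by: Sequence[str]) -> List[Row]:
        # Keep only the requested grouping columns and metrics.
        wanted = [*group_by, *metrics]
        return [{col: row[col] for col in wanted} for row in self._rows]


def revenue_report(client: MetricsClient) -> str:
    rows = client.query(metrics=["order_total"], group_by=["region"])
    return "\n".join(f"{row['region']}: {row['order_total']:,.2f}" for row in rows)


fake = InMemoryMetricsClient([
    {"region": "EMEA", "order_total": 1500.0, "order_count": 12},
    {"region": "AMER", "order_total": 800.5, "order_count": 9},
])
print(revenue_report(fake))
# EMEA: 1,500.00
# AMER: 800.50
```

A production adapter would wrap whichever official client you use behind the same `query` method, so the report code does not change when you switch clients.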

Conclusion

dbt models provide a common way to talk about data within your organization. A semantic layer takes this further by providing a translation layer between your data and your everyday business language.

Using the dbt Semantic Layer, you can build this language directly on top of your existing dbt models. Once published, data consumers can easily find and use these standardized metrics using whatever tools they choose. This provides a single, centralized source for the data that matters to your company.

To learn more about how dbt Cloud can bring consistency and simplicity to your data, contact us for a demo today.

Last modified on: Mar 07, 2025
