The new dbt Semantic Layer spec: the DNA for our vision
With the upcoming re-release of the dbt Semantic Layer, we are on the precipice of a technological shift that will significantly influence how we analyze data. This revolution revolves around something seemingly simple yet critically important to technology: specifications (or “specs” for short). Specs are the DNA of technology, the foundation that guides the functionality and interoperability of tech products and platforms.
Well-defined specifications have driven the most transformative technological advancements of the past few decades. HTML (Hypertext Markup Language), for example, set the standard for how web pages are structured and formatted, opening up endless possibilities for the internet’s growth and utility. It’s a spec that was painstakingly crafted, continually updated, and universally adopted to revolutionize how we share and consume information, eventually paving the way to a broad enough corpus of text to train massive LLMs… just kidding, we’re not going there yet.
Now, imagine a similar leap forward, not in how we share web content but in how we organize and interface with data. That’s what we’re aiming to achieve with the release of our new specification and the open source protocols it compiles down to. These form a foundation for the forthcoming dbt Semantic Layer. With this beta release, we’re not only introducing new semantic components (semantic models and a reorganized metric object) but also sharing more broadly an evolution in the capability of semantic layers.
Why this spec is different
The DNA of our semantic layer differs in one crucial aspect from others. In most semantic layers, users define the edges of the graph by describing the left and right join keys.
The dbt Semantic Layer spec takes a different approach with the introduction of Entities, which allow us to infer the edges of the graph. For example, a user table with a user entity as a primary key and a transactions table with a user entity as a foreign key can form a relationship or, more specifically, an edge of the graph. Since there are typically fewer nodes than edges in a graph, declaring entities once per node instead of describing every join leaves far less logic to maintain.
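To make the idea of inferred edges concrete, here is a minimal Python sketch. It is not MetricFlow's implementation: the `SEMANTIC_MODELS` dictionary, the `refunds` model, and the `infer_edges` function are hypothetical names invented for illustration. It only shows that once each model declares its entities as primary or foreign, the join edges can be derived without anyone writing left and right join keys.

```python
from itertools import product

# Hypothetical, simplified declarations (not the real spec format):
# each semantic model lists its entities and the role each one plays.
SEMANTIC_MODELS = {
    "users": {"user": "primary"},
    "transactions": {"transaction": "primary", "user": "foreign"},
    "refunds": {"refund": "primary", "transaction": "foreign", "user": "foreign"},
}


def infer_edges(models):
    """Return sorted (foreign_model, primary_model, entity) join edges.

    An edge exists wherever one model holds an entity as a foreign key
    and a different model holds the same entity as its primary key.
    """
    edges = []
    for (left, left_entities), (right, right_entities) in product(models.items(), repeat=2):
        if left == right:
            continue
        for entity, role in left_entities.items():
            if role == "foreign" and right_entities.get(entity) == "primary":
                edges.append((left, right, entity))
    return sorted(edges)


edges = infer_edges(SEMANTIC_MODELS)
for left, right, entity in edges:
    print(f"{left} -> {right} on {entity}")
```

Adding a new model in this sketch means declaring its entities once; every edge to the existing models follows from those declarations.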
The simplicity and efficiency of this approach cannot be overstated. It captures semantic logic in a way that is far more DRY (Don’t Repeat Yourself), enables more combinations of metrics and dimensions, and leads to cleaner SQL (no more symmetric aggregates!), making it easier for data teams to manage, evolve, and utilize their data models. Let’s dive in to understand why this is a foundational shift.
The building blocks of the dbt Semantic Layer
At the heart of our new semantic layer are two fundamental constructs: semantic models and metrics.
Semantic models are the foundational building blocks. They consist of three core objects: entities, measures, and dimensions. These allow MetricFlow (the framework that powers the Semantic Layer) to construct queries for metric definitions. The semantic model and its components are new concepts, and they are the primary way in which the new spec differs from the old dbt metrics and other legacy semantic layers.
semantic_model:
  name: transactions
  description: |
    Each row represents one transaction event.
  defaults:
    agg_time_dimension: ds
  model: ref('fact_transactions')
  entities:
    - name: transaction
      type: primary
      expr: id_transaction
    - name: customer
      type: foreign
      expr: id_customer
  measures:
    - name: transaction_amount_usd
      description: The total USD value of the transaction.
      agg: SUM
      create_metric: true
    - name: transactions
      description: The total number of transactions.
      expr: "1"
      agg: SUM
      create_metric: true
    - name: transacting_customers
      description: The distinct count of customers transacting on any given day.
      expr: id_customer
      agg: COUNT_DISTINCT
  dimensions:
    - name: ds
      type: time
      type_params:
        time_granularity: day
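As a rough illustration of what the three measures above mean, the following Python sketch applies the same aggregations to a few made-up rows, grouped by the `ds` time dimension. MetricFlow itself generates SQL instead of running anything like this. The sample rows and the `aggregate_by_day` function are assumptions for illustration, as is the reading that a measure with no `expr` aggregates the column sharing its name.

```python
from collections import defaultdict

# Made-up rows shaped like the fact_transactions table referenced above.
ROWS = [
    {"id_transaction": 1, "id_customer": "a", "transaction_amount_usd": 10.0, "ds": "2023-06-01"},
    {"id_transaction": 2, "id_customer": "a", "transaction_amount_usd": 5.0, "ds": "2023-06-01"},
    {"id_transaction": 3, "id_customer": "b", "transaction_amount_usd": 7.5, "ds": "2023-06-01"},
    {"id_transaction": 4, "id_customer": "b", "transaction_amount_usd": 2.5, "ds": "2023-06-02"},
]


def aggregate_by_day(rows):
    """Group rows by ds and apply each measure's aggregation."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["ds"]].append(row)

    result = {}
    for ds, day_rows in sorted(grouped.items()):
        result[ds] = {
            # agg: SUM, assumed to read the column named after the measure
            "transaction_amount_usd": sum(r["transaction_amount_usd"] for r in day_rows),
            # agg: SUM with expr "1" adds one per row, i.e. a row count
            "transactions": sum(1 for _ in day_rows),
            # agg: COUNT_DISTINCT with expr id_customer
            "transacting_customers": len({r["id_customer"] for r in day_rows}),
        }
    return result


daily = aggregate_by_day(ROWS)
for ds, measures in daily.items():
    print(ds, measures)
```

The point of the sketch is that each measure is only an expression plus an aggregation; the grouping comes from whichever dimensions a query asks for.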
Metrics are the instruments we use to measure and analyze our data. They exist atop the semantic models, facilitating rich, nuanced definitions on top of reusable objects.
metric:
  name: revenue_usd
  type: derived
  type_params:
    expr: transactions * transaction_amount_usd
    measures:
      - transactions
      - transaction_amount_usd
With the release of this beta branch, we’re enabling dbt-core to work seamlessly with MetricFlow through this common spec, so you get the full benefit of MetricFlow on your existing dbt project. This integrated approach ensures tighter alignment and better compatibility between the two.
Participate in the evolution
We are in the very early days of releasing this work, but we intend to build in the open. Starting today, you can use the new spec with the dbt 1.6 beta release:
>> pip install dbt-metricflow
You then need a dbt manifest that the semantic layer can use. You can build one by running the parse command in dbt (build also accomplishes this and more):
>> dbt parse
You can then explore the tutorial or run a MetricFlow CLI query against the provided dataset.
>> mf tutorial
>> mf query --metrics revenue_usd --group-by metric_time
Dive deeper into the spec by visiting the docs. This is your opportunity to be part of the journey, to test the boundaries of what this new spec can do, and to provide feedback that can help shape the future of data analytics.
Let us know your feedback in the #dbt-core-metrics channel on dbt Slack.
The release of this spec is not a standalone event but part of a continuous journey we embarked upon when dbt Labs acquired Transform four months ago. The combined Semantic Layer team has worked together to integrate the strengths and capabilities of both products seamlessly. We’ve built remarkable momentum, and we have a drumbeat of features and availability coming in the next few months.
This all leads to an upcoming beta release of the new dbt Semantic Layer later this summer. That in turn will be followed later this year by partner integrations that let you access those metrics in spreadsheets, notebooks, BI tools, and the many other places where we aim to bring consistency to metrics.
Do note that going forward, it will be possible to define metrics using dbt Core and query them from the CLI using MetricFlow. However, the dbt Semantic Layer experience, including the ability to access those metrics from external integrated tools, will require dbt Cloud API access, which is available on dbt Cloud Team and Enterprise plans.
The dbt Semantic Layer spec is not just a set of technology decisions. It’s a stepping stone to a new era, one where logic is maintained centrally and new products can be built with semantics at their core.
We can’t wait to see the next generation of data applications we’ll build with this foundation. Everyone plays a critical role in shaping this revolution. We appreciate your continued feedback and contributions. We look forward to building with you all on the dbt Semantic Layer. Happy data modeling!
Last modified on: Sep 20, 2023