dbt Labs has signed a definitive agreement to acquire Transform, the original innovators behind the semantic layer in the modern data stack.
Transform’s core technology, MetricFlow, is best-in-breed when it comes to defining metrics and compiling those definitions into performant SQL. This technology—the creation of the “query plan” and then generating high-quality SQL from it—is the Hard Part of what we refer to as the semantic layer, and it’s something that Transform has absolutely nailed.
I am personally just unbelievably excited to work alongside Transform’s founders (Nick Handel, James Mayfield, and Paul Yang) and the Transform team to bring the best modern semantic capabilities to dbt’s massive user base.
2014: Origins of the metric store
Back in 2014, while serving as a Data Scientist on the Airbnb data team, Nick faced a challenge that all data-obsessed businesses inevitably face—his company’s desire to measure everything. Every launch, every feature, every sliver of every campaign by timezone, segment, and sector.
This is a manageable problem for companies of a certain size, but as companies scale and downstream tooling diversifies, it becomes increasingly difficult to ensure metric consistency. Airbnb solved for this with an internally developed metrics store that enabled any integrated analytics tool to leverage centrally defined and maintained metrics. Nick used the metrics store to multiply his impact many times over, delivering insights that would’ve been totally inaccessible without it. Nick was an early contributor to an idea that blossomed into Minerva, and he eventually formed Transform with James, the original PM of the metrics repo in 2014 at Airbnb, and Paul, who built the infrastructure behind it all.
Nick’s story is, ultimately, not so dissimilar from my own with dbt: some of the best products come from people just trying to solve their own problems.
2021: Commercializing the metrics layer
Transform is still early-stage, yet its ideas have already impacted the entire modern data stack. It was a series of two posts in 2021 that brought the concept of the “metrics layer” to the forefront of the industry conversation. The first, by Alana and Ankur at Base Case Capital, describes the core idea and market need:
Imagine instead that you could disentangle metric definition from visualization. In this world, the teams that own metrics would be able to define them once, in a way that’s consistent across dashboards, automation tools, sales reporting, and so on. Let’s call this “Headless BI”.
The second, by Benn Stancil, suggested that the idea of the metrics layer was the “missing piece” in the modern data stack and described functionally what it would need to look like. Quoting:
A better architecture would do for metrics what dbt did for transformed data—make them globally accessible to every other tool in the data stack. Rather than each tool defining their own aggregations, the metrics layer is a centralized clearing house for how all metrics are calculated.
What neither of these posts explicitly calls out is that these ideas were drawing on work being done by the Transform team and other early innovators in the space. And it was these ideas that caught Drew’s and my attention in early 2021.
The semantic layer is a hard problem, and Drew and I are not the first to attempt to solve it. It is both a technical challenge and a distribution challenge, and we knew that when we made the decision to pursue this space in mid-2021.
The semantic layer has historically been integrated with business intelligence tools. Those of us who have been working in the modern data stack for the past decade fell in love with building our semantic model in LookML.
What Transform introduced to all of us was the incredible potential of semantic capabilities that are decoupled from a single business intelligence tool, or a “headless semantic layer”. In this world, metrics and entities are no longer locked into a single BI tool; they can be accessed by all downstream tools.
In my opinion, though, the hardest part about building a headless semantic layer is that you have to somehow convince the entire ecosystem of downstream products to integrate with you. If you have the best semantic layer in the world but it’s missing the integration with my BI tool or my reverse ETL tool or my CDP (etc.), it’s not helpful to me at all. The quality of these integrations matters too: the gap between a fantastic experience and a poor one makes all the difference.
So: integration coverage and quality, rather than functional coverage, have been our primary goal to date. And while you’ve only seen the tip of the iceberg so far, I’m really excited about the progress we’ve made and are making there. We have a path towards supporting far more data warehouses, and we’re getting real traction in the BI tool market as well.
Even with our current modest integration coverage, there are now over 500 organizations defining metrics alongside the transformations in their dbt projects. This is a huge number in a short amount of time, but it only represents about 2.5% of the entire dbt install base! This is the start of the avalanche—just like the adoption of dbt itself, this number is only going to increase.
But functional coverage matters too! This is the area where we’ve always known we weren’t going to be best in the world at launch. If you follow the public GitHub issues, you’ll see that users want joins in their metrics! This is surprising to exactly no one, and we’ve been looking for ways to move faster here.
This is where Transform comes in. Over the coming months, we’ll be adding MetricFlow to the dbt Semantic Layer to deliver its power and flexibility to everyone who has metrics defined in their projects. And yes, it supports joins 😄
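To make “define a metric once, compile it to SQL, joins included” a little more concrete, here is a deliberately tiny sketch. It is not MetricFlow’s API or the dbt metrics spec: the `Metric` and `Join` classes, the `compile_metric` function, and the `orders` and `customers` tables are all hypothetical names invented for illustration, and a real compiler builds a query plan and does far more than string assembly.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Join:
    """Hypothetical join from a metric's base table to another table."""
    table: str
    on: str  # join condition, written as plain SQL


@dataclass(frozen=True)
class Metric:
    """Hypothetical tool-agnostic metric definition (not MetricFlow's schema)."""
    name: str
    table: str
    expression: str  # aggregation, e.g. "SUM(orders.amount)"
    joins: Tuple[Join, ...] = ()


def compile_metric(metric: Metric, dimensions: List[str]) -> str:
    """Turn one metric definition plus requested dimensions into a SQL string."""
    select_list = list(dimensions) + [f"{metric.expression} AS {metric.name}"]
    lines = ["SELECT " + ", ".join(select_list), f"FROM {metric.table}"]
    for join in metric.joins:
        lines.append(f"LEFT JOIN {join.table} ON {join.on}")
    if dimensions:
        # Only group when the caller asked for at least one dimension.
        lines.append("GROUP BY " + ", ".join(dimensions))
    return "\n".join(lines)


# Defined once; every downstream tool would request SQL from this definition.
revenue = Metric(
    name="revenue",
    table="orders",
    expression="SUM(orders.amount)",
    joins=(Join(table="customers", on="orders.customer_id = customers.id"),),
)

sql = compile_metric(revenue, ["customers.region"])
print(sql)
```

The point of the sketch is only the shape of the idea: the aggregation and the join live in one definition, and each consumer asks for the dimensions it needs instead of re-implementing the calculation.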
There’s a LOT more to say about MetricFlow, its capabilities, and its mental model of the world, and I’m excited to talk more about that over the coming months. For now, I want to talk about exactly one more thing before wrapping.
2023 and beyond: A much-needed open standard
From where I sit, it seems incredibly obvious that there should be a single way to define metrics and entities so that they can show up consistently in all of your downstream analytical and operational systems. Of course you want this! No one wants the current shitty state where everyone and every tool disagrees on these definitions.
The question, then, is: why is this the way things are? Why do we have an outcome that practitioners don’t want?
I think there are two reasons for this.
- There is no standard for a semantic layer. The modern data stack emerged around the standard of SQL, but there’s no equivalent semantic layer standard. All semantic layers created in the past have been vendor-specific.
- In the past, semantic layers have led to vendor lock-in. So rather than joining forces to create a standard, vendors have historically advocated for their own proprietary semantic layers as the single source of truth.
This is not a good outcome, and it’s not going to magically change on its own without some kind of hard reset. And such a reset would need to come from somewhere that operated on fundamentally different principles. That’s where dbt comes in.
From the very earliest days of dbt, it was clear to me that the core of dbt needed to be open source. Why would anyone invest so much time encoding their knowledge into a framework that could fundamentally be ripped away from them at any point? dbt’s licensing strategy has always existed to create the largest-possible groundswell of usage, to enable an entire industry to change the way it worked, and to give its users control.
The semantic layer operates on this same fundamental logic: users simply are not going to invest in creating a single source of truth without this control. This is also one of the reasons I have so much respect for the Transform team. It is still incredibly unusual to take the core IP of your software business and open source it, but that’s what Transform did—Nick and the team open sourced MetricFlow last year. We will preserve 100% of your ability to run MetricFlow for yourself via the same license the dbt Semantic Layer already uses – the Business Source License.
dbt has already created a standard for data transformation in the modern data stack. In joining forces with Transform, we’re now poised to do the same, at long last, for the semantic layer.
I’m excited to see how this will change the entire industry, from how we work to how we build data products. The future is bright.
Last modified on: Mar 10, 2023