Keynote: The Metrics System
For the last 5 years, dbt has been an authority on how analytics work gets done. But we know novel workflows aren’t promulgated through top-down adoption—they have to be baked in at the start.
This principle applies equally to how teams adopt analytics engineering, as well as how tools are built to enable it. While dbt’s open source roots have always made this much easier, we believe in a world where the entire analytics ecosystem grows with us, from Core, to Cloud, and beyond.
Browse this talk’s Slack archives
The day-of-talk conversation is archived here in dbt Community Slack.
Not a member of the dbt Community yet? You can join here to view the Coalesce chat archives.
Full transcript
Julia Schottenstein: [00:00:00] Welcome everyone to the second FounderLine keynote of Coalesce 2021: The Metric System. I’m Julia Schottenstein, and I’m part of the product team at dbt Labs, and I’ll be your host for this talk. In this session, we’re going to hear from our co-founder and chief product officer, Drew Banin. He’s going to share more about how dbt has become a standard in modern data workflows and what big bets we’re taking with the product so that we continue to earn the right to be the backbone of your team’s analytics stacks.
Before I pass the mic, I wanted to take the rare opportunity to share some fun facts about Drew. Drew is at the top of the dbt Slack leaderboard in terms of most messages posted. Can people respond in the chat with a guess for how many messages Drew has sent in dbt Slack since its inception? Hint: it’s nearly 2x as many messages as the number two spot.
His [00:01:00] average message length is 155 characters long, a bit too verbose for a tweet.
Drew is also holding on to the number one contributor spot for dbt Core on GitHub. Although, thankfully, we have far more contributors these days, so it’s only a matter of time before there’s a new number one. And while you may be right in thinking Drew has superpowers, he is in fact human and has a fear of snakes.
Okay. And now, moving on from the fun facts: please join the conversation in the Slack channel #coalesce-keynote-metric-system, where Drew will answer questions after the session. And without further ado, over to you, Drew.
Drew Banin: Thank you so much, Julia. And welcome everyone to this talk on the metric system. My name is Drew Banin, and I’m one of the co-founders here at dbt. Last year I took some time to talk about art and data. But [00:02:00] today I’m here to talk about science and data, and specifically the science of measurement, or metrology.
In a couple of years’ time, we will hit the whole STEAM agenda. We’ll hit all of them, but today, science. So for as long as there have been, like, humans and trade and industry, human beings have needed to measure. Ancient systems of measurement were rudimentary and often used the human body or our natural environment as the measuring stick.
[00:02:27] Measuring things
Drew Banin: These systems of measurement varied incredibly, either within or across cultures, and nothing was standardized, but somehow it worked. So here’s a depiction of one of the earliest ways that humans measured distance. We created these knotted cords. These cords are like a hundred cubits long, and there was a knot in the cord every 10 cubits, and this kind of rough gauge helped early civilizations with marking out agreed-upon distances and, like, surveying. So it leads to a really good question, which is: what’s a cubit? A cubit is like the length [00:03:00] of your forearm, basically. It’s the distance from your elbow to your fingertip, right? It comes from the Latin cubitum, meaning elbow, and there are other similar distance measures that have been used, like one called the ell, which is from the same elbow origin.
It’s like a 5,000-year-old concept, measuring things this way. So early humans without access to durable and consistent tools could use their bodies, or in some cases the environment around them, to measure out approximate distances, and for a lot of applications this was close enough. So I found one source that said that a cubit is 18 inches.
We can all measure our own forearms and, like, make a big spreadsheet to see the distribution. I don’t know, but this is interesting, and I thought I’d share it here. Another word for cubit is a covid, and this started as a fun fact I wanted to share with you. But the thing that’s so fascinating is that measurements are just so entrenched in our day-to-day language and culture, in ways that we don’t always notice. It’s like, when you think about COVID and social distancing, like six feet or two meters apart, do we realize that we’re measuring things [00:04:00] in this context? So, to bring it back to measurement, the takeaway here is that a cubit, depending on who you ask, is between 14 and 36 inches.
[00:04:08] Measuring things: feet
Drew Banin: So there are, like, short or long arms, depending. And this isn’t the only body part that we use to measure things. Literal human feet were, for a very long time, how we measured distances. It turns out there are five feet in a pace and a thousand paces in a mile. And so the word mile actually comes from this Latin term, mille passus: a thousand paces. And so five feet in a pace, with a thousand paces, gets you 5,000 feet. Which, as we know, is not actually what a mile is, at least not today. So what happened there? A mile was also defined as eight furlongs. And in 1593, Queen Elizabeth I redefined the length of the furlong, and she made it longer.
And so that resulted in a longer mile, which is today’s 5,280-foot-long mile. So that’s a classic senior leadership move, changing the OKRs in the middle of the millennium like that. But we’ll let it slide this time. So we’re using, like, human appendages as rulers because we have [00:05:00] them with us and we don’t need to create anything or carry it around.
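The two definitions of the mile described above can be checked with a few lines of arithmetic. This is a minimal Python sketch, not something shown in the talk; the 660-foot furlong is an assumption (the modern statute value), since the talk does not state the furlong's length.

```python
# Unit relationships mentioned in the talk, expressed in feet.
FEET_PER_PACE = 5
PACES_PER_ROMAN_MILE = 1000
FURLONGS_PER_MILE = 8
FEET_PER_FURLONG = 660  # assumption: modern statute furlong, not stated in the talk

# "Mille passus": a thousand paces of five feet each.
roman_mile_ft = FEET_PER_PACE * PACES_PER_ROMAN_MILE

# The mile as eight furlongs, using the longer furlong.
statute_mile_ft = FURLONGS_PER_MILE * FEET_PER_FURLONG

print(roman_mile_ft)    # 5000
print(statute_mile_ft)  # 5280
```

The same word, "mile", ends up 280 feet apart depending on which lower-level unit it is built from.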
[00:05:02] Measuring things: pounds
Drew Banin: Like, it’s present all the time. So we’ll shift gears from measuring distances to measuring, like, weight, or I guess mass, right? So the pound: it comes from the Latin libra pondo. And libra, the astrologists among us know, is balance, right? So the Libra zodiac symbol is scales that balance.
The other thing that’s really interesting to note here is that the abbreviation for a pound comes from the word libra. So that’s, like, lb, right? Now, there was no one pound throughout most of human history. There were troy pounds and tower pounds and merchants’ pounds, and they were used for different things.
And it’s such a cluster. There are 13 ounces in a troy pound and 12 ounces in a merchants’ pound. But that’s actually not a meaningful distinction, because the ounce itself differs between these systems. And so it’s just, like, chaos, right? Trying to weigh things against each other.
Unless you actually had a scale, you didn’t know what weighed what, depending on who you were talking to. These measurements are just, like, [00:06:00] entrenched in our lives. Like when we talk about pounds sterling as a currency: first of all, the symbol there is the L from libra pondo, right?
It means literally, like, one pound of silver, and that’s what it meant at one point in time. A really cool thing is, like, lb: if you’re a monk in the Dark Ages writing on parchment paper, this is how you would abbreviate lb, and this is where the pound symbol comes from.
So forget about zodiac symbols. Like, we have pound symbols on our iPhones, right? And this is, like, a thousand-or-more-year-old unit of measure. And so the way that we measure things becomes a really crucial part of our lives, and it extends far beyond any conscious measurement.
[00:06:37] Measuring things: tons
Drew Banin: It’s just ingrained in our culture in a lot of ways. And so, staying on this topic of measuring weight, let’s talk about tons, which are, I think, the funniest unit of measure. Tons originally just meant that something was heavy. They came from these French wine barrels.
And the word for those barrels came from the sound that they made as they were rolled down the street full of wine. Like, they’re really loud. So the word [00:07:00] comes from the French word for thunder, I guess, tonnerre. And so this is a unit of measurement that originally just meant, like, “this is heavy,” and the word came from “that’s loud.”
Like, how not rigorous is that as a unit of measure? So there came to be different types of tons, and they still exist. It’s incredibly, comically confusing. You’ve got your short tons, you’ve got your long tons, and you’ve got your metric tons, which conveniently are called tonnes, but I think you might actually say it differently than that.
So, in general, homophones are not a good thing to have in a system of measure when you’re trying to verbally communicate with people. So finally we get to some standards. And we say that the ton is going to be standardized as 20 hundredweight in the statute of measurement. The astute among you might guess that a hundredweight is a hundred pounds, and you’d be right. Except a hundredweight could be 108 or 112 or 120 pounds, depending on who exactly you asked. So standardization is elusive. It’s [00:08:00] not enough to just say a ton is 20 hundredweight. You need to describe what a hundredweight is, and if that’s in pounds, are they tower pounds or merchants’ pounds, right?
Like, you need this whole sort of ontology of measurements in order to build on top of these lower-level measures into higher-level constructs.
Imagine, like, trading in this system. Industry, commerce. Imagine being a data analyst in this world, where you show off a report and someone says, “Oh, how many hundredweights is that?” And you say, “That’s 20 hundredweights.” Then you have to talk about what a hundredweight is on top of it.
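To make the ambiguity concrete, here is a small Python sketch of how a "ton" defined as 20 hundredweight inherits whatever definition of the hundredweight is in use. The four pound values are the ones quoted in the talk; the function and variable names are illustrative only.

```python
# A ton defined on top of a lower-level unit inherits that unit's ambiguity.
HUNDREDWEIGHTS_PER_TON = 20
HUNDREDWEIGHT_IN_POUNDS = [100, 108, 112, 120]  # candidate definitions from the talk


def ton_in_pounds(hundredweight_lb):
    """Return the weight of one ton, in pounds, for a given hundredweight."""
    return HUNDREDWEIGHTS_PER_TON * hundredweight_lb


tons = {cwt: ton_in_pounds(cwt) for cwt in HUNDREDWEIGHT_IN_POUNDS}
spread = max(tons.values()) - min(tons.values())

for cwt, pounds in tons.items():
    print(f"hundredweight = {cwt} lb -> ton = {pounds} lb")
print(f"largest disagreement: {spread} lb")
```

With these inputs the "same" ton ranges from 2,000 to 2,400 pounds, which is why the higher-level unit is only as well defined as the units beneath it.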
[00:08:39] Measurements are relative
Drew Banin: So a couple of key takeaways at this point. Measurements are always relative. Like, fundamentally, this was a big insight for me in doing research for this talk: there are no absolute measurements of anything, anywhere. It is always relative to some other thing. These units of measurement have staying power.
So they’re really ingrained in our cultures and our societies and our language in ways that we don’t often realize. [00:09:00] The other thing is, we talked about the cubit and the foot and the furlong, right? As long as people have shared definitions of things, as long as you’re using, like, the same person’s foot for measurements, you can actually get really far. It’s like standardization is the important part: shared definitions come first, and you can always get more precise as time and capability allow.
But if I’m talking cubits and you’re talking ells, we don’t even have the same concept of what we’re talking about. So how precise do we need to be? Is near enough good enough?
Fast forward to the Magna Carta. It’s 1215 AD, the signing of the Magna Carta, and it’s one of the foundational moments in the arc of individual freedoms against arbitrary authority. So what we see is this document that gives, like, fundamental rights to man.
We are also seeing the king agreeing that there’ll be standard measures of wine, ale, and corn throughout the kingdom. It’s like, why do we find these things connected together? Why do we find reforms and standards [00:10:00] going hand in hand? I think the answer is that, fundamentally, standards are empowering.
They give people autonomy and agency. Imagine you’re a cooper, right? You’re the person building the wine barrels. You just built, like, a hundred of them, and the king says, “Actually, wine barrels should be bigger. There should be more wine in a wine barrel.” Like, that kinda sucks. So standards fundamentally give power to the people that are doing the work, because they’re now more of an active participant in the process rather than the recipient of some sort of standard that someone else created.
[00:10:31] The Enlightenment
Drew Banin: So keep fast-forwarding, right? The Enlightenment. It’s the 17th, 18th century, and it’s a time of great scientific and cultural exploration and understanding. So we’ve got international trade and cultural exchange, and scientific discovery is happening all over Europe. And with this explosion of, in particular, scientific development, we see that people across borders and different cultures and societies want to start collaborating with each other, but they start running up against these differing [00:11:00] systems of measure. So one of the key players here was James Watt, who you might know from, like, steam engine fame, right? James Watt says, what if we all align around a standard distance of length, and we can call it the meter, from the Greek metron, meaning measure.
And so it might actually not have been James Watt who proposed the meter, but he was a big part of this early conversation about standardizing around measurements. Okay. So there’s a really good idea: rather than defining a measurement in terms of someone’s appendage, which is how it historically worked, or how big a step is, we could do it based on something more empirical in nature. And so if you create a pendulum with a length of one meter, its period will be one second. And so that could be a really good way to define what a meter is. There was a problem here, though. The earth is not a perfect sphere.
So gravity is different. There’s a different amount of gravitational force in different parts of the planet. And in particular, it’s, like, weaker closer to the poles, I think, the north and south pole. [00:12:00] And so the period of this pendulum will vary with latitude, and they had to decide whose latitude they were going to use.
And the problem is every single participant in this conversation wanted to use a line of latitude that went through their own country. So it feels petty, right? It’s like we were right on the precipice of the, almost like, proto-metric system, and everyone gets hung up on whose line of latitude we use.
But actually I think it’s a really important point. And I think that people back then were right to reject this, because it’s fundamentally disempowering if you can’t understand what a meter is within the borders of your country. And so if you’re in Spain, but the line of latitude to measure a meter exactly goes through France, you’re disempowered. It’s no different than using someone else’s foot or arm as the measure of length; you can’t reproduce it. So we’re not all the way there yet to the metric system, but we’ve pinned down this problem, and we’re able to start working toward a solution.
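The latitude dependence of the pendulum definition can be sketched numerically. The Python below is an illustration under stated assumptions, not anything presented in the talk: it reads "one second" as the time of a single swing (half of a full back-and-forth period), uses the small-angle pendulum formula, and uses an approximate textbook sea-level gravity formula whose coefficients are assumed here. All function names are hypothetical.

```python
import math


def gravity_at_latitude(latitude_deg):
    """Approximate sea-level gravity in m/s^2 (assumed textbook formula)."""
    phi = math.radians(latitude_deg)
    return 9.780327 * (
        1 + 0.0053024 * math.sin(phi) ** 2 - 0.0000058 * math.sin(2 * phi) ** 2
    )


def swing_time(length_m, g):
    """Time of one swing (half a full period), small-angle approximation."""
    return math.pi * math.sqrt(length_m / g)


def length_for_one_second_swing(g):
    """Pendulum length whose single swing takes exactly one second."""
    return g / math.pi ** 2


for latitude in (0, 45, 90):
    g = gravity_at_latitude(latitude)
    length_mm = length_for_one_second_swing(g) * 1000
    print(f"latitude {latitude:2d}: g = {g:.4f} m/s^2, length = {length_mm:.1f} mm")
```

Under these assumptions the length comes out just under one meter everywhere, but it differs by a few millimeters between the equator and the poles, so a pendulum-based meter depends on where it is realized.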
[00:12:58] Towards Standardization
Drew Banin: So it’s now the end of the [00:13:00] 18th century. France is in turmoil, and we’re on the brink of the French Revolution. The Enlightenment very much created the need for standardization, with this, like, scientific revolution and collaboration across borders and cultures. And it turns out the French Revolution would be the thing that provides the opportunity for realizing it.
And so there’s this guy, Talleyrand, shown here. He was one of the driving forces behind the creation of the metric system. Here he is showing us how long one cubit is, which is a little bit uncharacteristic for him. Who is this guy? Who’s Talleyrand? Turns out I could talk for an hour about Talleyrand. Well, I couldn’t, but somebody who knew more about Talleyrand could. He’s just, like, a fascinating person.
It turns out he worked for Louis the 16th. Then, after Louis the 16th was deposed, he was working with the French revolutionaries, and then after that, Napoleon, and then after that, Louis the 18th. So he’s a diplomat during this whole era, and every step of the way he was just, like, constantly backstabbing [00:14:00] Louis the 16th and the French revolutionaries and Napoleon and so on.
Like, he was taking bribes, he was double-dealing. And this is a time where if you looked at someone the wrong way, you’d be convicted of treason. But not only did he get away with it, his power and influence grew throughout the whole period. Just, like, a fascinating character.
To this day, if you’re a really skilled diplomat, you might be called a Talleyrand, after this guy. So he’s actually, in my opinion, just from reading about him, the main character. And it made me feel like an NPC, personally.
So why the French Revolution and the metric system? Like, why are these two things happening at the same time? On the eve of the revolution, the 800 or so units of measure in France had up to a quarter of a million different definitions, because the quantity associated with each unit could differ from town to town and from trade to trade.
Just think about that. You’re a French... I don’t know how to say it so it sounds cool. You’re, like, a French peasant, right? There’s a quarter of a million different units you might need to [00:15:00] interoperate with other people in your own country. There’s no Wolfram Alpha, no graphing calculators, right?
So historically a lot of this cultural and commercial and scientific exchange was local, but it was now national and international. And the activation energy was now present, in the French Revolution, to overcome the existing standards that were imperfect but that worked, right? The French Revolution was this opportunity to change the fundamental units that everybody worked with.
And the French very much did, at least initially during the revolution. This actually goes far beyond the metric system. The French actually created their own calendar. And there’s a lot to say about the French revolutionary calendar. One funny thing is they changed the week to be 10 days long.
So there were, like, three 10-day weeks in a month, but nobody liked it because they still only kept one day for rest. And so everyone, instead of having a six-day work week, had a nine-day work week. And I think they nixed that after a couple months. So some good, some bad came [00:16:00] out of this.
[00:16:02] The metric system
Drew Banin: Okay. So this slide is for the Americans in the room. And if any Liberians are tuning in, it’s for you too; everyone else already gets it. The metric system: seven base units, right? So meters, kilograms, seconds, a couple others. These base units are intended to be atomic. And so there’s really no overlap between the meter and the kilogram; like, they’re fundamentally measuring different things.
You have as few base units as possible. Crucially, we don’t have different types of units for different scales. We use decimal ratios on top of these base units. That’s right. So instead of saying furlongs, if you have enough of them, become a mile, we say, okay, there’s a meter, and if you have a thousand of them, it’s a kilometer. So we use consistent prefixes for multiples, and it helps us all understand what we’re talking about.
There isn’t a different language for different scales of units. The big downside to the metric system is that it’s actually a lot of fun to draw the big G with the four Qs inside of it, for quarts and pints and cups and all that. But outside of that, it’s just hard [00:17:00] to overstate the positive impacts of the metric system, right?
It’s trade, it’s commerce, it’s industry. It’s like a driving force behind the industrialization that happened and the bigger arc of globalization over the next 200 years. So there’s one other thing here that we didn’t talk about yet. It’s this idea of a metric, I’m sorry, of a unit in the metric system being realizable.
And so this was an ideal; it was not always possible. But this concept of being realizable means that anybody, anywhere in the world, can independently measure the value of the base unit. And so for a base unit to be entered into the metric system, you must also submit, in a procedural format, how exactly to produce one of that unit in laboratory conditions.
So if you want to know how long a meter is, here’s how you can do that in the lab. In practice, these things are challenging for you and me to reproduce independently, but you certainly could do it back in the day. Okay. I’m good on time.
Everyone has more or less bought into the metric system. In America, okay, we’re still using imperial units, and that’s okay. But if you’re doing science or working in industry, it’s going to be metric for the [00:18:00] most part. So standardization leads to coordination, and a big part of this is having standards that are both universally accepted and rigorously defined.
And so the universal acceptance part is key because, as we’ll see, rigorous definitions for the units of the metric system sometimes came later, but the thing that needed to come first was everyone agreeing that the kilogram and the meter and the other base units were the standards that we were going to apply.
So what we see here, this picture in the bottom right-hand corner of the slide, was for a time... Okay. First of all, it’s a metal cylinder, right? Under all these bell jars, it’s a metal cylinder. It’s, like, a platinum-iridium alloy. And up until 2019, it was the literal definition of a kilogram.
And I don’t mean that it weighed very close to a kilogram. I mean that in the metric system, as an SI unit, a kilogram was defined as the mass of that particular piece of metal. So later this was changed to something a lot more complex. It was the equivalent mass of the [00:19:00] energy of a photon with a given frequency.
So, like, have fun reproducing that particular value at home. But it actually didn’t matter up until 2019, for the most part, because we had such broad alignment and acceptance of that kilogram being the kilogram that we could put people on the moon and so on. So that’s one of the big takeaways here.
Both of these things must be true for a system of measurement to be successful. Like, first and foremost, it must be universally accepted. We must both agree to use the kilogram, but then we can always refine exactly what a kilogram is and exactly how we define it over time, and get more and more precise.
Okay. So standards and measurements are fundamentally empowering, right? It’s one of the big arcs. Why do we see standards showing up in the Magna Carta, and why do we see the metric system coming out of the French Revolution? Because the change in standards both requires and, in some ways, facilitates upheaval.
The standards lead to networks; they [00:20:00] lead to a federation of control and more autonomy for the people that participate in them. In the modern world, we can look at the internet, right? So these standards like HTTP and TCP/IP and DNS, they’re standardized, and I think this is still true: you can plug a computer into an Ethernet cord and have a webpage up on the internet.
That’s a powerful thing. It, like, fundamentally changes who has control in our entire global system, the internet. And say what you will about cryptocurrencies. Like, I own zero Bitcoin. I’m not particularly interested in it personally, although this talk will be minted as an NFT, if you’re interested. Just kidding.
Okay. But the big idea of crypto, right? It’s, like, talk about distribution of control and power. Federal governments control currency, and banks and these institutions. It’s like, what if we all participated in a set of standards that we can all interact with independently?
That’s all we’re going to say about crypto [00:21:00] here, but, like, fundamentally I believe it’s a powerful and transformative idea.
[00:21:06] Standardizing data
Drew Banin: So let’s bring it into our world. Let’s talk about business. I’d argue that today we’re at pre-French Revolution levels of sophistication around measurement in the modern data ecosystem. We don’t have standards or consistency in how we define metrics or how we interface with metrics. We have some of these rough units, but we probably lack precision within our own organizations about what exactly they mean.
So this is like a key insight, right? If we launch a new product and I say to Tristan, “Oh Tristan, great product launch. We went up by 400,” he’d say, “400 what?” And if I say daily active units... I’m sorry, daily active users. First of all, that’s a unit, right? Like 400 kilometers, 400 daily active users.
It’s a unit. It’s that shared understanding between Tristan and I that will help us know if daily active users going up by... I’m sorry. [00:22:00] It’s a shared understanding between Tristan and I, and very much the rest of the business, that will help us understand exactly what we mean when we say daily active users. That’s true for revenue and trade accounts and so on and so forth.
So we must ask: how and where are these units defined? Do we have a metric system equivalent in our business? And how and where are these units exposed to the data tools that we use to actually interface with this data?
Getting more concrete, let’s imagine a fictitious e-commerce company. They sell stuff online. And so maybe the most naive version of “what’s our revenue?” is this particular SQL expression: the sum of the order totals. But more accurately, you have to remove tax that was paid, right?
This is business logic: your order total minus the tax becomes revenue. But even more accurately than that, you need to remove discounts. If there was a discount code, you want to remove that too. This is just such a toy example, but it’s a really good example of why the obvious answer isn’t [00:23:00] always the right one, and why we need to have shared definitions around these core constructs. Otherwise we’re going to get it wrong. So remember that a good system of measure is both universally accepted and rigorously defined. And so the question is, does everyone always calculate revenue the right way?
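The three successive definitions of revenue can be compared side by side. The sketch below is hypothetical: the slide's actual SQL is not in the transcript, so the table name `orders`, the columns `order_total`, `tax_paid`, and `discount`, and the sample rows are all assumptions. It also assumes `order_total` is the tax-inclusive, pre-discount amount. It uses Python's built-in `sqlite3` module only so that the SQL runs as written.

```python
import sqlite3

# Hypothetical orders table for a fictitious e-commerce company.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, order_total REAL, tax_paid REAL, discount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 110.0, 10.0, 0.0),
        (2, 55.0, 5.0, 10.0),
        (3, 220.0, 20.0, 25.0),
    ],
)

# Three increasingly careful definitions of "revenue".
DEFINITIONS = {
    "naive": "SELECT SUM(order_total) FROM orders",
    "minus_tax": "SELECT SUM(order_total - tax_paid) FROM orders",
    "minus_tax_and_discounts": "SELECT SUM(order_total - tax_paid - discount) FROM orders",
}

results = {name: conn.execute(sql).fetchone()[0] for name, sql in DEFINITIONS.items()}
for name, value in results.items():
    print(f"{name}: {value}")
# naive: 385.0
# minus_tax: 350.0
# minus_tax_and_discounts: 315.0
```

Three analysts who each picked a different expression would report three different numbers for the same three orders, which is the case for a single shared definition.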
[00:23:14] Metrics at work
Drew Banin: Is it inevitable that if you’re making a chart with revenue on it, you will do so correctly? Like, how close is good enough? Is 90% okay? Do you want to get to 95%? What does it take to get to a hundred percent? So we can look at this need around standardization in two ways. The first one is: how do the teams in a business collaborate around a metric?
[00:23:38] Needs around standardization
Drew Banin: So how are they defined? Are they documented? And when these business metrics change, when their definitions change, and sometimes they do, how do we version control those changes? And the second way to think about this is the way in which all of our myriad data tools interoperate, right? So we can’t have these pockets where different cultures are using different systems, right?
Like the pre-metric system. How [00:24:00] do we help everyone tap into the same sort of lineage and provenance? Like, how do we make these metrics realizable, so that every tool and every person using those tools can understand exactly what we mean when we say revenue, and where that number comes from, and all the transformations that happened to that data along the way, right?
[00:24:14] The metrics layer #
Drew Banin: How do we get everyone to participate in this same semantic graph around our data? So here’s how we’re going to make it happen. The thing we’re looking at here is new in dbt version one. And if you want to hear more about what’s happening in 1.0, please stay tuned for Jeremy’s talk a little bit later,
in which we’ll cut the ribbon. But I don’t want to steal Jeremy’s thunder. We’re going to focus on metrics here. So the thing we’re looking at is the definition of a metric for new customers. And this is like a toy example. Again, it’s super simplified just for this conversation, but when we think about defining a metric,
it’s like the SQL query to express that metric on top of a model, in this case dim customers. And we can enumerate the valid dimensions through which we can explore it. We can look at the time grains. So [00:25:00] for new customers, we look at that daily, weekly, monthly, you can imagine quarterly, annually, but not by the microsecond. It’s not an appropriate way to explore this data.
And beyond that, thinking about appropriate use for metrics, new customers is convenient because you can look at new customers year to date or month over month. But if you imagine a different metric, like average order value, you can’t sum that up year to date and you can’t do a trailing seven day average.
Like you’re not going to get a useful number back if you do that. And so this is what we have in mind when we think about a system for defining business metrics and the ability to essentially define them under version control and document them and help them participate in this broader kind of semantic graph.
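To make the idea of a metric definition concrete, here is a minimal Python sketch of the pieces described above: the model the metric sits on, the aggregate expression, the time grains it supports, and the dimensions it may be sliced by. This is only an illustration and not dbt’s actual metric spec; the class, the field names, the `country` dimension, and the `dim_customers` spelling are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """Illustrative stand-in for a version-controlled metric definition."""

    name: str
    model: str                    # model the metric is computed on
    expression: str               # SQL aggregate that expresses the metric
    time_grains: tuple[str, ...]  # grains it makes sense to explore
    dimensions: tuple[str, ...]   # dimensions it may be sliced by
    additive: bool                # False for ratios such as averages

    def problems(self, grain: str, dimensions: list[str]) -> list[str]:
        """Return reasons a request is invalid (empty list means valid)."""
        found = []
        if grain not in self.time_grains:
            found.append(f"unsupported time grain: {grain}")
        for dim in dimensions:
            if dim not in self.dimensions:
                found.append(f"unsupported dimension: {dim}")
        return found


# Hypothetical definition mirroring the "new customers" toy example.
new_customers = MetricDefinition(
    name="new_customers",
    model="dim_customers",
    expression="count(*)",
    time_grains=("day", "week", "month"),
    dimensions=("country",),
    additive=True,
)

print(new_customers.problems("month", ["country"]))
print(new_customers.problems("microsecond", ["shoe_size"]))
```

The point of keeping grains and dimensions in the definition is that an inappropriate request, like exploring new customers by the microsecond, can be rejected before any query runs.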
Some of you who’ve been around for a while might say, Hey, Drew, this isn’t a new problem. Our industry has been solving this for decades. And the answer is to pre-aggregate your data in the data warehouse. And [00:26:00] for a while, that was the best answer. Before we talk more about metrics, I do just want to say this line, “if they don’t have bread, let them eat cake,” is not correct.
The actual line was “let them eat brioche.” And the second thing is Marie Antoinette never said it. So we should all collectively, like, un-cancel her. I have a lot of respect for Marie Antoinette personally. Cubes, though: if you’re not familiar, there’s this concept called an OLAP cube.
And the idea is you can pre-aggregate data in a database, in the data warehouse, along a series of well-defined dimensions. So in this example that we’re looking at here, it’s location, it’s product, it’s shipping dates. You can define these as dimensions and you can precalculate, in tables in the database,
some number, say total sales or revenue. Whatever it is you precalculate, that will get you standardization. But it is incredibly rigid and inflexible, and on modern data warehouses, just super wasteful. More than that, a lot of the really great tools in the modern data ecosystem [00:27:00]
don’t work with OLAP cubes, because it’s an antiquated way of doing things, right? And so you’re going to get locked into specific vendors if you go all in on cubes in the year 2021. This approach made sense 10, 20 years ago, but it’s not the right approach for us anymore. So just to go deeper on what this looks like and why it’s a challenge.
This is an example of a cube that rolls up across year, state, and product. And so if you imagine there’s 10 years of data with 50 states and, say, 10 products, you have to calculate total sales for all of those combinations of dimensions. It’s like a 5,000 row table, which, like, no problem.
5,000 rows is not a big deal. But imagine less of a toy example, in which there are 10 dimensions and 10 possible values for each, which is not many. You get this combinatorial explosion, right, in the number of rows in this table. So you get 10 to the 10th, or like 10 billion rows in this table. Most of the rows in this table are a combination of dimensions that you would never care to [00:28:00] query, right?
Like, it’s not useful. No one’s going to look at them. But you must calculate the cube at the smallest grain that you would imagine anyone wanting to explore the data on. You have no choice but to calculate up front the metric values for all of these combinations of dimensions.
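The arithmetic behind that explosion can be checked in a few lines of Python. A fully pre-aggregated cube holds one row per combination of dimension values, so its size is the product of the dimension cardinalities. The helper name `cube_rows` is made up for this sketch; the numbers are the ones from the talk.

```python
from math import prod


def cube_rows(cardinalities: list[int]) -> int:
    """Rows in a cube that stores every combination of dimension values."""
    return prod(cardinalities)


# 10 years x 50 states x 10 products
toy_cube = cube_rows([10, 50, 10])

# 10 dimensions with 10 possible values each
larger_cube = cube_rows([10] * 10)

print(f"{toy_cube:,}")     # 5,000
print(f"{larger_cube:,}")  # 10,000,000,000
```

Adding one more dimension multiplies the row count by that dimension’s number of values, which is why the table grows so quickly.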
When I see OLAP cubes, I think about antiquated systems of measurement. I think that we could do better. I think what it’s going to take isn’t evolution, specifically. It’s going to take a revolution.
We don’t need to make this trade-off anymore between consistency and flexibility. We can in fact change the way that our system of measurement works to empower these end users, which is ultimately what we’re doing it for. We’re doing this so that end users, these folks in the business, can interface with the data and understand what’s happening in the business.
We can create order out of chaos and we can [00:29:00] adapt our metrics as our businesses change in real time. The future is headless.
[00:29:11] Today’s architecture #
Drew Banin: Headless BI is more than just a French Revolution joke. This diagram comes from Ben’s Substack. And it’s really about this concept of taking metric definitions out of the data applications and pulling them into a more centralized place. So headless BI, metrics layer, it’s the same concept.
The idea is. This last mile of metric calculation is certainly okay. Like, we’ve done a lot of good work by pulling business logic into the transformation layer and exposing it to all the different data tools. But this last mile of actually calculating the metric along a set of specified dimensions, and doing so accurately in a world where these metrics, I’m sorry, these metric values can change out from under us,
it’s just extremely challenging creating consistency in this sort of today’s-architecture world. And so instead there’s this opportunity for dbt to serve as the metrics layer, which makes a lot of sense as an adjacency [00:30:00] to its place as the transformation tool. So dbt already knows about your core entities. It knows about users and orders and sessions, and it’s one step further from there: okay, not just sessions, but what’s the average time on site? Or not just users, but how many new users did we have across these dimensions and some time range, right? It’s a very logical next step. And as we talked about, in 1.0 you can now also define metrics, which specify aggregations on top of these data models in your dbt project.
Drew Banin: And so this is powered by a brand new dbt server. The dbt server, and dbt Core in general, has your metric definitions in the compilation context. And so the server in particular is able to, in a very high performance way, compile queries from all these data applications in order to generate metrics queries on the fly.
What are we talking about here? This is like a lot of words. Let’s look at code, right? Okay. Here’s an example query. What we’re going to do is select new users by country and day. [00:31:00] And the thing we’re looking at in particular here, it’s just a macro. It’s a macro called metric in a package called metrics. Like, limited amounts of magic here.
It’s a macro that generates SQL, right? dbt is very good at that. And so if we run this query from an ad hoc analysis tool or a BI tool or a data science notebook, whatever it is, dbt, and specifically the dbt server, can compile this query, execute it against the database, and give you a table of data that has those dimensions and that specified time grain for that metric, right?
Like new users in this case.
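The "macro that generates SQL" idea can be sketched in Python: take a metric’s aggregate expression plus a requested time grain and dimensions, and render the query text. This is not the dbt `metric` macro or anything the dbt server actually runs; the function name, the `dim_users` model, the `created_at` column, and the use of `date_trunc` (whose availability depends on the warehouse) are assumptions for illustration.

```python
def compile_metric_query(
    model: str,
    expression: str,
    timestamp: str,
    grain: str,
    dimensions: list[str],
) -> str:
    """Render SQL that aggregates one metric by time grain and dimensions."""
    period = f"date_trunc('{grain}', {timestamp})"
    select_columns = [f"{period} as period", *dimensions,
                      f"{expression} as metric_value"]
    # Group by the period column plus every requested dimension, by position.
    group_positions = ", ".join(str(i) for i in range(1, len(dimensions) + 2))
    return (
        "select " + ", ".join(select_columns)
        + f" from {model}"
        + f" group by {group_positions}"
        + " order by 1"
    )


# Hypothetical request: new users by country and day.
sql = compile_metric_query(
    model="dim_users",
    expression="count(*)",
    timestamp="created_at",
    grain="day",
    dimensions=["country"],
)
print(sql)
```

Changing the grain or the dimension list changes only the generated text, which is what lets a caller switch from daily to yearly, or add a dimension, without a pre-built table for each combination. A real implementation would also need to validate or quote these identifiers rather than interpolating them directly.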
[00:31:41] The dbt server #
Drew Banin: So we talked about the dbt server. Some of the things to call out here: it’s built to be incredibly performant, right? Batch-based processing is one thing, but being on the critical path of analytics is another. So we’re building this thing to be incredibly performant and reliable, and extensible to use cases that extend even beyond metrics calculations.
In a lot of ways it’s dbt [00:32:00] compilation as a service. It’s an integration point for tools that want to interface with dbt compilation. And I think actually at this point, I’ve talked long enough. It’s time to show it to you. Vince, do you mind hooking me up here with the, yeah, thanks a lot.
So, to look at the screen, what we’re doing here is we’re selecting from this metric, in this case customers, and specifically, we’re going to look at it by, I’m sorry, we’re going to look at it by year. Okay. So we’ll run this query. This gets compiled by the dbt server and executed against Snowflake, in this case the data warehouse, and we get tabular data back, which is the number of customers in each of those years.
But now we can look at it across a dimension. In this case, it’s like a toy example, but it’s order total band, right? So is it a small or large order. Again, like big data, but we’re on the fly changing dimensions. We’re changing time grains in this case. Now we can look at this data, and in a second, we can look at it by the day, too.
So there’s actually quite a bit beyond this [00:33:00] as well, but in terms of vignettes and what we can show you at this point, this is the big thing, right? So you can specify these dimensions on top of metrics that are defined in version control. And even beyond that, we can help with a lot of the things I touched on earlier, like helping avoid averaging averages, or computing seven day rolling averages, period to date, et cetera, et cetera.
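The "averaging averages" problem is easy to reproduce. The sketch below uses made-up order values to show that taking the mean of daily average order values gives a different number than computing average order value over all the orders, because days with few orders get the same weight as days with many.

```python
from statistics import mean

# Made-up order totals: one large order on Monday, four small ones on Tuesday.
orders_by_day = {
    "monday": [100.0],
    "tuesday": [10.0, 10.0, 10.0, 10.0],
}

# Average order value per day, then the average of those averages.
daily_aov = [mean(values) for values in orders_by_day.values()]
average_of_averages = mean(daily_aov)

# Average order value computed over every order in the period.
all_orders = [v for values in orders_by_day.values() for v in values]
true_aov = sum(all_orders) / len(all_orders)

print(average_of_averages)  # 55.0
print(true_aov)             # 28.0
```

This is why a non-additive metric has to be recomputed from the underlying rows for each requested grain rather than rolled up from a finer-grained result.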
There’s so much more to say about how this all works. And Vince, excellent. Can you put the slides back up, please? Thank you so much. Okay. There’s a lot more to say about how this all works, and we will do so in due time. But for now I want to talk about why this moment and why dbt. Any successful approach to solving this problem is going to need to be standardized and ubiquitous and mature and open. With dbt version one and the energy in and around this community,
[00:33:47] In the future #
Drew Banin: We’re in a great position to really fundamentally change the way that organizations interface with their data. So the open core foundations of dbt, its ubiquity in the ecosystem, and the maturity that comes with that mean that we can [00:34:00] overcome the activation energy required to instill a new system of measurement together. In the future,
I think that metrics are going to feel standardized within an organization. There will not be discrepancies between what the marketing team and the sales team consider to be a lead. And that will be defined under version control. And there will be change management for when that definition invariably changes, because it does, and it will, and that’s okay.
These metrics are going to be exposed in consistent ways across data applications. So whether you’re doing data science or machine learning or BI or dashboarding, whatever it is, you’ll be able to tap into the same shared metric definitions. And you won’t have to worry about inconsistency across different tools. In terms of business intelligence,
we can all imagine the ways in which it will be a boon to be able to define these metrics once and expose them to many different tools. But beyond that, I think it’s going to have a broader impact on what business intelligence feels like. So if we think about self-service with confidence, to me, that feels like a federated [00:35:00] experience.
It’s less “go to the one place to see all the dashboards,” and it’s more like data in the context of where you’re doing the work. Yeah, data in the context of the place you’re doing the work. So quickly, the way I think about this, it’s like a product management tool, right?
So we’re doing product management. It’s like tickets, and are they open or closed? And what are the blockers? And so on. And then post release, I’m going into a different tool to actually understand the performance of the thing that we just released. But I would love to see that as a KPI for the epic that has been closed. Like, that would be exciting to me.
I think it’s a possibility, and this in no way changes the importance of the things that we consider business intelligence today. But it’s a new way for people to have that starting point, that curiosity, to go from “I see what the metric is” to then drilling in deeper in tools that are purpose-built for that kind of experience.
I think that’s really exciting. Okay. So standards, we see, lead to [00:36:00] collaboration and, in turn, this exchange of ideas. And locally, in the near term, I think that will get us metric consistency. I shared one kind of idea around what a longer term macro trend could feel like, maybe business intelligence feels different, but the reality is we’re in this innovation soup moment.
So the things that we’re building here are totally working and they’re solving this metric consistency problem. But I think there are going to be use cases that we’re not, at least I’m not, clever enough to unlock quite yet. And so my ask for all of you is, if you were to use this thing and you’ve got ideas about how it could help you and your team, I want you to get in touch.
And if you want to integrate with this thing, if you are building a data application and we haven’t already talked to you about it, I want to talk to you about it. I want to figure out all the ways that this thing can help us fundamentally change the way that people interface with data.
So we’ve covered a lot of ground today. I think we started like 5,000 years ago. We touched on the latest commit or so on a couple of dbt repos as of like yesterday. And so I’ve really shared a lot and I think I might be pressed for time a little bit, but there was one [00:37:00] more thing. So we talked about this new dbt server earlier, and I know that I’m being vague, and we’re going to have a lot more to share soon.
We talked about how it can power metric consistency. But this thing, this dbt server, is a dbt code running machine, and it’s totally capable of powering development use cases for dbt as well. And so I only want to show this as a vignette, and here you go: this is what it feels like to run a dbt client that communicates with a dbt server.
The client here that we’ve installed is super easy to install. Okay. Sorry. I was looking at a clock. Okay. The client that we showed in that video, it’s super easy to install. It has minimal Python dependencies. It doesn’t even need to be written in Python, because it’s fundamentally communicating with dbt over the network as a server.
[00:37:51] Timeline #
Drew Banin: Thank you, Vince. So that’s a vignette. We’re going to have a lot more to say about this. We want to put it in your hands, but we’re [00:38:00] not all the way there yet. In closed beta currently we have the metrics layer and the dbt CLI we just showed off. We’re going to look to open up this beta early next year.
And ideally launch a lot of this stuff in the year to come. So my ask for all of you is, if you want to play with this, if you want to integrate with it, if you have ideas about how we could do it even better, please give us a shout. We’re excited to talk to you.
That’s the metrics system. Thanks everyone for coming. I’m looking forward to jumping into the chat and talking with you about it.
Last modified on: Apr 19, 2022