Analytics Engineering for storytellers
Originally presented on 2021-12-06
Browse this talk’s Slack archives
The day-of-talk conversation is archived here in dbt Community Slack.
Not a member of the dbt Community yet? You can join here to view the Coalesce chat archives.
Full transcript
[00:00:00] Elize Papineau: Oh, how are you doing? Thank you for joining. Your data is flowing and dbt is running. Thank you, nonetheless, for attending Coalesce. My name is Elize Papineau. I’m a senior analytics engineer at dbt Labs. As both a Tolkien fan and a dbt fan, I’m incredibly enthusiastic to host this session: Analytics Engineering for Storytellers, presented by Winladriel of Rivendell, the cousin of Gwen Windflower, who is an analytics engineer at dbt Labs.
Before we get started, let’s do some quick housekeeping. All chat conversation is taking place in the #coalesce-ae-storytellers channel of dbt’s Slack. If you are not a part of the chat, you have time to join right now: visit community.getdbt.com and search for #coalesce-ae-storytellers.
When you enter the space, we encourage [00:01:00] you to ask questions, make comments, and react in the channel. We are not physically together, but we’re all here together in Slack, so let’s keep the conversation going. After the session, Winladriel will be available in the Slack channel to answer your questions, and we encourage you to post those questions throughout the session in that channel.
So let’s get started.
[00:01:23] Winladriel: Greetings. Just pondering my orb here, purple of course. I’m Winladriel, data elf. My cousin Winnie asked me to come here today to talk to you about something that we both have a passion for, which is analytics engineering. Now, you may not realize that analytics engineering is wildly popular among the elf populations of Middle-earth, but I think by the end of this talk you will understand why. Elves being immortal and all, we value stories and song above all the material riches of the earth. The dwarf lords and their [00:02:00] jewels, etc.? It’s stories and songs for us.
And what we’re here to absorb today is that analytics engineering is fundamentally about stories. To absorb this, we’re going to discuss the foundational relationship between data and stories, then use an example of a story from my world to illustrate this truth. And lastly, we’ll talk about how these principles apply to your own work in your world day to day, and how this is the actual practical approach that my cousin Winnie uses as a consultant with dbt Labs.
[00:02:33] Why do we look at data?
[00:02:33] Winladriel: Okay. So to understand what stories have to do with data, we first need to understand why we look at data in the first place. One of the biggest reasons is to measure, right? To understand what has happened or something that is happening. A similar reason is to predict: to try to extrapolate out what’s going to happen based on what happened before.
And then lastly, sometimes just because it’s interesting. Even when we don’t have influence over a decision, a compelling presentation of data is just fascinating to us [00:03:00] as humans or elves. So given these relatively straightforward motivations for looking at data, why does this statement from my dear personal friend Boromir ring true? We have extremely powerful technical tools at our disposal, and yet it is a common statement and sentiment you’ll see all across the dataverse. And I think it’s because the technical part at this point is in fact easy. With the modern data stack, we actually can pull some data real quick.
But what’s hard is knowing what to pull, what to build, and why. That’s the much more difficult problem. And the root of that problem is that humans can’t read data. And frankly, neither can elves, I hate to admit. This is why there is such a term as data literacy. There is nothing natively, intuitively consumable about data to the human or Elven mind.
What we [00:04:00] can understand are entities, for instance a customer or a country; associations, so a customer being in a certain country; events, for instance a customer in a certain country has a session on our app and places an order; and variations, so aggregating, for instance, the revenue month over month of these sessions and orders from a customer in a certain country. That we can understand.
So let’s swap some synonyms in here. Instead of entities, we think about characters, your Frodos, your Gollums. We think of associations, such as a fellowship, as relationships. We think not of events but of experiences, things that these characters go through, and of the changes that those experiences put the characters and their relationships through. If we take these synonyms together, then we see that what we actually understand is story, not data.
[00:04:56] Humans understand stories
[00:04:56] Winladriel: We understand story. And now, why this is [00:05:00] important is maybe not obvious to you. As elves, we’re immortal beings who live immersed in natural splendor, beauty, all that is fair. But I understand that, for a mortal, life can be a bit of a battle. You gotta go to work. You got to do a job to survive.
But the secret I’m here to share is that the same artistry that we use to craft an ode to my ancestors Beren and Lúthien can be extrapolated out into a process that works just as well for understanding customers purchasing, I don’t know, jewels, let’s say. I’m not sure what people buy on your mortal internet. But this very same pattern will work.
And if we step back here and look at why we look at data in the first place, we can see that all of these things, the interior components of these stories, are patterns, and patterns are what we can understand: to measure, to predict, to find threads of interest [00:06:00] within the stories themselves.
If we zoom out and compare stories to other stories, we find there are these meta-patterns: the same rhythms, beats, processes, and transformations that we see in one story can often be applied to another. And this is why storytelling is so essential to analytics. A great example of this from your world is the hero’s journey from Joseph Campbell.
It is the basis for many famous, what I’m told are called, movies. And it is also a great example of the story we’re going to walk through from my world today. It goes something like this: somebody in the normal day-to-day world goes through various obstacles and growth, overcomes them, and reaches some kind of more desired state.
[00:06:47] Customer experience journey
[00:06:47] Winladriel: What’s interesting is that this is, in fact, the exact same story as this one, which might be a little more applicable to your day-to-day work. So to take this meta-patterning a little bit further, go a little bit [00:07:00] deeper, and explore just how fundamental storytelling is to analytics engineering, we’re going to walk through telling one of the most famous tales of my world, which is Frodo and the Ring of Power.
And to do that we’ll use a tool that we, the elves, developed in collaboration with the dwarf lords for recording some scrolls of their merchants’ dealings selling precious goods to the kingdoms of men. We call it Dwarves’ Buried Treasure. So we’ll be using that today to reconstruct the data to tell the story of Frodo and the Ring of Power. Okay. So first we have to keep in mind that our data is coming from a huge variety of sources. And this is probably a pretty familiar scenario to a lot of you.
[00:07:41] Overview of the data
[00:07:41] Winladriel: If you do analytics engineering day to day. And our goal here is going to be to create a unified record of characters and events that we can combine to tell our stories. The essence of the storytelling approach is thinking in these terms, right? In entities and events, characters, places, and experiences, [00:08:00] and trying to work out how we can bring the data together and craft these, not how we cope with the tables and columns we’re getting from these various sources.
So as I said, our data comes from wildly different sources. They have very different schemas. This data on screen here is from the wizards’ White Council’s records. They’re very old school. It’s highly normalized. It’s very Kimball star schema. You know how wizards are.
But we also have stuff like the Shire records, which hold what’s very important to hobbits: average number of breakfasts, foot size. Crucial, but not super applicable to the story that we want to tell here. Similarly, we have some data that we want from the records of the Rangers, the men of the north, and we have some other stuff that we’re not so interested in.
[00:08:48] Unioning, denormalizing and joining
[00:08:48] Winladriel: So the first step for us is going to be unioning, denormalizing, and joining to create these unified concepts that serve our needs, not the source [00:09:00] systems. So yeah, we’ve staged these sources, and even in staging them, we’re starting to use the language that we want. And that’s the really crucial part of this step.
We’re doing this immediately, as far upstream as we can, to start bringing together these modular pieces into the concepts that we want, in this case characters. And we’re using that language as early as possible: characters. That’s really important. Another really important point about language here is that not only is our end product, the data itself, going to tell a story, but the code itself is telling a story too.
Okay. In this creative storytelling approach to analytics engineering, it’s really important that we label each step in the process with specific and descriptive language. And you’ll also note that we’re leveraging things like macros, such as this union relations macro here, to wrap complex code in descriptive language.
Not only does that house the logic and make it easier to use multiple times, it makes it clearer what that logic does. The result of all of this, when combined with the naturally declarative style of SQL, is a code base that can be read by anybody, regardless of their technical background, to understand how the data is changing and progressing through this sort of meta-story that we’ve put together.
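The unioning step described above can be sketched as follows. This is a minimal illustration, not the code shown in the talk: it uses plain SQL run through Python’s built-in `sqlite3` module rather than a dbt macro, and every table name, column name, and row (`white_council_people`, `shire_records`, `source_system`, and so on) is an assumption made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source tables with very different shapes (invented for this sketch).
conn.executescript("""
    CREATE TABLE white_council_people (person_id INTEGER, full_name TEXT, race TEXT);
    CREATE TABLE shire_records (hobbit_name TEXT, avg_breakfasts REAL, foot_size REAL);
    INSERT INTO white_council_people VALUES (1, 'Gandalf', 'Wizard'), (2, 'Aragorn', 'Man');
    INSERT INTO shire_records VALUES ('Frodo Baggins', 2.0, 11.5), ('Peregrin Took', 2.5, 10.0);
""")

# Each CTE renames one source into the shared "characters" vocabulary,
# then the sources are stacked into a single relation.
unioned_characters = conn.execute("""
    WITH council_characters AS (
        SELECT full_name AS character_name, race, 'white_council' AS source_system
        FROM white_council_people
    ),
    shire_characters AS (
        SELECT hobbit_name AS character_name, 'Hobbit' AS race, 'shire' AS source_system
        FROM shire_records
    )
    SELECT character_name, race, source_system FROM council_characters
    UNION ALL
    SELECT character_name, race, source_system FROM shire_characters
    ORDER BY character_name
""").fetchall()

for row in unioned_characters:
    print(row)
```

Columns that do not serve the story, such as the assumed `avg_breakfasts` and `foot_size`, are simply left behind at this stage, and the `source_system` column keeps track of where each character came from.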
[00:10:22] Characters
[00:10:22] Winladriel: I don’t need to know SQL to be able to read the labels on these CTEs and understand that we narrow the characters to the desired columns, then we join in weapons, then we join in locations, etc. This is really important. So bringing all this together, we end up with this really wide, cohesive characters mart, and this is what we want.
We are building a characters mart based on everything that we want to see, regardless of where it’s coming from. We want each row to tell some kind of story unto itself. We’re not making arbitrary distinctions about what’s a fact and what’s a dimension. We’re simply concerned with characters here.
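A minimal sketch of descriptively labeled CTEs building a wide characters mart, assuming hypothetical staging tables (`stg_characters`, `stg_weapons`, `stg_locations`) and invented rows. It again uses Python’s built-in `sqlite3` so that it runs as-is; in a dbt project the same SQL would live in a model file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical staged sources (all names and rows are invented for this sketch).
conn.executescript("""
    CREATE TABLE stg_characters (character_id INTEGER, character_name TEXT, race TEXT, avg_breakfasts REAL);
    CREATE TABLE stg_weapons (character_id INTEGER, weapon_name TEXT);
    CREATE TABLE stg_locations (character_id INTEGER, home_location TEXT);
    INSERT INTO stg_characters VALUES (1, 'Frodo Baggins', 'Hobbit', 2.0), (2, 'Aragorn', 'Man', 1.0);
    INSERT INTO stg_weapons VALUES (1, 'Sting'), (2, 'Anduril');
    INSERT INTO stg_locations VALUES (1, 'The Shire');
""")

# The CTE names read as the steps of the story: narrow, join weapons, join locations.
characters_mart = conn.execute("""
    WITH characters_narrowed_to_desired_columns AS (
        SELECT character_id, character_name, race
        FROM stg_characters
    ),
    characters_joined_with_weapons AS (
        SELECT c.*, w.weapon_name
        FROM characters_narrowed_to_desired_columns AS c
        LEFT JOIN stg_weapons AS w ON w.character_id = c.character_id
    ),
    characters_joined_with_locations AS (
        SELECT cw.*, l.home_location
        FROM characters_joined_with_weapons AS cw
        LEFT JOIN stg_locations AS l ON l.character_id = cw.character_id
    )
    SELECT character_name, race, weapon_name, home_location
    FROM characters_joined_with_locations
    ORDER BY character_id
""").fetchall()

for row in characters_mart:
    print(row)
```

The left joins keep every character in the mart even when a source has nothing to say about them, so a character with no recorded location still gets a row.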
[00:10:59] Events
[00:10:59] Winladriel: Now I put a [00:11:00] similar process through to create these events, which again came from a huge variety of sources. I had to talk to Radagast the Brown. His notes are just completely crazy. They’re all on, like, tree bark. The records of Gondor are very neat, but they go way, way back. So, a lot of data to sift through. Gondor definitely has a lot of big data going on there. And we were able to create this cohesive event log of everything that happened during the story that is called, in your world, The Lord of the Rings, and how these things affected the Fellowship of the Ring.
And it’s these two things, characters and events, that we can combine in different ways to generate stories. We take this attitude: we’ve created characters and events, and we’re going to move our characters through the events and generate the stories that we’re concerned with.
[00:11:50] Story: Pippin’s tale
[00:11:50] Winladriel: So here’s an example of joining our event data to our character data in order to fan it out to character events. And thus we’re able to [00:12:00] see the entire story being told of my favorite hobbit ever, Pippin, Peregrin Took. We can see him getting possession of a Númenórean dagger in the Barrow-downs. His famous title, Fool of a Took. You know, eventually becoming a Guard of the Citadel of Gondor. It’s all right here from the combination of these two tables. Similarly, we can take another view, an angle on top of our data, and use it to aggregate and visualize and understand a different aspect of the story.
That way, for instance, we can see orcs slain at the site of each battle by each member of our fellowship, and we can draw conclusions from this, like Aragorn kills a lot of orcs, which is, that’s great. Like, Aragorn’s an awesome dude, orcs are bad, but it’s a lot of murder. But, you know, he had to do it.
The point being, not that Aragorn has killed a ton of people. The point being that while the data and the story that we’re [00:13:00] looking at here in this example might be a little bit silly or different from your day-to-day work, the structures and the process are actually the same. We can imagine Pippin’s tale being the story of a customer interacting with your product.
We can imagine these to be orders by demographic over time, right? These structures and these visualizations are the same, even if the specifics of the stories we’re telling differ. So this is actually a very practical approach to doing analytics engineering, and in order to apply these principles to our work day to day, I’m going to walk through three principles.
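Both views described here, one character’s timeline and an aggregate over the same join, can be sketched as below. The `characters` and `events` tables, their columns, and all of the rows and counts are assumptions invented for illustration, not data from the talk; the SQL is run through Python’s built-in `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical characters and events tables; every row and count is made up.
conn.executescript("""
    CREATE TABLE characters (character_id INTEGER, character_name TEXT);
    CREATE TABLE events (event_id INTEGER, character_id INTEGER, event_order INTEGER,
                         event_type TEXT, location TEXT, orcs_slain INTEGER);
    INSERT INTO characters VALUES (1, 'Pippin'), (2, 'Aragorn'), (3, 'Gimli');
    INSERT INTO events VALUES
        (1, 1, 1, 'acquired_dagger', 'Barrow-downs', 0),
        (2, 1, 2, 'became_guard_of_the_citadel', 'Minas Tirith', 0),
        (3, 2, 1, 'battle', 'Hornburg', 30),
        (4, 3, 1, 'battle', 'Hornburg', 42),
        (5, 2, 2, 'battle', 'Pelennor Fields', 25);

    -- Fan events out to one row per character per event.
    CREATE VIEW character_events AS
    SELECT c.character_name, e.event_order, e.event_type, e.location, e.orcs_slain
    FROM characters AS c
    JOIN events AS e ON e.character_id = c.character_id;
""")

# View 1: a single character's story, in order.
pippins_tale = conn.execute("""
    SELECT event_order, event_type, location
    FROM character_events
    WHERE character_name = 'Pippin'
    ORDER BY event_order
""").fetchall()

# View 2: an aggregate over the same relation, orcs slain per battle per character.
orcs_slain_by_battle = conn.execute("""
    SELECT location AS battle, character_name, SUM(orcs_slain) AS orcs_slain
    FROM character_events
    WHERE event_type = 'battle'
    GROUP BY location, character_name
    ORDER BY battle, character_name
""").fetchall()

print(pippins_tale)
print(orcs_slain_by_battle)
```

Swapping the assumed `characters` for customers and `events` for sessions and orders leaves both queries structurally unchanged.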
[00:13:39] 3 Principles for Analytics Storytelling
[00:13:39] Winladriel: These are principles we can use to take this approach and make it work for us in the real world. Okay. Number one: put down the broom, pick up the pen. We talk a lot in analytics engineering about how naming is fundamentally important, and hard. “Naming things is hard” is one of our [00:14:00] big slogans here at dbt Labs.
And naming and language are a fundamental part of the process of analytics engineering. So it’s time we bring that same importance to bear on how we talk about and name our own work. Here’s an example from the great Vicki Boykis, a tweet that captures a sentiment you’ll see all over the data world, which is that you get prepared to do sophisticated data science or build pipelines from scratch, and then data cleaning is this huge part of your job. People talk about different types of cleaning and how to approach cleaning. And I think this is all indicative of the fact that cleaning is not the right name for this. We’re not just making sure that things are the right data type, or accounting for negative values that should be positive, or dividing cents [unclear], or any of these basic things that are part of the transformation process.
We are crafting characters and places and events. [00:15:00] It is inherently creative work. We are adding new columns, calculating, and shaping things so that the stories that are in the data can rise out better and be more visible and understandable to us. So let’s give that work the appreciation that it’s due and understand it as authorship.
Secondly, think in characters, places, and events, not in tables and columns and systems. The sooner you can get your head around approaching a database creatively, where you decide and you shape what you want the entities and the characters and the places and the events, and how they relate to each other, to be, and start forcing those source systems into that vision instead of the other way around, the sooner you can start completely leveling up the impact that you have at your [00:16:00] work. In taking this approach, as I said a little bit earlier, some terms like fact and dimension, I think, start to lose a little bit of their importance and value, right?
Because we’re thinking in stories, our marts are supposed to be stories. So our users mart should be the story of our users. Our orders mart should be the entire story of our orders. And it doesn’t matter if we need to denormalize stuff from users into orders, or if we need to fan users out based on orders, to understand the cohesive picture of that.
What we need to be doing is thinking about building these cohesive views and then thinking about how we can combine them in different ways to really focus in on specific aspects of the stories that can either be interesting or help us measure or predict, as we talked about at the beginning.
There is an excerpt here from one of the wizards of your world, the great Benn Stancil, from his most recent article, [00:17:00] talking about how analytics engineering should be a mundane role: tedious maintenance, unglamorous. And yet it’s not. People are racing to do this work. He posits that it has to do with the community. The community is fantastic. The tool is fantastic, that’s for sure. Those are definitely factors, but I think more than anything, it’s what the tool unlocks. It’s storytelling, right? It’s that the work has been labeled mundane, but is not. In fact, it is creative storytelling, and that is a vibrant human skill. And as better and better abstractions make advanced analysis and data science more accessible, and make complex data engineering more standardized and easier to access, analytics engineering becomes more and more valuable, because it is fundamentally a design skill. It’s not unlike UX for data practitioners, and thus it remains interesting and difficult to abstract away or to [00:18:00] automate. And that’s what to me is so exciting about this field and why getting into this storytelling approach is so valuable.
So lastly, because this storytelling approach is so valuable, it’s very important to unify your organization around the stories. We talked a little bit at the beginning about how, because of the modern data stack, we have essentially limitless possibilities about what we can build. And because of that, we have to be disciplined, as custodians of the organization’s stories, about how and what we build. And it’s primarily our responsibility to ensure that people are creating and developing and sharing and understanding within the same world.
It’s better to go the harder route of getting three different departments working together inside of the same rich and complex and challenging story than to craft three separate narratives or focus [00:19:00] solely on letting people write their own stories. Now this last point, that’s undoubtedly still a battle at most orgs. But with this storytelling approach in hand, as this first genre-creating, paradigm-shifting generation of data storytellers and information authors that we are, as this first generation of analytics engineers, I think that is our story to write.
Thank you all for coming and listening to my talk. And now I will answer some questions, if you have some good questions coming in from the Slack. I’ll have my host Elize come back here. I can go get Winnie, my cousin, back.