Analytics Engineering for storytellers
Browse this talk's Slack archives
The day-of-talk conversation is archived here in dbt Community Slack.
Not a member of the dbt Community yet? You can join here to view the Coalesce chat archives.
Full transcript
[00:00:00] Elize Papineau: Oh, how are you doing? Thank you for joining. Your data is flowing and dbt is running. Thank you, nonetheless, for attending Coalesce. My name is Elize Papineau. I'm a senior analytics engineer at dbt Labs. As both a Tolkien fan and a dbt fan, I'm incredibly enthusiastic to host this session, Analytics Engineering for Storytellers, presented by Winladriel of Rivendell, the cousin of Gwen Windflower, who is an analytics engineer at dbt Labs.
Before we get started, let's do some quick housekeeping. All chat conversation is taking place in the #coalesce-ae-storytellers channel of dbt's Slack. If you are not a part of the chat, you have time to join right now: visit community.getdbt.com and search for #coalesce-ae-storytellers.
When you enter the space, we encourage [00:01:00] you to ask questions, make comments, and react in the channel. We are not physically together, but we're all here together in Slack, so let's keep the conversation going. After the session, Winladriel will be available in the Slack channel to answer your questions, and we encourage you to post those questions throughout the session in that channel.
So let's get started.
[00:01:23] Winladriel: Greetings. Just pondering my orb here, purple of course. I'm Winladriel, data elf. My cousin Winnie asked me to come here today to talk to you about something that we both have a passion for, which is analytics engineering. Now you may not realize that analytics engineering is wildly popular among the elf populations of Middle-earth, but I think by the end of this talk you will understand why. Elves being immortal and all, we value stories and song above all the material riches of the earth. The dwarf lords and their [00:02:00] jewels, etc.; it's stories and songs for us.
And what we're here to absorb today is that analytics engineering is fundamentally about stories. To absorb this, we're going to discuss the foundational relationship between data and stories, then use an example of a story from my world to illustrate this truth. And lastly, we'll talk about how these principles apply to your own work in your world day to day, and how this is the actual practical approach that my cousin Winnie uses as a consultant with dbt Labs.
[00:02:33] Why do we look at data?
[00:02:33] Winladriel: Okay. So to understand what stories have to do with data, we first need to understand why we look at data in the first place. One of the biggest reasons is to measure, right, to understand what has happened or something that is happening. A similar reason is to predict: to try to extrapolate out what's going to happen based on what happened before.
And then lastly, sometimes just because it's interesting. Even when we don't have influence over a decision, a compelling presentation of data is just fascinating to us [00:03:00] as humans or elves. So given these relatively straightforward motivations for looking at data, why does this statement from my dear personal friend Boromir ring true? We have extremely powerful technical tools at our disposal, and yet it is a common sentiment you'll see all across the Dataverse. I think it's because the technical part at this point is in fact easy. With the modern data stack, we actually can pull some data real quick.
But what's hard is knowing what to pull, what to build, and why. That's the much more difficult problem. And the root of that problem is that humans can't read data. And frankly, neither can elves, I hate to admit. This is why there is such a term as data literacy.
There is nothing natively, intuitively consumable about data to the human or Elven mind. What we [00:04:00] can understand are entities, for instance a customer or a country; associations, so a customer being in a certain country; events, for instance a customer in a certain country has a session on our app and places an order; and variations, so aggregating, for instance, the revenue month over month of these sessions and orders from a customer in a certain country. That we can understand.
So suppose we swap some synonyms in here. Instead of entities, we think about characters, your Frodos, your Gollums. Instead of associations, we think of relationships, such as a fellowship. Instead of events, we think of experiences, things that these characters go through, and the changes that those experiences put the characters and their relationships through. If we take these synonyms together, then we understand that what we actually understand is story, not data.
[00:04:56] Humans understand stories
[00:04:56] Winladriel: We understand story. And now, why this is [00:05:00] important is maybe not obvious to you. As elves, we're immortal beings who live immersed in natural splendor, beauty, all that is fair. But I understand that for a mortal, life can be a bit of a battle. You've got to go to work.
You've got to do a job to survive. But the secret I'm here to share is that the same artistry that we use to craft an ode to my ancestors Beren and Lúthien can be extrapolated out into a process that works just as well for understanding customers purchasing, I don't know, jewels, let's say. I'm not sure what people buy on your mortal internet.
But this very same pattern will work. And if we step back here and look back at why we look at data in the first place, we can see that all of these things, the interior components of these stories, are patterns to stories, and patterns are what we can understand: to measure, to predict, to find threads of interest [00:06:00] within stories themselves.
If we zoom out and compare stories to other stories, we find there are these meta-patterns: the same rhythms, beats, processes, and transformations that we see in one story can often be applied to another. And this is why storytelling is so essential to analytics. A great example of this from your world is the hero's journey from Joseph Campbell.
It is the basis for many famous, what I'm told are called, movies. And it is also a great example of the story we're going to walk through from my world today. It goes something like this: somebody who's in the normal day-to-day world goes through various obstacles and growth, they overcome them, and they reach some kind of more desired state.
[00:06:47] Customer experience journey
[00:06:47] Winladriel: What's interesting is this is in fact the exact same story as this one, which might be a little more applicable to your day-to-day work. So to take this meta-patterning a little bit further and go a little bit [00:07:00] deeper and explore just how fundamental storytelling is to analytics engineering,
we're going to walk through telling one of the most famous tales of my world, which is Frodo and the Ring of Power. And to do that we'll use a tool that we, the elves, developed in collaboration with the dwarf lords for recording some scrolls of their merchants' dealings selling precious goods to the kingdoms of men.
We call it Dwarves Buried Treasure. So we'll be using that today to reconstruct the data to tell the story of Frodo and the Ring of Power. Okay. So first we have to keep in mind that our data is coming from a huge variety of sources. And this is probably a pretty familiar scenario to a lot of you,
[00:07:41] Overview of the data
[00:07:41] Winladriel: if you do analytics engineering day to day. Our goal here is going to be to create a unified record of characters and events that we can combine to tell our stories. And the essence of the storytelling approach is thinking in these terms, right? In entities and events, characters, places, and experiences, [00:08:00] and trying to work out how we can bring the data together and craft these, not how we cope with the tables and columns we're getting from these various sources.
So as I said, our data comes from wildly different sources. They have very different schemas. This data on screen here is from the wizards' White Council's records. They're very old school. It's highly normalized. It's very Kimball star schema. You know how wizards are.
But we also have stuff like the Shire records, which hold what's very important to hobbits: average number of breakfasts, foot size. Crucial, but not super applicable to the story that we want to tell here. Similarly, we have some data that we want from the records of the Rangers, the men of the north, and we have some other stuff that we're not so interested in.
[00:08:48] Unioning, denormalizing and joining
[00:08:48] Winladriel: So the first step for us is going to be unioning, denormalizing, and joining to create these unified concepts that fit our needs, not the source [00:09:00] systems. So yeah, we've staged these sources, and even in staging them, we're starting to use the language that we want. And that's the really crucial part of this step.
We're doing this immediately, as far upstream as we can, to start bringing together these modular pieces into concepts that we want, in this case characters. And we're using that language as early as possible: characters. That's really important. Another really important point about language here is that not only is our end product, the data itself, going to tell a story, but the code itself is telling a story as well.
Okay. In this creative storytelling approach to analytics engineering, it's really important that we label each step in the process with specific and descriptive language. And you'll also note that we're leveraging things like macros, in this union relations macro here, to wrap complex code in descriptive language.
Not only does that make it easier to use that [00:10:00] logic multiple times, it makes it clearer what that logic does. The result of all of this, when combined with the naturally declarative style of SQL, is that we get a code base that can be read by anybody, regardless of their technical background, to understand how the data is changing and progressing through this sort of meta-story that we've put together.
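The unioning step described above can be sketched in a few lines. This is not the code shown in the talk: it is a minimal illustration, in Python with the standard-library sqlite3 module, of the general idea behind a "union relations" helper, which stacks staged tables with different schemas and pads missing columns with NULL. The function name `union_relations_sql`, the `source_relation` column, and the table names and rows are all hypothetical.

```python
import sqlite3


def union_relations_sql(conn, tables):
    """Build a UNION ALL over tables whose columns differ.

    Hypothetical helper: columns missing from a table are padded with NULL,
    and a source_relation column records where each row came from.
    """
    # Look up each table's column names (index 1 of each PRAGMA table_info row).
    columns_by_table = {
        table: [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        for table in tables
    }
    # Collect the superset of columns, preserving first-seen order.
    all_columns = []
    for columns in columns_by_table.values():
        for column in columns:
            if column not in all_columns:
                all_columns.append(column)
    selects = []
    for table in tables:
        expressions = [
            column if column in columns_by_table[table] else f"NULL AS {column}"
            for column in all_columns
        ]
        selects.append(
            f"SELECT '{table}' AS source_relation, {', '.join(expressions)} FROM {table}"
        )
    return "\nUNION ALL\n".join(selects)


conn = sqlite3.connect(":memory:")
# Invented staging tables standing in for two differently shaped sources.
conn.executescript("""
CREATE TABLE stg_shire__characters (name TEXT, race TEXT, breakfasts_per_day INTEGER);
CREATE TABLE stg_rangers__characters (name TEXT, race TEXT, home_region TEXT);
INSERT INTO stg_shire__characters VALUES ('Frodo', 'Hobbit', 2), ('Pippin', 'Hobbit', 2);
INSERT INTO stg_rangers__characters VALUES ('Aragorn', 'Man', 'The North');
""")

sql = union_relations_sql(conn, ["stg_shire__characters", "stg_rangers__characters"])
unioned_characters = conn.execute(sql).fetchall()
```

Wrapping the padding logic in one descriptively named function is the point being illustrated: a reader of the calling code sees "union these relations" rather than the column bookkeeping.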
[00:10:22] Characters
[00:10:22] Winladriel: I don't need to know SQL to be able to read the labels on these CTEs and understand that we narrow the characters to the desired columns. Then we join in weapons. Then we join in locations, etc. This is really important. So bringing all this together, we end up with this really wide, cohesive characters mart, and this is what we want.
We are building a characters mart based on everything that we want to see, regardless of where it's coming from. We want each row to tell some kind of story unto itself. We're not making arbitrary distinctions about what's a fact and what's a dimension; we're simply concerned with characters here.
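As a rough illustration of CTEs labeled so that the steps read as a story, here is a small sketch, again in Python with sqlite3. It is an assumption about what such a model could look like, not the query from the talk's slides; every table name, column name, and row is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented staging tables; foot_size stands in for a column we do not need.
conn.executescript("""
CREATE TABLE stg_characters (
    character_id INTEGER, name TEXT, race TEXT,
    foot_size REAL, weapon_id INTEGER, home_id INTEGER
);
CREATE TABLE stg_weapons (weapon_id INTEGER, weapon_name TEXT);
CREATE TABLE stg_locations (location_id INTEGER, location_name TEXT);
INSERT INTO stg_characters VALUES
    (1, 'Frodo', 'Hobbit', 9.5, 10, 100),
    (2, 'Aragorn', 'Man', 11.0, 11, NULL);
INSERT INTO stg_weapons VALUES (10, 'Sting'), (11, 'Anduril');
INSERT INTO stg_locations VALUES (100, 'The Shire');
""")

# Each CTE name says what happened to the data at that step.
CHARACTERS_MART_SQL = """
WITH characters_narrowed_to_desired_columns AS (
    SELECT character_id, name, race, weapon_id, home_id
    FROM stg_characters
),
characters_joined_to_weapons AS (
    SELECT c.*, w.weapon_name
    FROM characters_narrowed_to_desired_columns AS c
    LEFT JOIN stg_weapons AS w ON w.weapon_id = c.weapon_id
),
characters_joined_to_locations AS (
    SELECT c.character_id, c.name, c.race, c.weapon_name,
           l.location_name AS home
    FROM characters_joined_to_weapons AS c
    LEFT JOIN stg_locations AS l ON l.location_id = c.home_id
)
SELECT * FROM characters_joined_to_locations
ORDER BY character_id
"""

characters_mart = conn.execute(CHARACTERS_MART_SQL).fetchall()
```

The left joins keep every character even when a weapon or home is unknown, so each row of the wide mart still tells as much of that character's story as the sources allow.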
[00:10:59] Events
[00:10:59] Winladriel: Now I went [00:11:00] through a similar process to create this events mart. Again, it came from a huge variety of sources. I had to talk to Radagast the Brown; his notes are just completely crazy. They're all on, like, tree bark. The records of Gondor are very neat, but they go way, way back. So, a lot of data to sift through; Gondor definitely has a lot of big data going on there. And we were able to create this cohesive event log of everything that happened during the story that is called, in your world, Lord of the Rings, and how these things affected the Fellowship of the Ring.
And it's these two things, characters and events, that we can combine in different ways to generate stories. We take this attitude that we've created characters and events, and we're going to move our characters through the events and generate the stories that we're concerned with.
[00:11:50] Story: Pippin's tale
[00:11:50] Winladriel: So here's an example of joining our event data to our character data in order to fan it out to a character event. And thus we're able to [00:12:00] see the entire story being told of my favorite hobbit ever, Pippin, Peregrin Took. We can see him getting possession of a new Númenórean dagger in the Barrow-downs, his famous title, Fool of a Took, and, you know, eventually becoming a Guard of the Citadel of Gondor. It's all right here from the combination of these two tables. Similarly, we can take another view, an angle on top of our data, and use it to aggregate and visualize and understand a different aspect of the story.
That way, for instance, we can see orcs slain by the site of each battle by each member of our fellowship, and we can draw conclusions from this, like Aragorn kills a lot of orcs, which is, that's great. Like, Aragorn's an awesome dude, orcs are bad, but it's a lot of murder, but it's, you know, how to do it.
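The two views described here, one character's tale and an aggregate across battles, can be sketched against the same pair of tables. This is a minimal illustration in Python with sqlite3, not the talk's own queries; the schema, the `event_order` column, and all rows (including the orc counts) are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE characters (character_id INTEGER PRIMARY KEY, name TEXT, race TEXT);
CREATE TABLE events (
    event_id INTEGER PRIMARY KEY,
    character_id INTEGER,
    event_order INTEGER,
    location TEXT,
    description TEXT,
    orcs_slain INTEGER
);
""")
# Invented rows for illustration only.
conn.executemany(
    "INSERT INTO characters VALUES (?, ?, ?)",
    [(1, "Pippin", "Hobbit"), (2, "Aragorn", "Man")],
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, "Barrow-downs", "acquired a dagger", 0),
        (2, 1, 2, "Minas Tirith", "became Guard of the Citadel", 0),
        (3, 2, 1, "Amon Hen", "fought in battle", 5),
        (4, 2, 2, "Pelennor Fields", "fought in battle", 7),
        (5, 2, 3, "Pelennor Fields", "fought in battle", 3),
    ],
)

# View 1: fan one character out across their events to read a single tale.
pippins_tale = conn.execute("""
    SELECT e.location, e.description
    FROM characters AS c
    JOIN events AS e ON e.character_id = c.character_id
    WHERE c.name = 'Pippin'
    ORDER BY e.event_order
""").fetchall()

# View 2: aggregate the same join to compare characters across battles.
orcs_slain_by_battle = conn.execute("""
    SELECT e.location, c.name, SUM(e.orcs_slain) AS orcs_slain
    FROM characters AS c
    JOIN events AS e ON e.character_id = c.character_id
    WHERE e.orcs_slain > 0
    GROUP BY e.location, c.name
    ORDER BY e.location, c.name
""").fetchall()
```

Swap `characters` for customers and `events` for sessions and orders, and the same two queries give a customer journey and orders by segment.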
The point being, not that Aragorn has killed a ton of people. The point being that while the data and the story that we're [00:13:00] looking at here in this example might be a little bit silly or different than your day-to-day work, the structures and the process are actually the same. We can imagine Pippin's tale being the story of a customer interacting with your product.
We can imagine these to be orders by demographic over time, right? These structures and these visualizations are the same, even if the specifics of the stories we're telling differ. So this is actually a very practical approach to doing analytics engineering, and in order to apply these principles to our work day to day, I'm going to walk through three principles.
[00:13:39] 3 Principles for Analytics Storytelling
[00:13:39] Winladriel: These are principles we can use to take this approach and make it work for us in the real world. Okay. Number one: put down the broom, pick up the pen. We talk a lot in analytics engineering about how naming is fundamentally important, and hard. "Naming things is hard" is one of our [00:14:00] big slogans here at dbt Labs.
And naming and language are a fundamental part of the process of analytics engineering. So it's time we bring that same importance to bear on how we talk about and name our own work. Here's an example from the great Vicki Boykis, a tweet that captures a sentiment you'll see all over the data world, which is that you get prepared to do sophisticated data science or build pipelines from scratch, and then data cleaning is this huge part of your job. People talk about different types of cleaning and how to approach cleaning. And I think this is all indicative of the fact that cleaning is not the right name for this. We're not just making sure that things are the right data type, or accounting for negative values that should be positive, or dividing cents into dollars, or any of these basic things that are part of the transformation process.
We are crafting characters and places and events. [00:15:00] It is an inherently creative work. We are adding new columns, calculating, shaping things in order that the stories that are in the data can rise out better and be more visible and understandable to us. So let's give that work the appreciation that it's due and understand it as authorship.
Secondly, think in characters, places, and events, not in tables and columns and systems. The sooner you can get your head around approaching a database creatively, in terms of you deciding and shaping what you want the entities and the characters and the places and the events, and how they relate to each other, to be, and start forcing those source systems into that vision instead of the other way around, the sooner you can start completely leveling up the impact that you have at your [00:16:00] work. In doing this and taking this approach, as I said a little bit earlier, some terms like fact and dimension, I think, start to lose a little bit of their importance and value, right?
Because we're thinking in stories, our marts are supposed to be stories. So our users mart should be the story of our users. Our orders mart should be the entire story of our orders. And it doesn't matter if we need to denormalize stuff from users into orders, or if we need to fan users out based on orders, to understand the cohesive picture of that.
What we need to be doing is thinking about building these cohesive views and then thinking about how we can combine them in different ways to really focus in on specific aspects of the stories that can help us either be interesting or measure or predict, as we talked about at the beginning. There is an excerpt here from one of the wizards of your world,
the great Benn Stancil, from his most recent article, [00:17:00] talking about how analytics engineering should be a mundane role, tedious maintenance, unglamorous, and yet it's not; people are racing to do this work. He posits that it has to do with the community. The community is fantastic. The tool is fantastic, that's for sure. Those are definitely factors, but I think more than anything, it's what the tool unlocks: it's storytelling, right? It's that the work has been labeled mundane, but is not. In fact it is creative storytelling, and that is a vibrant human skill. And as better and better abstractions make advanced analysis and data science more accessible, and make complex data engineering more standardized and easier to access,
analytics engineering becomes more and more valuable, because it is fundamentally a design skill. It's not unlike UX for data practitioners, and thus it remains interesting and difficult to abstract away or to [00:18:00] automate. And that's what to me is so exciting about this field and why getting into this storytelling approach is so valuable.
So lastly, because this storytelling approach is so valuable, it's very important to unify your organization around the stories. We talked a little bit at the beginning about how, because of the modern data stack, we have essentially limitless possibilities about what we can build. And because of that, we have to be disciplined as custodians of the organization's stories about how and what we build. It's primarily our responsibility to ensure that people are creating and developing and sharing and understanding within the same world.
It's better to go the harder route of getting three different departments working together inside of the same rich and complex and challenging story than to craft three separate narratives or focus [00:19:00] solely on letting people write their own stories. Now this last point, that's undoubtedly still a battle at most orgs, but with this storytelling approach in hand, as this first genre-creating, paradigm-shifting generation of data storytellers and information authors that we are, as this first generation of analytics engineers,
I think that is our story to write. Thank you all for coming and listening to my talk. And now I will answer some questions, if you have some good questions coming in from the Slack. I'll have my host Elize come back here. I can go get Winnie, my cousin, back.
Last modified on: Apr 22, 2022