Don't hire a data engineer...yet
“Stef, you’re responsible for retention” is what the CEO said to me when he dropped by my desk on my first day as the founding analyst at QuizUp. With no idea what that really meant, I accepted the challenge.
Getting your data to serve your organization fast enough and at scale can seem daunting. It’s tempting to see this as an engineering problem. You might even try to recruit a data engineer to solve this. But that is not your first step.
Your first problem is a culture problem. How might we make analytics a part of the company culture? Something that engineers care about as much as product managers? In this talk, I’ll walk through what I think it takes to build the mindsets, internal relationships, and knowledge to empower and develop a powerful data team: a team of curious, question-asking, storytelling analytical minds that have enough technical capability to hack together complex and scattered datasets and extract and deliver insights. A group of people recently coined purple people.
Browse this talk’s Slack archives
The day-of-talk conversation is archived here in dbt Community Slack.
Not a member of the dbt Community yet? You can join here to view the Coalesce chat archives.
Full transcript
Jillian: [00:00:00] Hello, everyone, and welcome. This is Jillian, your purple muse here. I’m on the community team at dbt Labs. I hope that you’ve still got some excitement left, because next is a session with the oh-so-cheeky title, Don’t Hire a Data Engineer…Yet. If you haven’t yet, come join the conversation in Slack in #coalesce-no-de-yet. And without further ado, it is my absolute treat to introduce you to our speaker, Stefania Olafsdottir. Stef is the CEO and co-founder of Avo, a data planning and governance platform for product analytics. Today, she’s here to share with us how her own journey on the frontline of analytics helped shape the vision for Avo and fueled Stef’s passion for making data culture a priority.
If you like what you hear today, be sure to check out Stef’s podcast, The Right Track, where she interviews leaders across data engineering and product. She’s definitely saving time to take live questions, so please bring them up in the Slack. And with that, I will [00:01:00] pass the mic over to you, Stef.
Stefania Olafsdottir: All right. Thank you, Jillian. Am I live? Is that happening? I’m assuming the silence means yes. Yes. Okay. Hi everyone. I am incredibly excited to be here. It is a mind-blowing lineup. It’s just been one incredible talk after another, and I’m super stoked also after the metrics layer hype that was going on earlier. Thank you, Drew, it made my day even better than it was, and it was already pretty good because I was doing this. Very honored to be among all of these great folks. I love the dbt community. It’s something unique. It’s a group of people that’s just very inspiring, and it’s super inspiring how everyone supports each other. So thank you, dbt team, for organizing this incredible event, and especially Jillian. Special shoutout to you, Jillian, for all the support in preparing for this and [00:02:00] being literally and figuratively purple.
I am here very excited to convince everyone today to not set data engineers up for failure. I hope you’ll stick around on that journey with me and please join the Slack channel, like Jillian was mentioning, Coalesce No DE Yet, I believe. I will stick around after the talk. I am in Iceland as I just posted in the channel.
So it’s already, I think after midnight, but I am a night owl, so it’s perfect. I also carved out some time this week to engage with everyone on Slack, Twitter, LinkedIn, wherever you all might be, because I really want to start a conversation. Yeah. So to start off, a quick who am I?
So I am a mathematician and a philosopher. That’s what I studied. Then I went into genetics doing a lot of data engineering there. I was correlating physical traits with DNA mutations and doing distributed [00:03:00] computing and all sorts of stuff like that. And we were doing what the world at that time was calling big data, but what everyone in that industry at that time hated that they were calling it.
But we were doing it, whatever they were calling it. From there, I went on to be the founding analyst at QuizUp, and I am going to focus a lot of my takeaways today on that journey as a case study for what I’m arguing for: do not set the data engineer up for failure. After that, I co-founded a company called Viska, which was gamified microlearning for employees. And very soon after that, the passion that we developed at QuizUp for data quality came back to me and we ended up going over to Avo. So Avo is the analytics planning and governance platform specifically for product analytics. We help product managers, developers, and data scientists collaborate to release better analytics for every product release. But let’s get into this.
“Stef, you [00:04:00] are responsible for retention” is what the CEO casually dropped into a conversation when he dropped by my desk on my first day as the founding analyst at QuizUp. And with no idea what that really meant for my role, I accepted the challenge. It felt a little bit out of reach because I’m an analyst, or a data scientist, or a data engineer, or whatever we call it. But I did accept the challenge. Like I said, I had just come from the world of genetics, working in a semi-academic environment, doing all the stuff that I was talking about earlier. So I’d been doing a lot of data engineering, but I actually hadn’t been calling it that. I know there are already some great conversations going on at this conference about what we even call our roles. I relate heavily to that and just want to echo everything that Emily was talking about in her talk. I ask every person that I interview on The Right Track: what do people call themselves in your role, and do you think data scientist is a helpful title?
But at QuizUp, I [00:05:00] quickly learned that while I’d need to do a lot of data engineering to build a functional data culture, I would also regularly need to put my data engineering hat back on the hat rack and focus on the people and the processes. And that was a huge part of my learning experience over at QuizUp.
And that is a lot of the content of my talk today. I want to talk about how you can set your data engineer up for failure. In particular, when we’re asking data engineers to join our company and solve this as an engineering problem, to solve our product data quality problems as a tech problem, I think we’re setting them up for failure. And I think we need to change that. We need to inject product analytics, its quality, its relevance, and its reliability, into the product release process instead of considering this a problem that should be solved after the fact with data engineering. And this is a [00:06:00] reference to that “Stef, you’re responsible for retention.”
[00:06:03] Setting your data engineer up for failure
Stefania Olafsdottir: Because I wasn’t injected into the product release process. So there are just a few traps that I want to map out. Getting your data to serve your organization fast enough and at scale can seem very daunting, and it is tempting to see this as an engineering problem. You might even try to recruit a data engineer, which I did. I tried to recruit a data engineer to solve our problems. But I learned along the way that we were so far from our first step, and our first step really was a culture problem. So the first trap, basically, is asking data engineering to solve all your product data quality problems.
And another trap is making analytics and data quality an afterthought. I think we should really try to nick the cat in the butter. Is that a phrase? I’m going off [00:07:00] script here. We should try to fix that problem, and I’ll go a little bit deeper into that conversation.
And then the third one I want to highlight, just generally another trap, is trying to solve your data quality problems reactively with data engineering. I really think that we need to start getting proactive about this, because trying to solve your data quality problems reactively with data engineering alone is like herding cats. A key thing to have in mind here is that product analytics is an ever-moving target.
And I’ll also go a little bit deeper into that in the talk. So we need to be proactive and attack the root problem, the data creation process. I do believe that we need a lot of data engineering along the way. And we do need a lot of tooling. But an important part of this strategy is to inject data quality into the product development processes.
We need to focus on bridging the gap between the data producer and the data consumer, which in this context for me means the product engineers, who actually [00:08:00] write the code to generate analytics events that represent the user experience, so those are the data producers, and the product managers, who are the data users.
The role of the early data team needs to be to bridge that gap between product engineering and the business. And so you have to be that bridge yourself until you’ve built it and can remove yourself as the middleman. But, so how might we make analytics a part of the product release process?
[00:08:27] How is product analytics data different?
Stefania Olafsdottir: To start that conversation off and set that stage, I want to highlight how product analytics data is different from other data. I’m going to start with a truism here. I think we all probably agree that in modern product companies and organizations, product insights are fundamental to the business strategy. Measuring and impacting the end user experience with product analytics is literally what makes or breaks today’s businesses.
And then feeding, of course, all of that data into all of the engagement tools and all that juicy stuff. But literally, how is it different? The thing, [00:09:00] again, with product analytics data is that it is an ever-moving target. It is unlike most other datasets because its structures change with every single product release. That is different from most of our third-party data, for example Stripe data or Salesforce data. You try to set that up and you’re not constantly changing the structures of those datasets. But you are doing that with your product, because your product is constantly molting. And so your product analytics data is also constantly molting, and it’s also different from most of our operational datasets.
Let me just add this point here. It’s also different from most of the other sort of operational datasets, for example the things that run your application, like the databases that run your application, because there’s a close connection between the data producer and the data consumer in managing those operational datasets.
So [00:10:00] it’s tricky. And just to hammer in how much of an ever-moving target this is: product organizations update their product analytics data structures two to 20 times per month, and sometimes even more. This is actually based on data about our customers, but it’s also based on the conversations that we had with hundreds of product managers, data scientists, and engineers on how they were managing their analytics releases.
We wouldn’t be on this journey if we hadn’t confirmed that before that. And this means basically we can’t start a data engineering project to fix the data for our product analytics. It’s not a one-time fix. It also doesn’t work to have a data engineer responsible for fixing the data after every release. The data engineer will always be in reactive mode. They will always be chasing their tail. They [00:11:00] will always be responding to data issues as long as there are product releases happening in the organization. And so instead we need to build infrastructure and processes to empower product managers and product engineers to ship fast with updated product analytics data for every single product release. And they need to be able to do that without breaking the data and without slowing down the product release while they make sure they don’t break the data. Again, of course you need a lot of data engineering in those early stages of building out a data culture. But solving data quality needs to start by building the bridge between product managers and product developers before the data gets released. And I think this is such a fundamental part of being an early data team member. And I see this with a lot of companies.
I’ve consulted on hiring with a lot of companies, and a lot of them throw together a job application for data engineers without a lot of thinking through how they might need to change their cultures. So I just want to add this cutie here. I think [00:12:00] this is how I felt a lot of the time when I was trying to get our data quality up to standards.
Being in this reactive mode and just getting the question from a product manager, like, how did this release go? And you’re like, what release? I’m sorry, I don’t know what you’re talking about. And they’re like, oh, that release, the one that was released yesterday. And you ask them, did you add analytics for it?
[00:12:23] The unit test analogy
Stefania Olafsdottir: And they say to you, didn’t you? But you’re not in the loop as a data engineer or as a data scientist. And so this matters so much. And I want to make one analogy. I want to make an analogy to unit tests. I think managing product analytics data should be like managing your unit tests.
It should be done without a middleman. In the modern software development life cycle of continuous deployment, we write our own unit tests. We don’t have a middleman. Ensuring that our code works is part of releasing a product. In the old [00:13:00] days, a test engineer might have been responsible for testing other people’s code before releasing it.
But we’ve learned that it’s way more efficient to write the unit tests for your own code as you write that code. It doesn’t scale to have some person dropping in to some other person’s code, to write tests for it before it gets released. So we close that loop and software developers write their own unit tests for their own code and that’s how it works.
But it wasn’t always like that. Like I said, in the old days it was different. And I think we are going through a similar transition with product analytics data quality, with making sure your data quality is okay. So I think the same applies for product analytics. Product analytics is a part of releasing a product.
In a more modern organization, we update our product data with every product release. The tracking plan gets updated for every release. It doesn’t scale to have a data engineer or data scientist manage the data structures and the data quality of your product releases. But today [00:14:00] there is a middleman so often in this process and they’re often hanging out in the cooler, and they’re left there managing the data quality for product analytics.
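The unit test analogy can be made concrete with a minimal sketch. Everything in it is an assumption for illustration and none of it is from the talk: the `finish_game` feature, the `game_finished` event and its properties, and the `track` callback are all hypothetical. The idea it tries to show is that the engineer who ships the feature also ships the tracking call and a test for it, with no middleman.

```python
from typing import Callable

# Hypothetical tracking callback signature: (event_name, properties) -> None
Track = Callable[[str, dict], None]


def finish_game(player_id: str, opponent_type: str, score: int, track: Track) -> dict:
    """Hypothetical product code for a 'game finished' feature.

    The analytics event is emitted by the feature code itself, so it is
    written and released by the same engineer as the feature.
    """
    result = {"player_id": player_id, "score": score}
    track("game_finished", {
        "player_id": player_id,
        "opponent_type": opponent_type,  # e.g. "friend" or "random"
        "score": score,
    })
    return result


def test_finish_game_tracks_event() -> None:
    """Unit test written alongside the feature.

    It fails if the tracking call is missing, duplicated, or malformed.
    """
    sent = []
    finish_game("p1", "friend", 7, track=lambda name, props: sent.append((name, props)))
    assert len(sent) == 1
    name, props = sent[0]
    assert name == "game_finished"
    assert props == {"player_id": "p1", "opponent_type": "friend", "score": 7}


test_finish_game_tracks_event()
```

Passing the tracking function in as a parameter is one possible design choice here; it lets the test capture events in a list without touching any real analytics destination.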
And then they’re scrambling to get in there. And I really think that it shouldn’t be like that. And like Jillian talked about in the intro, I’ve been interviewing a lot of folks on The Right Track. And a lot of my passion comes from this data culture perspective.
And I always ask people, what is your org structure? Who does data report to, and how do they work together with the product team? And I think it is synonymous with a successfully run product organization that they’ve gone through the transition that I’m going to talk to you about later: navigating from centralized data teams to some sort of a self-serve analytics model to some sort of analytics governance, trying to fix that status.
So ultimately I think we need to inject product data quality into the product development release [00:15:00] cycle. Please. I would love us to bridge the gap between the data producer and the data consumer. And again, to hammer in what I mean in this context with data producer and data consumer: the data producer is the product engineer who writes code to generate analytics events that represent our user experience, and the data consumer is the product manager, who is the end user of the data. And just strive to remove the middleman.
[00:15:23] From data engineering to self-serve analytics governance
Stefania Olafsdottir: And so how do we get from data engineering that doesn’t scale to self-serve analytics governance?
I want to walk you through our journey at QuizUp, scaling the company to a hundred people and a hundred million users. And so we went through that journey of centralized analytics team, a centralized data team. I actually like to call it centralized BI team because it makes it sound even more antiquated.
But the problem there was basically that decision-making was bottlenecked by human throughput. And I will go deeper into that. And then we went on to trying to build self-serve analytics, where decision-making [00:16:00] was bottlenecked by lack of data quality and data literacy. And I’ll also go deeper into that.
And then we went into the centralized analytics governance model, trying to have us as data engineers or data scientists control the schemas of each release and asking to be in the loop to make sure we don’t ship data that is bad. But that meant a new problem: product releases were bottlenecked by schema management.
And so that’s also very frustrating. And so we even took it further, and I’m also going to go deeper into that, introducing self-serve analytics governance, where teams were shipping fast without compromising. I laugh at this because this sounds so scary to anyone who has ever managed data and had to fix a lot of data. Just offering people that haven’t worked with data structures the chance to create their own data structures can sound really scary when you’re the one that has to clean up the mess after it.
But I really want to [00:17:00] advocate for us adding some trust to our team members and trying to build a more inclusive culture. So the most important focus for us was always on bridging the gap between the business and the product engineering. And yes, we did a ton of data engineering, but the most impactful thing we did for data quality was to bridge this gap really.
And so if I go deeper into what this means: I started off as a data team of one. Answering questions, being proactive about finding new things, helping make decisions. That was my to-do, basically. The problem was we needed to answer more questions faster, and so we hired more data scientists and we grew into sort of a centralized team of four.
The problem was still that decision-making was bottlenecked by human throughput: the four analysts, or data scientists, or data engineers, that were there. No one could answer data questions without hacking together some datasets. Only the data scientists knew where to find the data and what to do with it.
[00:18:00] Most of the team was fully dependent on data scientists using data engineering to munge the data for them. And it just did not scale. Yeah, the solution: we’ll hire more data people. Sorry about that. But to quickly give you insights into what we were trying to answer, what questions were we looking into?
So this is an example. QuizUp had a lot of value for a lot of people. Like I said, it had a lot of users. It reached a million users in its first five days, which was, yeah, it was the fastest-growing app in the App Store at the time. That record was later beaten by Flappy Bird.
I don’t know if anyone here remembers Flappy Bird. But that was a quite addictive game. Very upsetting. And so this is an actual picture of people that met playing the Big Bang Theory topic on QuizUp. So QuizUp was a trivia game where you met with random people around the world. You could also challenge your friends.
And a lot of communities got built through this. Our CEO got invited to weddings [00:19:00] on numerous occasions, so it was huge. And ultimately, we were tasked with finding the value in QuizUp, which was a really exciting challenge. What we were doing is we were trying to answer questions like: did they play with a friend or a random person in the world? Did that impact their experience? How many questions did they get? Did that impact their experience? Can we create a first-game onboarding experience where you get disproportionately easier questions, or a bot that you play with that you maybe win against, so you always win your game?
Look, all of these things we were trying to look into, and we did all sorts of stuff like random forest analysis to look into all of these things and how they impacted things like retention. And this one, of course: did someone propose to them? That was a question that we were trying to answer. But these are all the questions.
There was so much other stuff going on here, but we were ultimately trying to really uncover the value of QuizUp so that we could scale the product. And so again, we were [00:20:00] situated in that centralized BI team model. The problem was still that the data people were the bottleneck, and we decided to start to mold the culture towards self-serve analytics.
We were going to support people in looking things up themselves. While we were doing that, we were also begging product engineers to set up tracking for our user experience so that people beyond data scientists would have some sort of a point-and-click interface to answer these questions, or at least the questions that they needed for their day-to-day jobs.
But again, the problem was that decision-making was still completely bottlenecked by the lack of data quality and data literacy. And that’s what we found when we went into the self-serve analytics model.
And the data was inconsistent and chaotic. It still took a data scientist to munge the data until we could make sense of it. And we did a ton of data engineering to hack together datasets and build pipelines to future-proof against the most recent data issue. We would very often find [00:21:00] out that the analytics tracking was so broken, with a missing event, a missing property, or inconsistencies between the platforms, that we would actually have to ask product engineering to fix the tracking, release the product again, and wait for the next release to have enough data so that we could analyze the success of the original release.
This could take ages. It could take ages to analyze how well a product release went. And then it was often too late, and we’d end up making the next decision based on our gut, with a decision that might even have contradicted the results when they were finally in. And so sometimes the results went painfully unanalyzed indefinitely, which is super painful.
Yeah, so this was the state. So the next thing we did, we introduced analytics governance, meaning if there was a product release, the product engineers would involve the data scientists and we would design [00:22:00] analytics tracking for them to implement. The engineers would implement the data design and the data science team would verify that the analytics tracking was actually working as expected.
We then introduced another problem, of course. The problem that we now introduced was that product releases were bottlenecked by analytics governance. Product releases were bottlenecked by schema management, and we were ultimately forced to choose between product delivery speed and reliable insights for every single product release.
Because it took a lot of back and forth to do this process that I was describing just earlier, we ended up going the route of introducing tooling and processes to decentralize the analytics governance. That was a little bit scary, like I was talking about, because it’s scary to leave data structures in the hands of people that haven’t touched data before.
We optimized, however, for two key things: product engineering productivity and data quality. So, [00:23:00] scary, like I was saying. So we are here now, in the self-serve analytics governance state. I will cover the tools and the processes that we built specifically to get from stage three to four on the next few slides, but ultimately what we ended up with was a state where there was just such a good connection between the product engineers and the data needs, the business needs, and the decision needs from product management, product design, and product strategy.
And we had such good tooling that we ended up finally getting to the state where teams were actually shipping fast without compromising on data quality. And I wish I could have worked in that environment for a really long time, but QuizUp was acquired really quickly after that. We built all of this over a long and painful period of time.
And since then I’ve helped other people do it with Avo. So that’s good. But this was like a dream state to work in, and I’ll cover the [00:24:00] impact also. But so, process-wise, if we cover this again: going from data engineering that doesn’t scale to self-serve analytics governance, to getting the product managers and the product developers closer together. Process-wise, we focused on injecting product analytics, meaning its quality, relevance, and reliability, into the product releases. One of the most impactful processes that we did was the purpose meeting.
The purpose meeting is a platform to align stakeholders on the purpose of a product release and to map out the highest impact and the lowest effort tracking we can execute to understand the success of the release. The stakeholders in this instance are the product engineers, the iOS engineers, the Android engineers, the backend engineers, and a data scientist.
And a product manager as well, and a product designer. We would just pull up a whiteboard. For [00:25:00] remote people, this would be like a Figma or a MURAL. And we would map out the release goal, the metrics for the user journey success, and design the literal data structures. This saved us an incredible amount of time and headache and chasing down data bugs in reactive mode.
And what I mean also by when we managed to decentralize this is that eventually, when we had done many of these purpose meetings, it ended with one of my favorite moments, when product engineers would pull us in. That was rewarding, I can tell you that. They would come in and be like, I’m super excited about this next release.
And I wanna make sure we’re measuring it because I want to know the impact of the job that I’m doing. So that was really rewarding. And I’ll talk about that in the end, impact section, but ultimately and eventually, this got just trained into the data culture and into the data literacy.
And we were there as reviewers occasionally. But people ended up being really responsible for their own datasets, and that was [00:26:00] just incredibly rewarding and freed up a lot of our space and time to work on the more proactive and exciting, cool stuff, I guess, also.
And so the other thing is really around fixing your tools. Tooling-wise, we, as the data science team, you know, filled with data engineering hats and analysts and all those people, worked with the product engineering team. Specifically, the product engineers got excited about building tooling for their own work around this.
And so we worked with them to build a ton of tools. We built out a JSON schema to represent our tracking and we source-controlled it in Git. We built out Python scripts to generate type-safe analytics wrappers based on the JSON schema. So we generated code for Swift and Objective-C for iOS, for example, and for Kotlin and Java for Android.
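As a rough illustration of that kind of tooling, here is a minimal sketch of a Python script that reads a JSON tracking plan and emits typed Swift wrapper functions. The talk does not show the actual schema format or the generated code, so the plan layout, the type mapping, the `game_finished` event, and the Swift `send` helper referenced in the output are all assumptions.

```python
import json

# Hypothetical tracking plan; the real QuizUp schema format is not shown in the talk.
TRACKING_PLAN = json.loads("""
{
  "events": [
    {
      "name": "game_finished",
      "properties": {"opponent_type": "string", "score": "integer"}
    }
  ]
}
""")

# Assumed mapping from schema types to Swift types (small subset).
SWIFT_TYPES = {"string": "String", "integer": "Int", "boolean": "Bool"}


def camel_case(snake: str) -> str:
    """Convert snake_case to camelCase, e.g. game_finished -> gameFinished."""
    head, *rest = snake.split("_")
    return head + "".join(part.capitalize() for part in rest)


def generate_swift(plan: dict) -> str:
    """Emit one typed Swift function per event in the plan.

    The generated code calls a `send` function that is assumed to exist
    in the app; it is not defined here.
    """
    lines = ["enum Analytics {"]
    for event in plan["events"]:
        props = event["properties"]
        params = ", ".join(f"{camel_case(p)}: {SWIFT_TYPES[t]}" for p, t in props.items())
        # An empty Swift dictionary literal is written [:]
        payload = ", ".join(f'"{p}": {camel_case(p)}' for p in props) or ":"
        lines.append(f"    static func {camel_case(event['name'])}({params}) {{")
        lines.append(f'        send("{event["name"]}", [{payload}])')
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines)


print(generate_swift(TRACKING_PLAN))
```

With wrappers like these, a misspelled event name or a property of the wrong type becomes a compile error in the app rather than a data bug found after release, which is presumably the point of generating type-safe code from the schema.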
And we did all of that to make the developers’ lives easier, to make them less frustrated about having to implement analytics, because it’s so easy to [00:27:00] mess it up. And then we built protocols around version management of the schemas, which is a hugely complex process, particularly for product organizations that have a lot of product teams working in parallel on schema changes that cause conflicts in type safety. But we built a lot of processes around that. It was a painful process and it took years. And then we also built tracking pre-release validation tools, like an in-app debugger and a dashboard that would show us all of the events that are coming in and whether they are correct, partially correct, or just never correct.
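The "correct, partially correct, or never correct" view of incoming events can be sketched as a small validation pass. This is an assumed reconstruction, not the tool described in the talk: the expected schema, the event names, and the rule that an event is valid only when its property names and types match exactly are all hypothetical.

```python
from collections import defaultdict

# Hypothetical expected schema: event name -> {property name: Python type}
EXPECTED = {
    "game_finished": {"opponent_type": str, "score": int},
    "game_started": {"topic": str},
}


def is_valid(name: str, props: dict) -> bool:
    """True if the event is known and its properties match the schema exactly."""
    expected = EXPECTED.get(name)
    if expected is None:
        return False
    return props.keys() == expected.keys() and all(
        type(props[key]) is expected_type for key, expected_type in expected.items()
    )


def summarize(events: list[tuple[str, dict]]) -> dict[str, str]:
    """Classify each event name by how many of its occurrences were valid."""
    counts = defaultdict(lambda: [0, 0])  # name -> [valid count, total count]
    for name, props in events:
        counts[name][1] += 1
        if is_valid(name, props):
            counts[name][0] += 1
    status = {}
    for name, (valid, total) in counts.items():
        if valid == total:
            status[name] = "correct"
        elif valid == 0:
            status[name] = "never correct"
        else:
            status[name] = "partially correct"
    return status


incoming = [
    ("game_finished", {"opponent_type": "friend", "score": 7}),
    ("game_finished", {"opponent_type": "random"}),  # missing "score"
    ("game_started", {"topic": "Big Bang Theory"}),
    ("gmae_started", {"topic": "Science"}),          # misspelled event name
]
print(summarize(incoming))
```

Running a check like this against a pre-release build is one way to catch a missing property or a platform inconsistency before the release ships, instead of discovering it in the data afterwards.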
So all of these tools we’ve built to make sure that all this is working. But really, I would say, tooling aside, the most important thing we built was the bridge between the product engineers and the product managers to minimize data scientists as a dependency in the product release. And ultimately we were changing culture.
The tooling we built was built on the foundation of being able [00:28:00] to change that culture. The culture change we made was the foundation for getting product engineers’ buy-in for caring about data quality. So our success really was the combination of the amazing tools that we built, and I’m super proud to have worked with those incredible people on building those things.
And then the cultural alignment we created by injecting product analytics into the product release. Don’t get me wrong. It took a lot of effort to build all of these things. But it was rewarding to see the success of it. And I also tell you this, I felt like an imposter the entire time.
I never knew what I was doing. If you don’t know what you’re doing, I relate heavily to that and feel free to reach out to talk about that. But we’ll do what we can basically. So ultimately we built a sustainable bridge between product management and product engineers, with the data team now instead in a supporting role, a pillar under the bridge, so to speak. So instead of being the gatekeeper and the bottleneck, [00:29:00] we support, and it was incredible. And this change is not doable if the mindset is that a data engineer is responsible for data quality. It just doesn’t work.
So a quick sort of recap on the purpose meeting. It is a 30 minute sit down, eventually. It takes a little bit longer, maybe in the first few attempts, right? Just like anything, when you’re trying a new board game, it takes a little bit longer to learn it, but ultimately it turned into a 30 minute sit down to align stakeholders on the purpose of the product release and map out the highest impact and the lowest effort tracking to understand the success. And this is a framework to use.
I’ll also be sharing out some links about this and also recommendations to run this. So the ultimate framework is you map out the goals, you align on the goals. What does the success look like? You map out the metrics and you commit to them: how will we measure how successful we are with this release?
And then you map out the literal data structures, what analytics events will we need for [00:30:00] our success metrics? And the reason why I liked to do this with all of the stakeholders is for goals, it’s really impactful to have all of these different people in the room at the same time. Because it just increases so much clarity on why are we tracking this.
And even if we take a step back, we sometimes walk out of a purpose meeting with actually, maybe we can hold off on this product release. I’m not sure we understand the goal of it. So it’s like psychoanalysis for your product releases. And it just helps everyone get aligned and just mapping out the goals right there.
I feel like already that increases the data quality, because the product engineers already now care about why we’re doing this. And so they already care about, like, why do I care about making this analytics event correct? And what assumptions do I have here, based on those goals, that can help me make better decisions about the data structures that I implement in the code?[00:31:00]
The metrics part is also really impactful to do together with all of these stakeholders because I’ve seen, for example, there’s a scenario where you have a gut feeling that it might be valuable to release something behind an AB test, because that will be the only way to measure whether this thing is better than another thing. How we facilitated that was by dragging out of the developers and the product designer how we could measure the success. And then when they came up with a really good metric, we could say, I guess we probably won’t be able to measure that without releasing an AB test.
Do you want to know this metric? Really? Do you really want to know the answers here? If the alignment was that we really did want to know the answers, the AB test suggestion would ultimately come from product engineers. And the reason why I’m bringing this up is there’s often hesitation to run things behind an AB test because it means you [00:32:00] have to duplicate your code and you have to maintain a lot of different product versions at the same time.
It’s just more overhead. So obviously we don’t want to do it. It’s a little bit more difficult and we’re already making things a lot more difficult by developing. So I thought that part of the metrics or the purpose meeting was always really impactful. And then designing the data structures, and why we want to do that with all of these stakeholders in the room as well, is because when iOS engineers and Android engineers and web engineers and backend engineers are implementing data points for the same user experience or similar user experience, their code might look very different.
And they might not have access to logging the same analytics events. And particularly when you look at the metadata, the properties or the dimensions, as we’ve been talking about here with the metrics layer, it is ideal to prevent a lot of back and forth around having to redesign the analytics events or asking the iOS engineer to reimplement their tracking because we find out [00:33:00] like a week later that Android can’t actually implement this data structure like this in the code. It’s just so valuable to align on all of these very early in the process. And I actually recommend it very early in the product development process, not as an afterthought.
So I try to sneak in there really early into the product release process with a purpose meeting. And I always also recommend trying to start with a team that’s already quite excited about doing something related to. Yeah. But I quickly want to go over the impact that we saw of bridging this gap between the product engineers and the product managers.
After a long journey of building this culture where data quality was injected into the product release process, we were finally at a stage where teams could ship fast without sacrificing data quality. And there are two metrics that I’m particularly proud of that indicate our impact on the culture and efficiency.
The first impact metric is, I would say, the product developers’ interest in data, like I was talking about earlier. I think we went from something like two developers looking at charts, occasionally [00:34:00] opening up Amplitude or Mixpanel or asking data questions in Slack or something, to something like, I don’t know, like 70% of the developers were just posting charts after they released and asking questions, and the unlikeliest product developers were, and this is about inclusion.
Developers are pretty generally intelligent people. And so I think a lot of the mistake we often make is that we dismiss their interest in doing great work. Heartache. And I think, as soon as we started including them, for example by doing these purpose meetings, they become quite passionate also about understanding the success of their own releases.
And it was just huge to see this change. And I have to say you can just imagine what the impact on data quality is. When the raw data producers, the product engineers, have converted into data consumers, when we’re bringing the data producers and the data consumers into the same person, it’s a game changer for data reliability when you bridge that gap between data production and data consumption. So that is incredible. Great. Like they, they [00:35:00] basically went from seeing analytics as a frustrating task that they had to complete for a data scientist to seeing it as one of the most impactful tools that they have to understand the impact of their work.
And so the other metric that I’m also proud of is the proportion of time that we as data professionals have for doing the data science-y cool stuff. I would say, before we did all of this, we were probably in debt of how much we were doing proactive stuff. We were just pretty reactive.
And so we spent the majority of our time answering the basic questions to unblock someone who couldn’t answer those questions on their own. And so when we had built these levers, what we’re doing effectively is we’re building levers so that people can just do more with less.
When we had built those levers and shifted the culture, we were able to use like 50% of our time [00:36:00] on things like developing prediction models for retention to use as input into our ads or in-app purchases, or developing recommendation engines. It was just, finally, we were able to do the things that ultimately people hire data scientists for sometimes.
They hire them for exciting projects. But they often don’t realize that they have to do a lot of foundational work first. But these are the two things that make me proud. And the first one is if we think about this as like leading north star metrics or something like that, they’re both input into an ultimate goal of helping the company make better decisions.
So I would say, the proportion of developers looking at data, when we get that metric up, what we will see as a side effect is way better product quality or product data quality, thus faster decision-making based on data, etc. And the same thing for the proportion of the data team’s time they have available [00:37:00] for proactive, cool stuff. That’s a similar thing. So again, to be impactful, I’m not suggesting that we throw away our engineering or data engineering hats, but I want to say that we want to and need to wear many more hats. And for the folks that are supporting that first hire of a data person, or first 2, 3, 4 hires,
I think it’s just going to be very impactful if we are aware that data engineering in the early stages, or data science in the early stages, means so much more than moving data between outside sources and inside sources and things like that. It just means bridging gaps between people, building relationships and things like that.
So to take that away, I want to say, start with the culture, not the data engineers. Of course you’ll use data engineering along the way, but build relationships and bridge gaps. I would also start small. Start with the [00:38:00] people who are already excited. I like to sometimes use the analogy that if there’s a vegan person who wants to help the world convert to vegan, the most difficult thing you could do is go to the people that want 90% of their diet to be meat. That’s going to be a very difficult conversion. So very similarly to what our friends over at our lab were talking about, I believe it was yesterday rather than Monday, with the silver bullet and the analytics flywheel: find your evangelist.
I really agree with that takeaway. And with the vegan analogy, like, it’s easier to start with someone who just doesn’t know how to get rid of cheese from their diet or something. So start with someone who is, like, data curious, and then keep sparking people’s data curiosity and emphasize the analyst’s soft skills. Be purple, if you will, shout out Jillian and Anna from the dbt team. And then build a key habit. Try the purpose meeting and then just try doing it for [00:39:00] every feature release. So that wraps up my message for this. So I want to thank you. I do have one more thing. We’re actually introducing our version 1.00 also of the Avo audit dbt package.
So talking about empowering people to discover the issues with their data quality. The dbt package that we’ve built, it’s open source and it runs on your data warehouse to detect spikes and drops in your event volume. And we have support for more issue types coming soon. I’m super proud to be on the Avo on the dbt, so thank you dbt for that. And thank you dbt team for supporting us on this journey, and thank you, dbt community, also for giving us a lot of support and advice on building macros and stuff like that. And you can learn more in the release blog post, which is currently live. So I will now head over to questions, whether that’s going to be [00:40:00] in Slack or where I’m going to be. Over to you, Jillian.
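For readers wondering what detecting spikes and drops in event volume can look like, here is a minimal Python sketch of one common approach: compare each day's count to the average of the preceding days. This is not the Avo audit package's implementation (that package is built with dbt and runs in the warehouse); the function name, the 7-day window, and the 50% threshold are assumptions chosen for illustration.

```python
from statistics import mean


def volume_anomalies(daily_counts, window=7, threshold=0.5):
    """Flag days whose count deviates from the trailing average by more than `threshold`.

    daily_counts: list of (day, count) tuples in chronological order.
    Returns a list of (day, "spike") or (day, "drop") tuples.
    """
    anomalies = []
    for i in range(window, len(daily_counts)):
        day, count = daily_counts[i]
        # Baseline is the mean of the `window` days immediately before this one.
        baseline = mean(c for _, c in daily_counts[i - window:i])
        if baseline == 0:
            continue  # nothing to compare against
        change = (count - baseline) / baseline
        if change > threshold:
            anomalies.append((day, "spike"))
        elif change < -threshold:
            anomalies.append((day, "drop"))
    return anomalies
```

A trailing average is the simplest possible baseline. It ignores weekly seasonality, so a real check would likely compare against the same weekday or use a more robust statistic.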
Last modified on: Oct 12, 2022