Table of Contents
- No silver bullets: Building the analytics flywheel
- Identity Crisis: Navigating the Modern Data Organization
- Scaling Knowledge > Scaling Bodies: Why dbt Labs is making the bet on a data literate organization
- Down with 'data science'
- Refactor your hiring process: a framework
- Beyond the Box: Stop relying on your Black co-worker to help you build a diverse team
- To All The Data Managers We've Loved Before
- From Diverse "Humans of Data" to Data Dream "Teams"
- From 100 spreadsheets to 100 data analysts: the story of dbt at Slido
- New Data Role on the Block: Revenue Analytics
- Data Paradox of the Growth-Stage Startup
- Share. Empower. Repeat. Come learn about how to become a Meetup Organizer!
- Keynote: How big is this wave?
- Analytics Engineering Everywhere: Why in the Next Five Years Every Organization Will Adopt Analytics Engineering
- The Future of Analytics is Polyglot
- The modern data experience
- Don't hire a data engineer...yet
- Keynote: The Metrics System
- This is just the beginning
- The Future of Data Analytics
- Coalesce After Party with Catalog & Cocktails
- The Operational Data Warehouse: Reverse ETL, CDPs, and the future of data activation
- Built It Once & Build It Right: Prototyping for Data Teams
- Inclusive Design and dbt
- Analytics Engineering for storytellers
- When to ask for help: Modern advice for working with consultants in data and analytics
- Smaller Black Boxes: Towards Modular Data Products
- Optimizing query run time with materialization schedules
- How dbt Enables Systems Engineering in Analytics
- Operationalizing Column-Name Contracts with dbtplyr
- Building On Top of dbt: Managing External Dependencies
- Data as Engineering
- Automating Ambiguity: Managing dynamic source data using dbt macros
- Building a metadata ecosystem with dbt
- Modeling event data at scale
- Introducing the activity schema: data modeling with a single table
- dbt in a data mesh world
- Sharing the knowledge - joining dbt and "the Business" using Tāngata
- Eat the data you have: Tracking core events in a cookieless world
- Getting Meta About Metadata: Building Trustworthy Data Products Backed by dbt
- Batch to Streaming in One Easy Step
- dbt 101: Stories from real-life data practitioners + a live look at dbt
- The Modern Data Stack: How Fivetran Operationalizes Data Transformations
- Implementing and scaling dbt Core without engineers
- dbt Core v1.0 Reveal ✨
- Data Analytics in a Snowflake world
- Firebolt Deep Dive - Next generation performance with dbt
- The Endpoints are the Beginning: Using the dbt Cloud API to build a culture of data awareness
- dbt, Notebooks and the modern data experience
- You don't need another database: A conversation with Reynold Xin (Databricks) and Drew Banin (dbt Labs)
- Git for the rest of us
- How to build a mature dbt project from scratch
- Tailoring dbt's incremental_strategy to Artsy's data needs
- Observability within dbt
- The Call is Coming from Inside the Warehouse: Surviving Schema Changes with Automation
- So You Think You Can DAG: Supporting data scientists with dbt packages
- How to Prepare Data for a Product Analytics Platform
- dbt for Financial Services: How to boost returns on your SQL pipelines using dbt, Databricks, and Delta Lake
- Stay Calm and Query on: Root Cause Analysis for Your Data Pipelines
- Upskilling from an Insights Analyst to an Analytics Engineer
- Building an Open Source Data Stack
- Trials and Tribulations of Incremental Models
Scaling Knowledge > Scaling Bodies: Why dbt Labs is making the bet on a data literate organization
Originally presented on 2021-12-06
This is an overview of how the dbt Labs Data team is structured, how we interact with the greater org, the general operations, and the expectations/responsibilities that are helping us become a self-service organization.
Browse this talk’s Slack archives #
The day-of-talk conversation is archived here in dbt Community Slack.
Not a member of the dbt Community yet? You can join here to view the Coalesce chat archives.
Full transcript #
Jillian: [00:00:00] Okay. I'm so glad to see that y'all took my advice and are joining us as we kick off Coalesce with none other than dbt Labs' Head of Data, Erica Louie. Erica is joined here today by her team, so please join the conversation in the Coalesce dbt data team channel, where we encourage everyone to ask questions, not just of the speaker, but of one another as well. Leave comments or react to what you're learning.
Be mindful to use threads for conversations, and do not be shy with your memes and emojis. If you haven't joined Slack, it's not too late to do so: go to community.getdbt.com, click "join community," and follow the steps inside. For this session, leave your questions in the Slack, and Erica, Andrew, and Brandon will be following up in the chat after the session wraps.
Erica has a packed agenda today, so I'm going to keep this short and hand it right over to her. Erica, the stage is yours.
Erica Louie: All right. [00:01:00] Can everyone hear me? Yeah. Cool. All right, cool. Thank you, Jillian, for the kind introduction. Good morning, afternoon, evening, everyone, wherever you are. I am incredibly excited and honored to be the first official Coalesce 2021 session.
You may know me from last year. I was one of the hosts, but today I am presenting on one of my favorites: self-service analytics. I really hope that by the end of this talk, you store these points in the back of your mind as you navigate this incredible and exciting week. But before I jump into my presentation, I have to introduce my amazing data team.
They are truly the backbone of all things internal data at dbt Labs, and they make my life so much easier. First off, we've got Andrew, who is our product analytics engineer, and Brandon, who is our marketing analytics engineer. Chat, please say hello. They are my chat champions, and they're going to be answering all of [00:02:00] your live questions in the Coalesce dbt data team channel.
And for me: hello, my name is Erica Louie. My friends call me "ric" or "ricky," all lower case (dbt and I have that in common). I'm the Head of Data at dbt Labs. My pronouns are she/her, and I'm currently located in New York City for the week, but I'm based in San Diego, or in my van Bessie, wherever she takes me.
A quick background on me, though: I studied political science and Italian culture, and I did not have any prior professional experience in the data industry before I joined dbt Labs as their first junior data analyst in September of 2017. Given my background, I firmly believe that anyone can learn how to use data.
This belief was really the catalyst for chartering dbt Learn, and why I think that self-service is possible if you give your colleagues the right resources and mentorship. And in this talk, I'm going to make the [00:03:00] argument that self-service analytics is controversial because it is hard, but the collective benefits for your company, colleagues, and data team greatly outweigh that upfront work, especially as you scale your organization.
And now, the session overview: first, I'm going to give some context on how our team is structured and what our responsibilities are. Then we'll get into the core of it: what does self-service analytics look like, why we chose this model, how we executed it, and of course, the challenges that we're currently solving.
[00:03:34] Team Structure #
Erica Louie: So for some context on our team structure, the question here is: what are our responsibilities to the org, and how do we avoid an identity crisis? This is the overview of our team's scaled structure. We are semi-decentralized, so that means that data team members are embedded on other teams, but they report directly to me.
I always like to say that they are the storytellers of their embedded [00:04:00] team's product or business area, and they're also data partners to their team, meaning that they advise on the team's roadmap by using data to help inform decisions. And of course, they also help train their embedded team to be more self-service with data.
We do have a set team mission, which is the same as dbt Labs' mission, and that is to create and disseminate organizational knowledge to the entire company. We are a team of three people responsible for serving data to about a 180-person company, and we follow a data-as-a-product model, meaning that we believe our customers are our fellow colleagues and our product is the company's data.
So that means that we also aim to ensure that our product and the feature sets around our product are trustworthy, reliable, and helpful. Our goal is to ensure that our colleagues' interaction and journey with data is frictionless and constantly iterated on. And by doing so, we also aim to empower the company to make the best decisions possible, as fast as [00:05:00] possible.
So as you can imagine, this data-as-a-product model really goes hand in hand with self-service, and if you would like more information on data as a service versus data as a product, my team will link some helpful resources in the Slack channel.
[00:05:17] What Does Good Self-Service Look Like? #
Erica Louie: Okay. So now you have some context on our team structure and their responsibilities to their embedded teams and to the greater org. But now let's talk about what self-service analytics looks like, and more importantly, what good self-service analytics looks like. To abbreviate a quote from a friend of the pod, Benn Stancil: "It is the experience that matters, not the functionality. As long as people are comfortable with that experience and trust the results it produces, we can call it self-serve."
So, what does good self-service look like? To me, it means that outside team members can independently and painlessly answer their questions in your BI tool. That means that there's thorough [00:06:00] documentation on your data, across your dbt project and within your BI tool. This also implies that the structure of both is intuitive and easy to navigate.
It also means that if they need support, your colleagues have access to asynchronous resources if your team isn't available. This is especially important if your company is distributed across multiple time zones. Async resources could look something like: how to get set up with data, BI trainings, an overview of all of our platforms in the analytics ecosystem and how to use them, and also how to ask for help / make a data request. But most importantly, if they are trying to answer their questions in your BI tool, when they build that visualization, they trust the answer that they're getting. They trust your data, and they also have confidence in their ability to use it.
Another good example of self-service is colleagues posting a company update and instinctively referencing a dashboard or chart that they created. And lastly, it also means that the majority of your data requests [00:07:00] are deeper strategic questions or infrastructure work. Examples of that could be, "Hey data team, I was looking at my Monday morning dashboard and we're seeing a spike in weekly active users. I have some hypotheses on why, but can we meet to discuss?" or "Hey data team, we're launching this feature. What metrics do you think we should be tracking to measure success? This will also probably involve adding a new data source."
All of these are indicators of good data culture and a good self-service model.
[00:07:32] The Benefits of Self-Service #
Erica Louie: Now, I know we have some skeptics in the audience, but let me talk about the benefits of self-serve for your team operations. It is more effective to scale knowledge than to scale bodies. That means that rather than spending time on hiring and growing your team at the same rate as your company, you invest upfront time in improving the data experience.
We actually have one product analytics engineer (hi, Andrew) partnered across seven product [00:08:00] teams. And that is because our product managers and even some of our engineers know how to create their own dashboards. They know how to pull numbers if they need to. This means it reduces time spent on reactive work, like pulling numbers or spinning up a dashboard that'll probably be used one time.
And one thing I'm really passionate about is that it empowers your peers. You're all working towards the same goal, so if your colleagues can answer their own questions quickly, they can make faster decisions. Self-service allows your peers to go deep in their area of expertise, and then even deeper when they can see the data themselves. There's a spark.
And once you experience seeing someone get it, you're going to get addicted. And selfishly, it allows the data team to spend more time on proactive work or exciting high-value tasks, such as building scalable infrastructure, digging deeper into investigative work, or even zooming out to think about long-term strategy.[00:09:00]
But chat, I have a question for folks who work on data-as-a-service teams: what is your experience balancing data requests while also trying to dig deeper into understanding strategy or diving into some investigative work? I'm just curious.
Now, let's talk about fit. Who is this model a natural fit for? Probably young to midsize startups. My theory is that if you can build this culture early on, then self-service can continue to scale in your organization. This will naturally be a harder fit for larger companies who historically use the data-as-a-service approach. My suggestion here is to actually start on a smaller scale to test and iterate on what self-service could look like for the greater organization, or even for a few selected teams.
You can do this via a subset of colleagues or an embedded team, but there needs to be buy-in, and you can send them a recording of this talk to help your case.[00:10:00]
[00:10:01] Why dbt Labs Chose Self-Service Analytics #
Erica Louie: So why did we choose this model? Well, some background on us: most folks in leadership hail from strong analytics backgrounds, and if leadership is invested in data, it's easy for that culture and belief to trickle down.
We are also hiring people from the data space. Every department has at least one member that hails from a data background, and our job postings, regardless of the role, frequently look for someone with a background in data. Also, our team's mission is the same as dbt Labs' mission.
So this model made sense to us and to the rest of the organization when we started to make a push for it. We also started this model fairly early on. We chartered a dedicated data team in August of 2020, and then we began this bigger company-wide self-service initiative in spring of 2021, when we were about a hundred people.
But before I move on, chat, I'm curious to hear: how many of y'all came from a data background, but your current role isn't directly on a data team? [00:11:00] How do you think that has benefited you and your team?
Now, I just want to make it clear: I'm not saying that all hires should have prior data experience, but it's super beneficial when non-data teams have at least one member with a background in data. This actually allows them to empower their teams to be more self-service, and to think about exploring data to draw their own insights.
Okay. Now let's get into the juicy parts. I've done a lot of talking about the benefits of good self-service, who this model is a natural fit for versus a harder fit for, and why this model felt natural for us. But now let's talk about how we executed it.
[00:11:40] The [dirt] Road to Self-Service Analytics #
Erica Louie: All right. The road to self-service. And I say dirt road, because this is the foundation, and we're continuously iterating on how we can improve.
So how to build a dirt road to self-serve 101?
First, we made a plan and set growth goals, then we executed on that plan. And after we were [00:12:00] done setting it up, we wanted to communicate it to the wider org and also to incoming folks, setting expectations. And for the last two: as you're going through this process, ensure that you're leading with empathy, patience, iteration, and affirmation.
So when planning our strategy, I zoomed out and asked myself: all right, what is the current versus ideal journey that folks go through when they interface with data? How can we set those folks up for success? What resources are we currently missing that keep them from being successful?
Erica Louie: And then, as we build this out, how do we communicate these new expectations to the greater organization and also to incoming team members? And how do we know if we're being successful? So I started off with the most important question, which would ultimately set the tone for the rest of the initiative: what is the current customer journey when interacting with our data, and what is the ideal [00:13:00] journey?
[00:13:00] Current Data Consumer Journey #
Erica Louie: Okay, so this is what the onboarding data consumer's journey looked like. They arrived, and they had a one-hour onboarding session with me about GitHub and Looker. Now, how could they possibly learn everything about Looker and GitHub within an hour? After that, some folks would want to set up the dbt project locally, others would want to use our BI tool to build reports, which was confusing because it was undocumented and not intuitive. They'd also have to ping our internal analytics Slack channel to ask for setup help. Or they didn't really know how to send a data request: was it through the internal analytics channel? Was it a Slack DM? Is there a form somewhere? It often took someone multiple days or weeks to start their data journey, because there was just no clear documentation on where to begin.
So planning part two: how do we go from the current to ideal data journey? With all customer journeys you want to establish personas.
So we created two personas for data consumers. We [00:14:00] got the data analyst, which is about 85% of our company. That means you're probably not going to be building infrastructure in our dbt project or in our Looker repo; you're mostly here to view Looker reports, potentially make your own, and maybe write a few queries.
And then we have the analytics engineer. They're interested in building infrastructure, they're writing their own queries, and they want to contribute to our dbt project and our Looker project. Establishing these two personas is important because their data experiences are going to be different.
So, now that you've established these personas, how do you set each up for success? This means: what do they need access to? What training resources will be most helpful? And also, what other resources should they have their eyes on? If they're an analytics engineer, they should really have their eyes on our dbt style guide, our Looker style guide, and maybe even getting internal analytics set up on the CLI.
And then as you go through that, walk through the journey yourself to anticipate what questions are going to come up. And lastly, you want to [00:15:00] think of success metrics. Ask yourself: what does the ideal world look like? Don't expect to nail it on the first go, because you will iterate. Set reasonable expectations, because you may just rework your entire self-service strategy, and that's okay.
[00:15:15] Ideal Data Consumer Journey #
Erica Louie: So this is our updated customer journey. When folks first onboard, they have a new onboarding session with me, and then they'll know what resources are available and what path they should be following. We can also use those resource docs to curate that path for them, or create these resources as a reference that they can use weeks or months later, when they're ready to finally utilize them.
And this was our actual Q3 roadmap for self-service, at the beginning of July of this year. We set a few reasonable goals that we wanted to hit by the end of Q3. We wanted to see fewer setup questions in our internal analytics channel, and we wanted to see weekly active users in our BI tool increase to 65%.
I believe we were at 50% at the time. We [00:16:00] also wanted the feedback score for our BI tool to average higher than before. We send out a quarterly feedback form at the end of every quarter, just to see how folks are feeling about their data experience. And we also wanted at least one data champion on each embedded team by the end of the quarter.
That means an embedded team member who has created some sort of evergreen dashboard in our BI tool, or some sort of data tool that they use to make their job easier, or who has contributed to our dbt project or to our Looker project.
All of that planning, all of that goal setting. Now let’s start executing.
[00:16:35] Execution #
Erica Louie: First, set the groundwork! Identify those missing resource docs. We believe that everyone should feel equipped to explore data on day one. So we looked for themes within our most frequently asked questions, and after that, we did an overhaul of our data workspace to be used as a company-wide resource.
This is where you go for any sort of question that you have. This is the spot. It also allows our customers to start their data [00:17:00] journey when they're ready, and it cuts down the gray areas of: how do I make a request? How do I get set up? What tools do we use? And in the end, this is what our data workspace looks like.
Erica Louie: As you can see, we found those themes in what folks were asking us, and then we segmented them into different headers and sections, along with the resources that we wrote in Q3.
So then, after you write your resources, think about documentation on your data. That meant that we documented our dbt project, so all of our data sources, and also attached visualizations of their ERDs. We also prioritized documenting our marts models, because they're the ones that touch your BI tool and hold the most business logic. And also documentation within your BI tool and about your BI tool.
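As a rough sketch of what this kind of mart documentation can look like in a dbt project (the model and column names here are hypothetical, not dbt Labs' actual project):

```yaml
# models/marts/schema.yml -- hypothetical example of documenting a mart model
version: 2

models:
  - name: fct_weekly_active_users
    description: >
      One row per user per week. Powers the weekly active users dashboards
      in the BI tool; the business logic for what counts as "active" lives here.
    columns:
      - name: user_id
        description: Unique identifier for the user.
        tests:
          - not_null
      - name: week_start_date
        description: Monday of the activity week (UTC).
      - name: is_active
        description: True if the user performed at least one qualifying event that week.
```

Running `dbt docs generate` and `dbt docs serve` turns descriptions like these into a browsable documentation site, which is one way to give data consumers the thorough, self-serve documentation described above.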
In parallel with that, you also want to record demo training videos. So we did one for an overview of each of the platforms within our analytics ecosystem, and also an overview of our BI tool: how to use it, the business [00:18:00] context. And as you go through this process, what we do is make a note and reflect on every data request.
We ask ourselves: why couldn't this person answer it themselves? What resources weren't available, or weren't clear? We throw those notes into an evergreen document on our resource page. Later, we'll come back to it, notice themes within some of these reflections, and add them to our self-service roadmap for the next quarter.
So these are examples of our Looker training videos (Looker is our BI tool). We also keep feedback on literally everything and more in our blue tape doc, including improving self-service analytics. If you're curious about what our blue tape doc is, feel free to ask my team in the chat.
So now we've made a plan and we've executed. How do you communicate this to the organization and also to incoming folks? We made an announcement to the company, and we actually weren't concerned about pushback, because folks wanted to learn how to use data. [00:19:00] You would actually be surprised by how many non-data-team folks are motivated to be more self-service with data but aren't given the opportunity to do it.
[00:19:11] Onboarding for New Colleagues #
Erica Louie: But here's the hard part: how do we communicate these expectations to incoming colleagues? I changed the original GitHub and Looker overview session to "how data works at dbt Labs." This is actually one of my favorite parts of my job. I love meeting every single person at this company and teaching them that data can be intimidating, but we're here to help.
At the end of the session, I want them to walk away knowing that the data workspace is a resource for them to reference when they're ready to begin using and exploring data, who is on the team, how we operate, and our responsibilities to the org and to them. This is also important if they have specific department-related questions.
So if they have a question about product, they can go to Andrew. If they have questions about marketing, they can go to Brandon. Most importantly, I really want them to know which persona they fall under, so at least they have a starting point / guided path already set for [00:20:00] them. I also want them to know our expectations: what they can expect from us, and what we expect from them.
I always say that we don't create dashboards for people, but if you do want to create your first dashboard, we will be here to help you. But most importantly, at the very end of this, I want them to know that we're on their side and we'll happily collaborate and aid them along their data journey.
So, now that we’ve made those plans, we execute, we set those expectations as you’re going through this process, leading with empathy, patience, and reiteration. So how do we ensure that our colleagues are enjoying their data journey?
You hate to hear it, but you can build as many resources and training materials as you want. But if your colleagues are too scared to ask questions, then what is the point? Think about how you felt when you first started your data career.
Data is hard, and it's only getting more complex. You should empathize with that. In my [00:21:00] onboarding sessions, I'm very open about my data journey and how hard it was. If I could do it, so can they. I always like to tell them that they are the subject matter experts; they know more about what's in that data than we do. We're just here to help them use it.
And I know that you spent a lot of time writing those async resources, but that does not mean that you should default to just linking docs. We are partners to our colleagues, and requests shouldn't be transactional. If they ask you a question, make sure that they understand the answer, and make sure that you understand why it wasn't clear to them and how you can build a relationship with this person as they start their data journey. And lastly, iterate, iterate from a product perspective.
If you're frequently seeing unhappy customers, then maybe you should change your approach. And even if it's the best self-service model that you can possibly create, it can always get better. I urge everyone to send out feedback forms so you can get an idea of where in their journey they're experiencing friction, and also what is barring them from having [00:22:00] a good experience.
[00:22:02] Take a Moment to Celebrate #
Erica Louie: Lastly, take a moment to celebrate. When they make a dashboard, when they make a PR, when they build a data tool for the company, tell them thank you for leaning into self-service, for taking the initiative, and for making the data team's life a little easier.
And I know y'all are curious: did we hit our goals? We did. We met them, and we also exceeded them. We saw 78% of the company at the end of Q3 as weekly active users at some point. We also saw our BI feedback score come in at 4.5, versus 3.5 in previous quarters. And in Q4, in internal analytics, we've received zero setup questions.
And a word from our customers: these are just a few examples of what self-service has looked like since the end of Q3. We've seen folks creating their own PRs or projects, independently creating their dashboards and data tools, the relationships between data team members and their embedded peers growing symbiotically, and onboarding folks [00:23:00] really loving the way that we give everyone access to data to explore themselves. And a huge bonus that we didn't expect was actually seeing peers answer each other's questions.
[00:23:10] Challenges #
Erica Louie: But with all good things there come challenges. We are rapidly expanding, so in Q4 we're really thinking about how we can build training resources for folks with varying levels of data experience.
So rather than starting from "this is how you use our BI tool and explore data," it's: what is data? How do you learn SQL? And that's going to be a collaboration with the training team. We're also holding data office hours for some supplemental live support. I foresee that probably evolving into specific segmented office hours, like marketing data office hours, customer success data office hours, etc.
We're also trying to keep our resources evergreen and constantly updated. And we're also thinking about knowledge share of analytical assets and conversations: what dashboards currently exist for business areas? What insights have already been drawn? What questions have remained [00:24:00] unanswered?
The solution that Andrew actually came up with was an analytics framework. If you’re curious about what that is, feel free to ask Andrew in the chat.
And some closing remarks that I want you all to walk away with. Self-service at your organization is going to look different than self-service at dbt Labs. But you should really think about the value of your data team: how you want them to be used, what you want them to be spending their time on, and why they haven't been able to do that.
I also encourage all of you to believe in your non data colleagues and to urge them to also believe in themselves and always remember that self-service should be a symbiotic relationship between your team and data consumers.
But that’s not all folks. This wouldn’t be a real Coalesced session. If I did not. Add in one last slide quote from Kanye West himself, he sponsors our data team, not really. We are hiring, we’re hiring three different flavors of data analysts for our team. These jobs are not posted on open roles. So if you are interested in any of [00:25:00] these three, feel free to send me a Slack DM, an email.
Our team is super fun. We were at the most fun at retreat, all three of us. We took pasta making classes together. We send funny memes. It’s a good time. Come join us!
And lastly, thank you so much for coming. I hope you all enjoy the rest of your experience at Coalesce. It’s really like a fantastic lineup better than Coachella 2022, for sure.
And that is all folks. I’ll see y’all in slack. I’ll be in the Slack channel answering any additional questions. These are my socials, Slack, Twitter, email. Thank you so much, everyone![00:26:00]