Analytics Engineering Everywhere: Why in the Next Five Years Every Organization Will Adopt Analytics Engineering
This talk is about envisioning the world of five years from now when every organization has an Analytics Engineering department.
It answers two main questions: Why should we care about the widespread adoption of analytics engineering? What types of organizations will we see this growth in?
Browse this talk’s Slack archives #
The day-of-talk conversation is archived here in dbt Community Slack.
Not a member of the dbt Community yet? You can join here to view the Coalesce chat archives.
Full transcript #
Elize Papineau: [00:00:00] Hello, and thank you for joining us at Coalesce. My name is Elize Papineau and I am a senior analytics engineer at dbt Labs. I’ll be the host for this session. The title of this session is Analytics Engineering Everywhere: Why in Five Years Every Organization Will Adopt Analytics Engineering, and we’ll be joined by Jason Ganz.
Ganz was an early adopter of dbt and finds himself primed to peer into possible data futures. Before we start, please take a moment to join the conversation taking place in the #coalesce-ae-everywhere channel of dbt Slack. If you’re not a part of the chat, you can join right now.
Visit community.getdbt.com and search for #coalesce-ae-everywhere. When you enter this space, we encourage you to interact there throughout the [00:01:00] whole talk: ask other attendees questions, ask the presenter questions, share relevant AWS outage memes, and just share your reactions in general in the channel.
After the session, the speaker will be available in that channel to answer your questions, and we encourage you to ask them throughout the talk. All right. Over to you, Jason.
Jason Ganz: Hi folks, I am Jason Ganz, the developer experience manager here at dbt Labs. And I’m so thrilled to talk to you today about Analytics Engineering Everywhere, or why in five years every organization with a data team will adopt analytics engineering: the talk so powerful that they may have brought down AWS trying to stop it. But we’re here anyway.
Now, if you’re at this talk, you probably have some questions about analytics engineering. What is analytics engineering? How is it important for [00:02:00] my organization? That’s a very ambitious thesis to try and prove in a 30-minute talk. And to that I say, number three is not a question.
But before we get into the big stuff and the big ideas around analytics engineering, I want to start with a story that’s a little more personal, about when I first really knew that analytics engineering was not just something that was impactful for me and the organizations that I had worked in, but that analytics engineering was going to be everywhere. And that was actually last year at Coalesce. Last year’s Coalesce was an incredible event because it was the first ever analytics engineering conference.
It was the first time for the community to really get together and start sharing our knowledge with each other. And some now-foundational talks sprang up: talks about how to organize data teams, [00:03:00] how enterprises were using dbt to totally reshape how they use data, and some thoughts on the future of where data and data teams were going.
And if you were there, the energy was palpable. I know I was watching at home. I wasn’t working in for-profit tech at the time, but when I saw that, I knew analytics engineering was such a powerful force. But it wasn’t just me. After last year’s Coalesce, there was a three-times increase in the number of analytics engineering job posts on the dbt side.
Clearly there was a huge amount of energy around analytics engineering. But at the same time, if you’re not deeply embedded in this world, it can sometimes feel a little diffuse as to what exactly this is, why the energy is so high, and why your organization should care about it. And so that’s what we’re here to talk about today.[00:04:00]
[00:04:00] Why Analytics Engineering? #
Jason Ganz: But to get there, we need a little context, because the story behind analytics engineering, to really be understood, needs to be examined at a number of different levels. We need to look way zoomed out, at the industry level, at the market and technological forces that brought us here today. We need to look at the organizational level, at how organizations actually change once they’ve adopted analytics engineering into their data systems.
And we need to look at the skill level, at how analytics engineers are actually working on a day-to-day basis and what that means for people becoming analytics engineers. Now, these are all big topics, any of which could easily fill up a whole talk.
So we’re not going to decisively answer these today, but what I hope we will do today is show why analytics engineering is something [00:05:00] to watch out for, to believe in, and to invest in learning about and sharing our knowledge with the community. Okay. So let’s start broad.
Turning back to 2008, 2010, when the current internet giants were starting to take their place as the dominant forces in the modern economy: this was the era when Facebook, Amazon, and Google were going from companies that were interesting to companies that were really showing that they were changing how business is done.
And there were a couple of thoughts about why that was so important. In fact, there was a huge amount of energy spent trying to figure this out. The first was the importance of being technical, of being able to harness the digital tools of the internet and to write software that would be able to scale your business.
The second was the idea of being data-driven. These companies had more [00:06:00] data than any organization ever had before, and they were using it to actually change how their businesses make decisions. And so as this was happening, an idea started to take hold: if it’s very important for us to be technical,
and it’s very important for us to be data-driven, then at the intersection of being data-driven and being technical must be something extra important, something that we should really invest time and energy in figuring out. And so to start, there was an effort to define this, and we had an initial definition, which is:
[00:06:44] The Creation of the Data Scientist #
Jason Ganz: Okay. So being technical means you have a traditional computer science background and know how to use Python and other full programming languages, and being data-driven means advanced algorithms, neural networks, perhaps deep learning. [00:07:00] These were the skills that, if you were using them, meant you were considered a data scientist.
Or you were at least doing data science, and this was incredibly high leverage for the use cases where data science is impactful. We’ve seen organizations transforming how they do prediction algorithms and recommender systems; all sorts of applications today are being made smarter and enriched by data.
But as we were thinking about how organizations become data-driven, there started to be a different set of questions, and people started to ask: what about the data problems everyone else needs to solve? These are questions like: How many customers do I have in each country? What are our most used features?
Who are our top-performing salespeople? These are questions that [00:08:00] were not as well suited to be solved by data science workflows. They’re just a different set of questions. And these questions went to the data analysts. Data analysts are super well primed to engage with business users, figure out how to find this data, and deliver it in a way that creates impact for the organization.
But something started to happen, which is that data analysts got a lot of questions. In fact, more questions than they could ever answer. I like to think of this as running on the analysis treadmill, where questions keep coming in, you keep answering them, and they keep coming in.
[00:08:43] Data Analysts vs Data Engineers #
Jason Ganz: And in part that was because the data analysts didn’t necessarily have a ton of impact on the data infrastructure around them. That was the domain of the data engineers, who write the data pipelines, are responsible for getting the data into systems, and maintain the [00:09:00] quality of the infrastructure.
Jason Ganz: And this was the steady state for a period of time, until a new set of tools started to emerge, a set of tools enabled by modern data warehouses. Kicked off by Redshift, Snowflake, and BigQuery, the modern data stack allowed for tools that could change the data workflow forever: tools like data loaders, which allowed us to load in information from our various systems
with just a few clicks of a button, and tools like Looker and Mode, the modern business intelligence platforms that started to make the dream of self-service business intelligence into a reality. And then of course, dbt, which allowed you to transform your data in your warehouse purely by writing modular SQL. [00:10:00] When this started to happen, it changed how organizations use data, because all of a sudden data analysts and SQL users were able to build their own data models and extract meaning by themselves. And as this started to happen and more and more organizations began adopting this workflow, people started to think that maybe there’s a different way of viewing that all-important intersection of technical and data-driven.
Maybe being technical actually means being able to convert business logic into code: writing good, clean SQL, following software engineering best practices, using GitHub, things like that. And being data-driven means thinking in clean, logical systems. It means being able to engage with your business users and understand the data that they actually work in.[00:11:00]
And if you’re doing that, what you’re doing is analytics engineering. Analytics engineering was an incredibly exciting evolution of the data workflow because it allowed analysts and people who write SQL to start building up and encoding organizational knowledge into their data products.
[00:11:28] Analytics Engineers #
Jason Ganz: And there have been a lot of attempts to explain what analytics engineers actually are. If you were here last year, you heard the theory that analytics engineers are actually just pissed off data analysts. If you listened to the talks yesterday, you heard the term war keeper thrown around.
One term that’s increasingly gaining prominence is the idea that analytics engineers are the librarians of the modern data organization. [00:12:00] What that means is that when you have a data question, you might go to an analytics engineer and, like a librarian, they might say, oh, here we have a full stack of compiled books ready for you, and point you to where those resources already exist.
They might say, oh, that’s an interesting question. Here’s some prior research that’s been done on that, and point you towards some model tables that exist, which can point you in the right direction, and then work with you to get the answers. Or they might say, that’s a great question. We’ve never thought of that before.
[00:12:40] The Analytics Engineering Workflow #
Jason Ganz: And then they’ll actually roll up their sleeves and get in there and help you do the work yourself. The analytics engineering workflow changes the game for doing data work because it’s modular, testable, and repeatable: it breaks up monolithic SQL scripts that produce [00:13:00] metrics into individual dbt models, where you can examine them and make changes on the fly.
Have your tests and your version control in place, and you’re not going to break anything and can keep delivering. And now, turning back to the sets of data problems, we can see that because analytics engineering addresses this set of questions around modeling your data, any organization that has data and wants to draw insights from it is going to benefit from adopting analytics engineering workflows and principles.
And the other thing that’s very important and exciting about the analytics engineering workflow is that it builds on itself. As you build up your dbt project and your analytics engineering systems, you’re able to actually, like Erica talked about yesterday, scale your knowledge beyond scaling the number [00:14:00] of humans in your data organization.
Because you’re able to bring in new data sources, model them, and then keep them available, you’re able to grow over time the insights that you deliver for your organization. One of the things that Tristan said to me early on when I was learning analytics engineering, and that has always stuck with me, is that we like to solve hard problems.
Analytics engineering lets you solve a hard data problem and then have that solution available to you forever, so that you can go on and solve the next hard problem. And the next hard problem. And the one after that. Now, I wouldn’t be here today if I didn’t believe that analytics engineering was a hugely high-leverage practice for organizations to adopt. But that being said, any big [00:15:00] organizational change comes with potential failure modes. And there are some to watch out for with analytics engineering, particularly in terms of its scope.
Analytics engineering can be adopted in a way that’s too broad, where analytics engineers are insufficiently differentiated from analysts. This can cause confusion, as it’s unclear who is supposed to be working on which problems. It can also be brought in in a way that is overly specific, where analytics engineering just exists as a tiny slice between the analysts and the data engineering team.
And when that happens, it might still be effective, but it’s losing some of the cross-functional magic that makes analytics engineering so impactful. We’re all working as a community to figure out the best ways to adopt analytics engineering at different sizes of organizations. And this is an ongoing conversation.
If you’re interested in this [00:16:00] type of thing, I highly recommend you join us in the towards analytics engineering channel on the Slack, where we talk about these things all the time. But what I want to make clear is that analytics engineering is not eating the data team. In fact, what analytics engineering is going to do is make everyone on the data team able to spend more time doing the things that they care most about and that are highest leverage. This means that analytics engineers are going to allow data analysts to spend more time working with stakeholders and driving business value. Data scientists are going to be able to spend more of their time building models and really going deep on the machine learning side, and less time trying to organize their data. And data engineers can focus on building the critical infrastructure that’s going to make the whole organization keep up.
[00:16:58] Other Data Roles #
Jason Ganz: What is great [00:17:00] about analytics engineers is that, because they sit relatively towards the middle of this scope of the organization, analytics engineers are a great first hire. An analytics engineer is going to be pretty comfortable stepping into the role of an analyst and working with their stakeholders.
And they’re going to be able to set you up with some infrastructure to build for the long term. At the same time, as organizations scale, analytics engineering continues to be a high-leverage way of modeling your business data and pulling out organizational insights. From the smallest startup to the biggest enterprise, we can see huge value in adopting the analytics engineering workflow, but it’s not just business.
Our world is facing a huge number of challenges, like the pandemic and climate change, where it’s very important, for us [00:18:00] to be able to solve them, that we actually know what’s going on in the world. And analytics engineering is uniquely primed to help us gather, collect, and draw insights from that data in order to make change. I want to be clear: analytics engineering
and tech are not a panacea for these problems, but they are a way that we can help address some of the issues here. And to showcase what this might look like, I want to actually show two examples: one from when analytics engineering was not adopted, and one from when it was, just to show what it looks like in practice and the difference this makes.
[00:18:47] The Perils of Bad Data #
Jason Ganz: So to start, I want to go back to the year of 2020, when the UK government was attempting to figure out how they [00:19:00] were going to report on COVID cases. Like many of us do, they fell back on the systems that they knew and used. This is a very reasonable thing. In this case, they happened to be building a database that was powered entirely by Excel spreadsheets, but then something started to happen.
Jason Ganz: They started to notice that actually their case counts were too low. They were under-reporting the number of cases. And the reason for this is actually really wild. Because they were reporting their case counts in Excel, they were exceeding the daily limits and the case counts were being truncated.
Now the real ones among you will wonder how they possibly managed to hit the Excel row limit when case counts never got that high. And the answer is that they were using XLS files, where the row count is [00:20:00] actually just 31,000 rows. So any cases above 31,000 rows were being cut off. This is one of the clearest indicators that I’ve ever seen of how having outdated data systems can lead to real-world harm.
It’s very important that we deliver this data to people accurately and honestly. And to showcase an example of when this went really well, I want us to turn to San Francisco around the same time period. Around this time, DataSF, which is the data science organization of the San Francisco city government, started trying to solve the problem of how to track PPE and make sure their stockpiles weren’t running low,
and that they knew how much they had. To do that, they turned to the modern data stack, Snowflake and dbt, to [00:21:00] create a new site where people could actually track and see the stocks much closer to real time. By having this, they were able to accurately distribute their gear and make people safer during the pandemic.
This was a hugely exciting use of the technology. And we were really grateful to have DataSF present this at a dbt meetup last year, which we’ll be sharing the link to in the chat. I highly encourage you to go check it out. Now, before we go, I want to leave you with a final message. It’s possible that when we were talking about last year’s Coalesce or the rise of the early days of dbt, you were thinking, wow, that sounds really cool.
I wish I could have been around in the good old days of analytics engineering when things were just getting started. I’ve got great news for you, which is that these are the good old days of [00:22:00] analytics engineering. There are so many organizations that haven’t yet been exposed to the power that analytics engineering can bring.
There are so many foundational problems about how we use this set of tools in the modern data stack, and then all of the other things that we can build on top of it. Tristan and Martin were talking yesterday about how this is a 50-year journey for us to begin to really start to know what we can do with this technology.
Now is just a great time to start getting involved in the analytics engineering world. And so I’m going to say, please stick around with us in the dbt community. Whether that’s joining the conversation on Slack, reading and contributing to our new developer blog, or attending and hosting a meetup, there are so many ways to get involved and continue making sure that analytics engineering truly does go everywhere.[00:23:00]
Elize Papineau: Thank you for that talk. That was great insight and love to see the journey and get the context of where we are in this longer journey.
[00:23:15] Q&A #
Elize Papineau: We did get one question in the Slack channel that I think is great for these last five minutes here. Leo of Fulsom asks: how much statistics do we expect an analytics engineer to understand, and what is required versus what is nice to have?
Jason Ganz: That is a great question. So to start with, I want to say that this can vary by organization. Hopefully a link was shared to Anna’s great article on jobs to be done in the data organization, and knowing statistics is definitely a job to be done in a data organization. For analytics engineers, I tend to think in terms of the smaller analytics team at a [00:24:00] startup, ’cause that’s just where my background is. And so if I was looking to hire an analytics engineer at a company of that scale, I would want them to have a good, intuitive understanding of statistics, but I wouldn’t really expect them to have any sort of formalized statistics training.
I wouldn’t expect them to be super mathy. The more important thing would be being able to write good SQL and, critically, knowing how to tell good stories with data, talk to analytics practitioners, and build the data products that are useful for them.
Elize Papineau: Fantastic. Okay. Let me see. We’ll give the chat a few more seconds here to see if there’s any other live Q&A questions. Doesn’t look like it.
[00:25:00] Okay. So I think we’ll go ahead and Emily, there we go. Perfect timing. So Ian folly asks: in the amazing future where analytics engineers are more prevalent, do you have any ideas about how they get structured or assigned within an organization?
Jason Ganz: That is a fantastic question. So this is one that really depends on the size of the organization and particularly analytics engineering at enterprise scale, I think is something that organizations are still figuring out.
And it really depends on whether it will be more of a centralized team, or perhaps there’ll be embedded analytics engineers on different teams. It’s likely that there’s not going to be one specific right answer. I think there’s a data mesh talk tomorrow that will be hinting at some ideas for how larger enterprises can work this out.
But [00:26:00] it’s definitely still a bit of an open question. How analytics engineering fits into the org chart.
Elize Papineau: Yeah, and that reflects my experiences that I’m seeing on some current projects as well. Okay. It looks like we have time for maybe two more questions here. Currens asks: is CAE a role in the future alongside a CEO, a CTO, etc.?
Jason Ganz: That is a good question. I know, I’m afraid I got this close to making it through this talk without a Benn Stancil reference, but we have to accept the reality as it is. Benn wrote a great post on this, talking about the idea of a chief analytics officer. It’s definitely possible that more and more orgs will start to adopt this.
I think we’ll start to see analytics engineering broken out as a function and mature a bit before this becomes really widespread, but probably some forward-looking orgs will start getting towards it in the near future.
Elize Papineau: Great. All right. And then our [00:27:00] final question from the chat is from Ric, which is when will you share your Spotify?
Jason Ganz: Well, I suppose Ric can answer that, because the deal was a hundred signups on that day when I posted it. So I was scared when Ric picked up on that, because she’s the one with the answer of how many signups. All right. When we hit 100, she will keep me honest.
Last modified on: Apr 19, 2022