Data Paradox of the Growth-Stage Startup
Delivering high impact, high velocity, high quality data products in highly volatile data contexts.
Browse this talk’s Slack archives
The day-of-talk conversation is archived here in dbt Community Slack.
Not a member of the dbt Community yet? You can join here to view the Coalesce chat archives.
Full transcript
Barr Yaron: [00:00:00] My name is Barr and I work in product here at dbt Labs. I’ll be the host of this session. The title of this session is Data Paradox of the Growth-Stage Startup, led by Emily Ekdahl, analytics engineer at Palmetto, a company on a mission to lead the world towards a clean energy future.
First, some housekeeping: all chat conversation is taking place in the #coalesce-growth-startup-data channel of dbt Slack. If you’re not part of the chat, you have time to join right now: visit our Slack and search for #coalesce-growth-startup-data when you enter the space. I’m excited about this talk. As companies grow quickly, often there is a shock to the system.
The number of data assets grows quickly, the teams that need data support grow quickly, and often the data team itself scales to keep up. I’ve worked in data at large FAANG companies and tiny startups. [00:01:00] And although there are some commonalities, many of the needs are just different. We will be learning today about a way that data teams can handle the pressures of delivering a high quality data product in a rapidly and constantly changing context.
Emily has a background in data and analytics engineering with prior experience in ed tech and health tech startups. When she’s not coding, you can find her at live music, traveling, hiking with her dogs, practicing yoga, and reading and listening to podcasts.
After the session, Emily will be available in the Slack channel to answer all of your questions.
However, we encourage you to ask questions or chat at any point during the session. She will get to all of it after. Let’s get started, and I’ll pass it over to you, Emily.
Emily Ekdahl: Thanks, Barr.
Hello, and welcome to Data Paradox of the Growth-Stage Startup. [00:02:00] I’m Emily. I’m an analytics engineer at Palmetto. My pronouns are she/her, and you can find me on dbt Slack and Twitter at emekdahl.
It’s Monday morning all-hands at Jaffle Shop Inc. “Great news,” your executive says. “We’ve received another round of funding. We’re going to grow the business by 10x. We’re going to release new features A, B, and C. We’re also looking into new strategies for user acquisition and retention. By the way, we’re hiring to support all this growth: software engineers, data engineers, analytics engineers, and analysts.”
And you think: my job’s a little more secure. That is good news. Also, I need to support accurate and reliable reporting for all this growth. I need to make sure I [00:03:00] pull in all the events for those new features so we can tie them in and determine whether or not they’re actually driving the growth we hope they are.
And I need to hire a whole bunch of new... well, hire some. And I also need to onboard even more people onto the new data platform.
[00:03:19] Business Problem
Emily Ekdahl: This is all very exciting, and yet there are a few things to consider. While your business deals with jaffles, your product and platform are more like spaghetti.
It’s no one’s fault, really. It’s more like creative chaos. There’s so much growth. It’s really exciting. You’re well positioned. You have a modern data stack.
It’s just that you can barely keep up with the pace of things as they currently stand, much less all the planned growth of the business, product, platform, and your teams.
By the way, those investors who gave you hundreds of millions of dollars, they’ve been [00:04:00] looking into your financials and reporting, and they’ve noticed a disconcerting trend. Your net promoter score is down almost 10% over the prior year. They want to understand the reasons behind the trend and they want to know how you plan to turn it around.
Setting aside for a moment that contentious conversation around net promoter scores, Jaffle Shop has determined this is an important measure of their customer satisfaction. And if you forget what an NPS score is, here’s a visual for you.
By the way, that NPS score data is siloed in a vendor website, and it’s hard to analyze in their tools, much less get it integrated with your data so you can actually drive insights for the.
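The visual is not reproduced in this transcript. As a reminder, the standard Net Promoter Score is the percentage of promoters (ratings of 9 or 10) minus the percentage of detractors (ratings of 0 through 6) on a 0 to 10 survey. A minimal sketch of that arithmetic follows; the function name and the sample ratings are invented for illustration and do not come from the talk.

```python
from typing import Iterable


def net_promoter_score(ratings: Iterable[int]) -> float:
    """Return NPS: percent promoters (9-10) minus percent detractors (0-6)."""
    ratings = list(ratings)
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    # Passives (7-8) count toward the total but toward neither group.
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical survey responses, invented for illustration.
sample = [10, 9, 9, 8, 7, 6, 3, 10, 5, 9]
print(net_promoter_score(sample))  # 5 promoters, 3 detractors, 10 responses -> 20.0
```

The score ranges from -100 (all detractors) to 100 (all promoters), so a year-over-year drop like the one described above means the balance has shifted toward detractors.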
[00:04:50] The Data Paradox
Emily Ekdahl: This is the heart of the data paradox. We need to meet immediate business needs. We need to plan for rapid growth in our [00:05:00] data volume, variety, and velocity. We need to scale our data tools and our data platform, and we need to onboard many new people onto the platform. Hopefully we do this all without burning out.
Here are some of the pain points I just outlined when it comes to data product development in growth-stage companies.
In addition to those, there are also pain points around process. Often engineering workflows are non-standard, there’s either too little or too much documentation, and there are fiefdoms of tribal knowledge. Also, because you’re iterating so quickly on your tools and processes, onboarding is often a multi-day setup that’s new every time you do it.
[00:05:53] What to do?
Emily Ekdahl: So what am I going to do as a Jaffle Shop engineer? Am I going to hide the truth? Am I going to find a new job that’s [00:06:00] maybe more stable, but potentially less exciting? Or am I going to build something? I’m an engineer. Of course I’m going to build something. In fact, I’m going to streamline the entire analytics engineering stack with a single software package.
[00:06:17] PALM
Emily Ekdahl: And that software package is called Palm. You can think of Palm like a cross-project CLI. The beauty of Palm is that it abstracts away the complexities of the underlying tools in your stack. What that allows you to do is onboard people really quickly onto the platform and have everybody operating at the same really high level in terms of how they use your tools, regardless of whether or not they understand some of the underlying technologies.
So let’s go to a quick demo and I’ll show you how it works.
Alright, here we [00:07:00] have the classic dbt Labs demo project, Jaffle Shop, and to get started, I am going to run palm init. Because I’m in a dbt project, I can add the plugin flag and dbt. Just like that, I’ve initialized Palm in my project, and you can see here there’s a small config. If I wanted to, say, protect the main branch to prevent people from testing on it, I could do that.
However, for the purposes of this demo, I’m going to avoid that. Alright. Now, thinking about some of our pain points, one of our big pain points was testing: idempotent test runs, and seeing that our data behaves the same way when we’re developing as it does when we’re in production.
So let’s install some macros. These are specific for dbt projects and what they allow us to do is [00:08:00] intelligently name our schemas for testing, and then clean them up.
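The macros themselves are not shown in the transcript, and in a dbt project they would be written as Jinja macros rather than Python. The idea of naming a test schema after the current branch can still be sketched in a few lines. Everything below (the function name, the `analytics` base schema, the slug rules) is an assumption for illustration, not Palm’s actual implementation.

```python
import re


def schema_for_branch(branch: str, base_schema: str = "analytics") -> str:
    """Derive a per-branch schema name, e.g. for isolated test runs.

    Hypothetical helper: lowercases the branch name and replaces any run of
    characters that is not a letter or digit with a single underscore.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", branch.lower()).strip("_")
    if not slug:
        raise ValueError(f"cannot derive a schema name from branch {branch!r}")
    return f"{base_schema}_{slug}"


print(schema_for_branch("feature/NPS-scores"))  # analytics_feature_nps_scores
```

Because the name is a pure function of the branch, a cleanup step can recompute it later and drop exactly the schemas that a given branch created, which is what makes repeated test runs safe to rerun.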
And we’ll see how that works with. But first, let’s containerize the project, reduce our onboarding pain, and reduce the number of conversations we have that involve “it worked on my machine.”
Okay. And just for now, because of when I was developing this demo, we’re going to use 0.21. We’re very excited about 1.0, by the way. Can’t wait. All right. And here I’m going to accept the defaults for package management. My project is containerized. That’s how simple it was.
Okay. So looking here, what have we done?
Because it went so fast, sometimes it’s hard to take it all in. We’re pointing to the dbt Labs image, and we’re going to run a simple script that allows us to get some things set up in [00:09:00] terms of package management and testing. The really cool part that I want to make sure to highlight for you is this part right here: Palm is smart about looking for your creds, wherever you have stored them, and it will find them and integrate them so that your project works with Docker.
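How Palm locates credentials is not spelled out in the talk. A rough sketch of what such a lookup might do for a dbt project is below. dbt does honor the `DBT_PROFILES_DIR` environment variable and defaults to `~/.dbt/profiles.yml`, but the function name and the search order used here are assumptions for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_profiles_yml(
    project_dir: Path, env: Optional[Mapping[str, str]] = None
) -> Optional[Path]:
    """Return the first profiles.yml found, or None.

    Assumed search order (not Palm's documented behavior):
    1. the directory named by DBT_PROFILES_DIR, if set
    2. the project directory itself
    3. ~/.dbt, dbt's default location
    """
    env = os.environ if env is None else env
    candidates = []
    if env.get("DBT_PROFILES_DIR"):
        candidates.append(Path(env["DBT_PROFILES_DIR"]) / "profiles.yml")
    candidates.append(project_dir / "profiles.yml")
    candidates.append(Path.home() / ".dbt" / "profiles.yml")
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

A containerization step could then mount the directory that holds the file it found into the container, so the same credentials work inside Docker without being copied into the image.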
Okay. So now let’s try it out.
This is a command that we created to help us be more productive and also to help us simulate our CI environment. It’s in the plugin, by the way, so you can take advantage of it right away. And one of the things I want to do is talk through all the pieces of it and think about what value this provides for us as data teams.
So we are very excited for build and we plan to incorporate it into all of our commands, but just for now, it’s useful for our purposes to think through all these steps. First of all, we’re going to start clean so we can trust the [00:10:00] results of our. We’re going to install any dependencies and make sure our database is seeded, and then we’re running run and test twice. We can run those up to five times in succession, which does a better job of simulating the CI system, but I didn’t have to manually do any of those runs.
And then finally we clean up automatically based on our intelligent branch naming. And we do that all in Docker. So you can see here that I’m running and testing in succession, and obviously I haven’t made any changes, so Jaffle Shop runs as expected.
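The sequence just described (clean, install dependencies, seed, run and test a number of times, then clean up) can be sketched as a plan of dbt invocations. The dbt subcommands used are real (`clean`, `deps`, `seed`, `run`, `test`, `run-operation`), but the function name and the cleanup macro name `drop_branch_schemas` are hypothetical, and this is not the plugin’s actual code.

```python
from typing import List


def build_cycle_plan(
    cycles: int = 2, cleanup_macro: str = "drop_branch_schemas"
) -> List[List[str]]:
    """Return the ordered dbt commands for one simulated CI cycle.

    `cleanup_macro` is a hypothetical macro name standing in for whatever
    macro drops the branch-specific test schemas.
    """
    if not 1 <= cycles <= 5:
        raise ValueError("cycles must be between 1 and 5")
    plan = [["dbt", "clean"], ["dbt", "deps"], ["dbt", "seed"]]
    for _ in range(cycles):
        # Repeating run + test is what exposes non-idempotent models.
        plan.append(["dbt", "run"])
        plan.append(["dbt", "test"])
    plan.append(["dbt", "run-operation", cleanup_macro])
    return plan


for step in build_cycle_plan():
    print(" ".join(step))
```

The plan is only built and printed here. To execute it, each step could be passed to `subprocess.run(step, check=True)` inside the project’s container, stopping at the first failure.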
But that’s how easy it is to containerize your project.
And you’re already one step towards helping people work well in your dbt project. Now thinking through a few other things, that command was cool. What else could I do? What else might I want to do? I run Palm help as a normal part of my work. This is really great for me because it’s a hit list of the most important commands [00:11:00] that my coworkers and I use on a regular basis in a project.
It’s really great if it’s your first time, or if it’s been a while, or say, if you wake up in the middle of the night and you’re on call. Open-source projects get really robust after a period of many years. However, if you think about it, there’s probably a handful of things that you really need to do on a daily basis.
And for example, right now I’m looking to build a new command so I can see here there’s Palm scaffold. So let me try that. And chaining help commands is a pretty common workflow for me. And again, it gets back to that piece that we talked about reducing context shifting, because I know everything I really need is going to be here at my fingertips.
I haven’t had to leave this context or look at any docs. I just keep going, regardless of whether I’m working in dbt or Airflow or Great Expectations. So here I’m going to make a command. [00:12:00]
Cool. And that’s how easy it was to scaffold a command. Let’s take a look. You can see here, I’ve got a basic echo going.
And we’re going to run it just to test and see that.
That’s a good start. But what I would want to do is something like this. Let’s take a look at existing examples and think it through.
One of the things I like to do is look at the commands that are already available to me in the Palm dbt plugin. And again, this is open source. So these are all available to you as.
And for this case, we’re going to use an example of the modeling command. When I was talking to you about best practices and how we want everyone to operate at the same high level on our projects, regardless of whether it’s their first day on the job or they’ve been at Palmetto, or say Jaffle Shop, for many years, this is what I meant: we implement commands like this.
You pass a couple of simple [00:13:00] arguments, like your name and your model type, and then this code generates everything for you, in exactly the location we want it and the format we want it, so that we’re all working in the same way in our projects.
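The generator code is shown on screen but not captured in the transcript. A minimal sketch of the pattern, assuming a common dbt layout with `stg_` and `int_` prefixes, is below. The function name, the directory layout, and the file templates are all hypothetical and may differ from what the Palm dbt plugin produces.

```python
from pathlib import Path
from typing import Dict

# Assumed naming convention per model type; not taken from the talk.
PREFIXES = {"staging": "stg_", "intermediate": "int_", "marts": ""}


def scaffold_model(
    name: str, model_type: str, models_dir: Path = Path("models")
) -> Dict[Path, str]:
    """Return {path: contents} for a new model's SQL and YAML files."""
    if model_type not in PREFIXES:
        raise ValueError(f"unknown model type: {model_type!r}")
    model_name = f"{PREFIXES[model_type]}{name}"
    folder = models_dir / model_type
    sql = "-- TODO: replace with the real query\nselect 1 as id\n"
    yml = (
        "version: 2\n\n"
        "models:\n"
        f"  - name: {model_name}\n"
        '    description: "TODO"\n'
    )
    return {folder / f"{model_name}.sql": sql, folder / f"{model_name}.yml": yml}


for path in scaffold_model("nps_scores", "staging"):
    print(path)
```

Returning the files as a mapping instead of writing them keeps the sketch free of side effects. A real command would write each entry to disk, and could refuse to overwrite files that already exist.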
[00:13:19] Palm in Airflow
Emily Ekdahl: All right. Now let’s go back and talk a little bit more about how some of our pain points have been resolved and all the various things we can do. So a little bit of code gen goes a long way, but we can do a lot more than that. For one, we can do that whole containerization piece inside Airflow as well.
And this has been huge, especially for our data scientists, for whom the DAG is just a piece of what they’re trying to deliver. And in some cases historically had been. This allows them to quickly spin up a new DAG with one command, including Dockerization, in a really standard way. So if at [00:14:00] any point we need to go in and help them debug, we know where the logs are.
We know how to troubleshoot because everything is standardized and easy to understand.
Thinking back to my pain points as a growth-stage engineer: for one, Palm just helps me develop faster. A lot of the busy work and repetitive work we have automated away, so I can focus on adding business value. Also, again, we talked about these common problems in growth-stage startups with data testing. With Palm, I have idempotent test runs, and I can rely on the results of my test runs in my development cycles a lot more in terms of what will be present in production.
I can also avoid collisions and other common problems that happen when you’re developing with teammates on a branch. [00:15:00]
We also saw how Palm helps with code generation and standardizes new features. So this makes it really easy for us to, say, grab that NPS score data and bring it into our data platform. And it avoids us writing one-off or new-every-time data orchestration solutions.
With process, Palm really helps because you can codify the best ways to work in your project. It allows everyone to work at that same high level.
Also, rather than relying on documentation, which will drift or be buried or out of date, Palm becomes your code documentation in the sense that everyone is constantly referring to it every time they run a command, and that’s an agile best practice. And then [00:16:00] every time you make an update, all everyone has to do is pip install, and then they’ve got the latest. Or rather, run palm update, which is another command that you get with Palm.
You saw how quick it was for me to get it set up in dbt.
Imagine if you extended that across your stack. The dbt project I got up and running in less than five minutes. If you set up your entire analytics stack this way, you can imagine how even a team member who had spent two hours with HR that morning could have their entire computer spun up and be taking tickets by lunchtime.
Okay. As engineers, we love productivity hacks. We love the technology magic, but what is the business impact?
Two business days: an end-to-end solution delivered for NPS scores, where it used to take two weeks. And that includes the data orchestration, the data transformations, the data [00:17:00] quality testing, and the data landed in Looker so that people can explore and derive insights from it.
And that’s real business value, because then your business can move faster to act on whatever insights they derive and make changes to improve their NPS score. Also, we estimate that we have saved more than a full-time employee’s worth of hours over the course of a single year by automating our onboarding and key workflows.
There are simply a lot of conversations that we never have, like “how do I develop well in this project?” or asking for best practices on pull requests. They just happen.
So how can you get started with Palm?
[00:17:50] How you can get started
Emily Ekdahl: As a reminder, this is an open source project. So pip install Palm and Palm dbt. Then it’s simple to initialize [00:18:00] Palm in your repo. And if you have a dbt project, remember to use the flag and add dbt. If you forget, it’s okay. You can always go back and add it as a plugin.
Palm looks for your profiles.yml and configures your Docker project accordingly when you run palm containerize. Then you either use the existing commands that are available to you, or you make your own. And that’s how simple it is to enjoy faster dev cycles and expedited delivery. We are an open source project.
We are so excited to interact with you. Please join us at github/palmetto/palmcli. We can’t wait to hear your thoughts and to collaborate with you. Also, there’s a plugin. We welcome your contributions, your reactions, or your submissions for new plugins. We can’t wait to see what you build and how you might use Palm across your stack. [00:19:00]
I’d be remiss if I didn’t remind you that Palmetto is hiring. We are a clean energy tech company and we have some great benefits. Carly, my chat champion is posting the open roles in the chat. So please take a look and if you’re a fit, we would love to hear.
In closing, I want to recognize and thank all of the early contributors of Palm, the Palmetto data and analytics team, as well as our patient early users, the operational analytics team, the managers of both of those teams, and our CTO for their support of us as we open source this project. We could not have done it without you.
And also I want to recognize the broader open source community. We are so grateful for the ways in which we are empowered by these tools every day, and we’re so excited to give back to the community. Now please join me in the Slack channel so we can chat more. And again, if you want to find me in the dbt [00:20:00] Labs Slack or on Twitter, you can find me at @emekdahl.
Thank you very much for your time and attention.
Last modified on: Apr 19, 2022