Analytics engineer 🤝 data analyst: How dbt Cloud enables our workflow
Aug 12, 2024
At dbt Labs, our data team is a centralized unit that collaborates with various departments through specialized pods. We're part of the corporate pod, which supports finance, G&A functions, and organization-wide initiatives. Our work is all about empowering our internal customers with the data they need while also maintaining our systems like Snowflake and Git repos.
As a data analyst, Paige is all about that "last mile"—transforming organized data into actionable insights through queries and visualizations. She also loves playing data detective, solving the mystery when data looks off.
Lauren, on the other hand, spends most of her day in dbt Cloud, fixing jobs, setting up source connections, and occasionally diving into BI tasks. Her work ensures that our data pipeline runs smoothly, and when things break, she’s the one who gets them back on track.
The overlap in our roles is where the magic happens. We both have access to the entire data stack, which means we can troubleshoot and solve problems together, ensuring that nothing slows down our projects. Our shared knowledge and collaboration make us a powerful duo, capable of tackling anything that comes our way.
We'll dive into how dbt Cloud has changed our workflow, the game-changing features we can't live without, and how we've used dbt Mesh to streamline complex projects. We'll also share our tips for debugging data issues and why we believe in the power of collaboration and self-service in data teams.
So, whether you're an analytics engineer, a data analyst, or just curious about how we work, stick around. We have plenty to share, and we hope you'll find our insights helpful as you navigate your own data journey.
What are the data team pods?
The data team at dbt Labs is a centralized data team of analytics engineers and data analysts. Together with folks from finance and revenue operations, we form pods that support different parts of the business.
- The product pod: supports engineering, product, and design.
- The go-to-market pod: supports marketing and sales.
- The corporate pod: supports finance and G&A functions, plus organization-wide initiatives.
Since our team is centralized, we support our own data team projects, like maintaining Snowflake and Git repos. We also help enable self-service, so our internal customers can answer their questions about our data. Speaking of self-service, we know it’s a spicy topic and we have a LOT to say on this, but we’re going to put a pin in it for now.
What is your job and what parts of the data stack do you cover?
Paige: As a data analyst, I think of it as the last mile. I take data that represents business entities and processes. After the analytics engineers have beautifully organized that data, I combine it, adjust its shape, run queries, and make charts to answer business questions, or even identify what business questions to ask.
I also love to be a data detective. If somebody comes to me and says that their data looks weird, I get to dig through the logic, the lineage, and the sources to figure out if it’s a pipeline issue, or something changed in the behavior of our customers that we need to address.
Lauren: Don’t forget about Git! You approve all of my PRs. The “enabling Lauren” layer is super important. I spend most of my day in dbt Cloud, where I’m fixing broken jobs and figuring out where our problems are, as well as building net-new pipelines. There’s some crossover into data engineering land, where I’ll set up new source connections. Sometimes I hang out in business intelligence (BI) land and do some basic reporting in our BI layer, but charts aren't my friends, so I try to stay away from that.
What's your day-to-day like?
Lauren: The first thing I do when I wake up in the morning is check Slack and look at urgent things happening in my DMs. This usually means that some job is broken, and that's where I really start my day.
We have a massive job that has existed since the beginning of our project. It runs most of our models, and as you can imagine when you're running 3,000 models, it becomes very cumbersome. It takes forever and when it breaks, you have to restart it. It's the worst moment of the day. I've been working hard to break that up into more digestible chunks so that we can curb some of our Snowflake spend and run some of our models once a week, or every 12 hours, instead of every four hours like our production job.
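For illustration, here's a minimal sketch of how a job like that can be split by cadence, assuming hypothetical tag, folder, and project names: tags set in dbt_project.yml let separate dbt Cloud jobs, each on its own schedule, select their own slice of the DAG.

```yaml
# dbt_project.yml -- hypothetical tags that split one monolithic job into cadenced chunks
models:
  analytics:                        # hypothetical project name
    marts:
      finance:
        +tags: ["every_12_hours"]   # slower-moving finance marts
      weekly_rollups:
        +tags: ["weekly"]           # heavy aggregates that only need a weekly refresh

# Each dbt Cloud job then selects its own slice, for example:
#   dbt build --select tag:weekly                         # weekly job
#   dbt build --select tag:every_12_hours                 # 12-hour job
#   dbt build --exclude tag:weekly tag:every_12_hours     # the every-four-hours production job
```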
Paige: I also tend to start in Slack. Not so much to learn about what’s broken, but to make sure I see any urgent needs from my executive-level stakeholders. When they need data, they often need it right away.
I also have a set of Slack channels I check to look at discussions. I like to add anything I can to help people and share insights. I’ve been at dbt Labs for almost two years, which is like a lifetime at a startup, so I have a lot of institutional knowledge. One of our core values at dbt Labs is to contribute to the knowledge loop, which is why the dbt community exists.
Lauren: And you review my PRs!
Paige: Yes! That’s my favorite part. You can learn so much about SQL and the data by reviewing code. Another fun thing about discussing code together is that we also get to talk about where logic lives. Does it live in a BI tool, or does it belong in dbt, where it’s version controlled? Should it live in the Semantic Layer? Should it be a metric? This is where the blurriness and overlap between our jobs is awesome.
What benefits do you see from the blurriness between your positions?
Lauren: While I may spend most of my time in dbt Cloud, and Paige spends most of her time in Snowflake and Hex, we both have access to the entire stack. That means we can look into any part of the pipeline and address issues, all while conferring with each other.
We also share so much context, because we are working on projects together. So for PR reviews, or pairing on solutions, we have a built-in partner with much-needed context for top-notch code reviews. On top of that, we can easily stand in for each other as needed, so the progress and maintenance of our projects doesn’t need to pause. Build in redundancy!
What did your workflow look like before and after today’s dbt Cloud?
Paige: I started working with data in 2004. The organizations I worked for used Oracle databases, and there were many times when our pipelines were SQL scripts in random places, like on someone's laptop or a shared drive. We'd have command-line scripts that'd run the SQL files one after another, very carefully. If anything got out of order, or something got deleted or changed, it was a disaster. I remember building a data pipeline in Perl once. That was a fun one. I remember a whole day getting derailed because somebody accidentally dropped the zip code column out of the database. It's always addresses, right?
I did a short stint as a software engineer and learned about engineering principles, and it all made so much sense to me. When I returned to working on a data team, I tried implementing version control in Git for our most complex SQL queries. It was better than nothing, but far from what it could be. When I discovered dbt, I immediately knew how valuable it was. Folks had finally figured out how to apply software engineering principles and best practices to data work.
The company I worked at used dbt Core, so I spent a lot of my time wrestling with Python environments, dealing with Git, and making sure I was in the right repo state. I had to rebuild my local repo many times, and I had a giant list of Git commands I’d follow based on how I’d screwed up the repo. It was much better than we had before, but it wasn’t a smooth process. It was time-consuming.
After dbt Cloud, the workflow is seamless. As soon as I think about what I want to know or do, before the thought even feels complete, I'm already in dbt Cloud Explorer. I look at column-level lineage (CLL), hit the "open in the IDE" button, refresh from main, create a new branch, and run queries or make changes so fast I feel like a superhero. Working in dbt Cloud makes me feel like the computer is an extension of my brain instead of something I have to sweet-talk for an hour and a half before I can get to work. These features are game-changers, and I can’t imagine working without them.
What are the game-changer features of dbt Cloud?
dbt Explorer
dbt Explorer has changed the way that we investigate. If there is a bug in our data or there's a funky piece of data, we can go into dbt Explorer and figure out exactly what's happening.
The image below shows days in the fiscal year being transformed in that particular model. It's going downstream and there's an error. We can already see that in the lineage, which is so rad. I don't need to look at any of the other pieces, because those are just pass-throughs. They use the column, but I don't need to investigate those models because they aren't transforming that particular column. Column-level lineage has changed our lives.
Rerun prod job from failure
Rerun prod job from failure is also a game-changer. Our prod job runs every four hours and takes about two and a half hours to run. When we had to rerun it from the start, we had to tell people that they’d need to wait three hours. Now we can just rerun from failure, and it might take thirty minutes or an hour. It's changed the way that we communicate with our internal customers.
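For context, a minimal sketch of the same idea outside the UI: dbt Core 1.6+ ships a `dbt retry` command that reruns only the nodes that failed or were skipped in the previous invocation, based on the run results artifact.

```shell
dbt build     # the scheduled run fails partway through
dbt retry     # reads target/run_results.json and reruns only the failed and skipped nodes
```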
Compiled tab and copy code button
The Compiled tab is great because sometimes we just want to look at the SQL in a model and then take it into the BI tool, where we can make small changes and put it in a chart right away. We can open the Compiled tab to see the raw SQL, hit the copy code button, and then we’re back in the BI tool with all the code. It's so much faster than what we used to do.
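For anyone working outside the IDE, a rough equivalent is dbt Core's compile command, which writes the rendered SQL under the target directory (the project and model names below are hypothetical):

```shell
dbt compile --select fct_orders
# The raw SQL, ready to paste into a BI tool:
cat target/compiled/analytics/models/marts/fct_orders.sql
```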
Rerun CI job from failure
I also love the rerun from failure for the CI job. When we make a change and submit a PR, we run the CI jobs to make sure it won't break everything if we merge that code. As the CI job runs, maybe there's something else in the project that isn’t working but isn’t related. It'll fail, but it won't fail because of my code. In the past, I'd have to go in and put a space in a document somewhere so I could commit it and have the CI job run again. Now I just have to click a button. It's so much faster to resolve that kind of problem.
Defer to Production
Defer to Prod means that if the models upstream of what you're working on aren't built in your development schema, dbt will look to the production schema and get the data from there. This is a great time saver, because every week we used to have to remind everybody doing development to run a command to rebuild those upstream models. Otherwise the data in their schemas would be stale, or already dropped because it had gone stale.
Instead, now we just have a toggle. As soon as we toggle it on, it’s already doing all that work and we never have to think about it. And you don't have to run the upstream models. You can just turn on Defer to Prod and already have those upstream models built for you on top of production data. They already exist and you're just using what's already built. It's so quick.
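The toggle handles all of this in dbt Cloud; for reference, here's a minimal sketch of the dbt Core equivalent, where the `--defer` flag resolves refs to unbuilt upstream models against a state manifest (the artifact path and model name here are assumptions):

```shell
# Build just your model; any upstream models missing from your dev schema
# resolve against the production manifest instead
dbt run --select my_model --defer --state path/to/prod-artifacts/
```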
How has dbt Mesh changed your workflow?
Paige: We’ve meshified revenue recognition and accounting and put it into a new downstream project. Our main project has public models that we can build upon. But with dbt Mesh, it's focused. It is for specific customers and business users, so its impact is significant but narrow. We won’t break marketing data while fixing accounting models. As a data analyst, I’m used to immediate gratification. That can feel challenging when I’m making changes in the repo and dbt, but dbt Mesh makes it super easy.
Lauren: We can use versions, which creates a new version of your model. With dbt Mesh, you can point to either version, so we have parallel pipelines running new code and old code. They're both running, and we're able to see the differences between the two without duplicating the model under a new name.
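A minimal sketch of what that looks like in YAML, with a hypothetical model name; both versions run in parallel, and downstream code can pin whichever one it needs:

```yaml
# models/schema.yml
models:
  - name: fct_revenue_recognition    # hypothetical model name
    latest_version: 2
    versions:
      - v: 1                         # old logic, still running
      - v: 2                         # new logic, running in parallel

# Downstream code can pin a specific version:
#   select * from {{ ref('fct_revenue_recognition', v=1) }}
```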
We have cross-project dependencies, and since we can’t change the big project, we can make versions and reflect them in the downstream Mesh project. We’re making incremental changes, but they don’t have to affect every single model. Instead of being overwhelmed by 2,500 models, accounting folks can go into dbt Explorer in the Mesh project and see the subset of models relevant to them and their business needs.
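Cross-project dependencies in dbt Mesh rest on two pieces: the upstream project marks a model as public, and the downstream project declares the dependency and uses the two-argument ref. A sketch with hypothetical project and model names:

```yaml
# Upstream project: models/schema.yml -- expose the model across project boundaries
models:
  - name: fct_revenue_recognition
    access: public

# Downstream Mesh project: dependencies.yml
projects:
  - name: analytics                  # hypothetical name of the upstream project

# A downstream model then refs the public model across projects:
#   select * from {{ ref('analytics', 'fct_revenue_recognition') }}
```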
How does dbt Cloud enable stakeholders and self-service?
Paige: On the data team, we have weekly office hours called self-serve data team office hours. It’s for folks at dbt Labs who are working on something and get stuck or have a question. They're often trying to figure out how to analyze something but they don't know exactly where the data is. I'll share my screen in dbt Explorer and show them how to search for that data and how to find the models. They can then do it for themselves next time.
Lauren: One of my favorite things is helping somebody figure out how to do a PR to get the data they need to get their work done. It still goes through our PR review and we have CI jobs that are in place so that everything's protected. This process allows people to go in and answer their questions if they have the skills and the time.
What's been your favorite moment as a team of two?
Lauren: A few months ago we took over an accounting project, which was a beast. I didn’t know anything about accounting, but we went from zero to one hundred fast. The other day, Paige was talking to the lead accountant and had memorized all of these very specific codes, and I knew exactly what she was talking about. We have so much shared knowledge that our sleuthing now has the power of two.
Paige: One of my favorite things is when I can’t figure something out and I'm deep in a rabbit hole, I can call Lauren and ask if we can get on a Zoom and see if she can help me figure it out. When we get on the call, the first thing she’ll say is, “You are doing a great job. We are doing a great job.” It's always so reassuring.
Data Busters
Lauren: Paige, you modified this list of Rules for Debugging and it's amazing. Can you walk through what they are?
Rules of debugging
- Understand the system
- Make it fail
- Quit thinking and look
- Divide and conquer
- Change one thing at a time
- Keep an audit trail
- Check the plug
- Get a fresh view
- If you didn’t fix it… it ain’t fixed!
Paige: The rules of debugging are from a book called Debugging by David J. Agans. It's about how to debug systems in software and hardware. I was reading the book one day and realized that I do all these things in data work. But it's a little different from what he was talking about in the book and what they mean for us as data professionals.
I took the rules and I unpacked each one. Understanding the system is about figuring out the whole pipeline end-to-end. Where is the data coming from? Who's entering the data? What's it for? What's the business process it's for? Through all the transformations, all the way up to the other end. What's cool is that I shared this with Lauren, and she figured out we use dbt Cloud to do all these things.
Rules of finding a data ghoul
- Use dbt Explorer to understand the system: Familiarize yourself with the end-to-end data flow, from data sources to your reporting tools, and everything in between.
- Use dbt Cloud scheduler to make it fail: Create a controlled, replicable scenario where the problem appears, leveraging test data sets or simulated queries if necessary.
- Use the dbt Cloud debugging logs to quit thinking and look: Analyze error messages, logs, and any other system feedback carefully. This could reveal patterns or anomalies related to the problem.
- Use the dbt Cloud IDE to divide and conquer: Break down the problem into smaller parts. Segment your data, modularize your code, and isolate your variables to help identify where the issue lies.
- Use the dbt Cloud IDE to change one thing at a time: When altering code, queries, or data processing methods, change one element at a time to understand the effect of each change.
- Use the dbt Cloud IDE to keep an audit trail: Document your observations, hypotheses, actions taken, and their results. This facilitates collaborative debugging and prevents the same ground from being covered twice.
- Use dbt Explorer to check the plug: Verify the basics. Are the data sources available? Is the data being loaded correctly? Are the correct versions of software/tools being used?
- Use dbt Cloud Git integration to get a fresh view: When you hit a wall, seek external perspectives. Colleagues may spot something you've missed, or suggest a new approach.
- Use dbt Cloud scheduler and tests to monitor your jobs, because if you didn't fix it, it ain't fixed: After implementing a solution, ensure the issue is genuinely resolved. Validate your solution under different scenarios and monitor performance over time (see the sketch after this list).
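"Check the plug" and "it ain't fixed" map neatly onto dbt's source freshness checks and tests: freshness verifies the data is actually arriving, and tests keep guarding a fix after it ships. A minimal sketch, with a hypothetical source:

```yaml
# models/sources.yml
sources:
  - name: billing                    # hypothetical source
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: payments
        columns:
          - name: payment_id
            tests:
              - not_null
              - unique

# "Check the plug":  dbt source freshness
# "It ain't fixed":  dbt test --select source:billing   (scheduled in every job)
```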
If there’s one thing you want people to know, from a data analyst and analytics engineer perspective, what would it be?
Paige: There’s a super important path to go down before you decide to automate something. Is the investment worth it?
Lauren: Why are you SO passionate about this concept??
Paige: Our stakeholders don’t know how long things take to build. We’ll get requests to automate things that take someone 10 minutes every month. We should be proactive about protecting our time and think about what the value is of accomplishing that task. Many times people are just curious, but don’t need all that automation.
Lauren: My advice is to Google things. It's so important, especially in this industry. I think people get intimidated. When I was in grad school, I thought that everyone knew how to code. I thought it was perfect all the time and everyone knew all of these functions. I had no idea how I was going to remember all this stuff. The answer is you just get good at Googling. We're all Googling things! It's our job as developers to Google and be better at Googling than non-developers.
Join us at a dbt Meetup
dbt Meetups are more than just events: they're opportunities for the dbt Community to come together, share knowledge, spark new ideas, and refine the craft of analytics engineering. Our conversation highlights the power of collaboration and underscores how essential tools like dbt Cloud are for helping teams work smarter and more efficiently. Much of what we've shared in this blog was part of a presentation we gave at a recent dbt Meetup in San Francisco. If you're interested in learning more about analytics engineering, exchanging experiences with fellow practitioners, and having some fun along the way, we highly recommend checking out an upcoming Meetup.