How to build a mature dbt project from scratch
Ever wondered how to build up a dbt project from the ground up? How do you know when good enough is good enough?
Using work from dbt Labs professional services as a backdrop, we’ll explore how to measure the maturity and quality of your project as it blossoms into a fully grown pipeline.
Browse this talk’s Slack archives #
The day-of-talk conversation is archived here in dbt Community Slack.
Not a member of the dbt Community yet? You can join here to view the Coalesce chat archives.
Full transcript #
Barr Yaron: [00:00:00] Hello, and thank you for joining us at Coalesce. My name is Barr and I work on product here at dbt Labs. I’ll be hosting this session, which is about how to build a mature dbt project from scratch. We have the one and only Dave Connors delivering this talk. First, some housekeeping. All chat conversation is taking place in the coalesce build dbt project channel of dbt Slack.
If you’re not part of the chat, you have time to join right now: search for Coalesce build dbt project when you enter the Slack channel. Dave, whom you’ll be hearing from today, is an analytics engineer on our professional services team. Why should you trust Dave? Well, he’s very convincing. In middle school,
his friends started a band he was not in, but he convinced them to name it the Dave Connors Experience. But also, he works every day with companies [00:01:00] around the world to help them set up their dbt projects. He knows what he’s talking about, and he goes well beyond setting up projects. He really knows how to get dbt projects to the next level.
I really wish that I got to work closely with Dave when I was getting started on dbt. And I’m jealous of those who get to. After the session, Dave will be available on the Slack channel to answer your questions. I saw Pat asked earlier for an autograph in the Slack too. This is not like an in-person event where you’re supposed to stay quiet during the talk.
Although I don’t think I need to say that to this audience. We encourage you to ask other attendees questions, ask Dave about the meaning of life. Make comments or react at any point in the Slack channel. He’ll respond to all the Q&A on Slack after his session. So let’s get it started and over to you, Dave.
Dave Connors: Thank you, Barr, for that kind introduction. I want to thank you, especially, for that amazing pronunciation of mature. I think we can all stand to benefit from that. So as Barr said, I am going to spend the remainder of the [00:02:00] 30 minutes here talking about how to build a mature dbt project from scratch.
[00:02:04] The dbt Lifecycle #
Dave Connors: And again, my name is Dave Connors. I’m an analytics engineer here at dbt Labs. I use he/him pronouns, calling in live from Chicago, Illinois, and I do sit on that consulting wing of dbt Labs on the professional services team. And on that team, my teammates and I have had this unique opportunity to work on an unusually high number of dbt projects,
at organizations ranging from Fortune 500 companies to tiny startups and truly everything in between. From this vantage point, we get to have this unique understanding of the dbt adoption curve: how companies actually implement and expand their usage of dbt, starting with that very first dbt init, starting a project,
wading through this kind of question mark zone, and landing at some vague moment in the future where there are perfect and unassailable insights about a business coming out of their project. And with each new project that we work on, we find ourselves with a very uniquely complicated mix of features. With the [00:03:00] explosion in popularity of dbt
and the near-constant release of new features and capabilities within the tool, it’s very easy for data teams to go down the rabbit hole of dbt’s shiniest new feature before prioritizing the simple ones that will likely be the most immediately impactful to their organization. And so, as we worked through this mix of projects, we noticed that there tend to be distinct stages that teams go through on this dbt journey. And we’ve come to think of these stages as representing dbt project maturity.
[00:03:31] dbt Project Maturity #
Dave Connors: So what do I mean by maturity in this context? dbt project maturity can be described as an aggregate measure of the completeness and depth of the set of features used.
So feature completeness here, kind of this first tenet, is a simple binary yes or no that describes whether or not your project uses a given feature. Am I adding a test to my model? Am I documenting my model? A more complete project, and therefore a more mature one, is able to answer yes to [00:04:00] more of these questions. Feature depth
on the other hand is an important kind of check here to feature completeness. Unchecked growth of features can lead to unnecessary and sometimes really hard-to-unwind complexity within a dbt project. Feature depth here measures whether you are effectively using dbt’s features to solve your particular data problem and get as much value out of dbt as you possibly can.
So for example, a medium-maturity project may configure its first test and introduce testing into its workflow. An even more mature project would have a testing strategy and apply tests in a unified way across all of the models in the project to get as much value out of the testing capability
as possible. This idea of project maturity has also bubbled up from our dbt community as well. So Will Weld here who is a really, really thoughtful and talented member of our dbt community to whom I truly honestly owe co-authorship credit on this talk responded to an [00:05:00] open question from our community team
about what resources would be most helpful for new users on his team to ramp up on dbt. And he said he would love to have a maturity curve of an end-to-end dbt implementation. There are so many features in dbt, but it would be great to understand what is the minimum set of dbt features for a base-level implementation, and then which things are extra credit.
And I promise he was not paid for this quote. We are both thinking about this idea of maturity alongside each other. And he’s very much not wrong: the many exciting things that dbt can do can be a lot to take in, and it can be a little bit hard to find the signal in the noise. Okay. So here’s the plan for my remaining 25-ish minutes here. We’re going to walk through this basic maturity curve of a dbt project using the two halves of the definition that I introduced a moment ago.
So on the X axis here, we have feature completeness, the number of things our dbt project can do. On the Y axis here, we have feature depth, or the level of [00:06:00] sophistication and quality within the use of each individual feature. And we’re going to trace a single project’s life cycle from teeny tiny baby infant dbt project all the way up to a full-on, contributing-to-its-401(k) adult project here.
Our goal is going to be to pause at each stage of this life cycle, investigate which dbt features are the most valuable at that point in time and why, and end up with a full rubric that we can use both to guide new dbt implementations in the future and, for those of you out there who are already dbt users, to have something to kind of peg your own project against, to understand where there might be opportunities to add more maturity to the current deployment of dbt that you have.
This slide is crazy complex. I really don’t expect people to try and read this all at once. I’m going to do my very best to break it down into chunks as we go through here.
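As a rough illustration of the two-part definition above, the sketch below scores a project on feature completeness (how many features are in use at all) and feature depth (how sophisticated that use is). The feature names, the 0-to-3 depth scale, and the `maturity` helper are all hypothetical, chosen only to make the two axes concrete; this is not an official dbt Labs rubric.

```python
from statistics import mean

# Hypothetical rubric: feature name -> depth level (0 = not used yet).
# Both the feature list and the scale are assumptions for illustration.
infant = {"commands": 1, "models": 1, "sources": 0, "tests": 0, "docs": 0, "macros": 0}
toddler = {"commands": 2, "models": 2, "sources": 1, "tests": 1, "docs": 1, "macros": 1}


def maturity(project: dict[str, int]) -> tuple[int, float]:
    """Return (completeness, depth) for a project.

    completeness: number of features used at all (the X axis).
    depth: average depth level across the features in use (the Y axis).
    """
    used = [level for level in project.values() if level > 0]
    completeness = len(used)
    depth = mean(used) if used else 0.0
    return completeness, depth


for name, project in [("infant", infant), ("toddler", toddler)]:
    completeness, depth = maturity(project)
    print(f"{name}: completeness={completeness}, depth={depth:.2f}")
```

Keeping the two numbers separate, rather than collapsing them into one score, mirrors the point of the talk: a project can use many features shallowly or a few features deeply, and those are different situations.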
[00:06:51] Thought Experiment With a "Seeq Wellness" ELT Pipeline #
Dave Connors: So in order to trace this curve, bear with me here. Let’s pretend we are an analytics engineer at Seeq Wellness, which is a very fake EHR company, electronic health [00:07:00] record company.
And we want to introduce dbt to model our patient and doctor and claim information that we’re currently transforming in some legacy ETL tool and pointing at our BI tool in our claim billing dashboard here. And we’re going to replace the transform tool with dbt. We’re going to walk through this together.
As I said, adding incremental features along the way with the hope that we can release this project into the world as a happy, healthy, fully realized dbt project. So congratulations, we are all now dbt parents. Very, very exciting. So for the purposes of this talk, because I only have so much time, I’m going to keep things very high level, but if you’d like to follow along, Andrew is going to drop the link to my repository, our repository here, where we’ve actually developed a version of the same SQL in this project at each stage of this maturity curve.
So if you go to this link, you’ll find that there’s a sub folder in there for each of the stages that we’re going to pause at with a fully functional version of this Seeq Wellness with some subset of features implemented. We also have the raw data in there that you [00:08:00] can load into your own warehouse if you’d actually like to go ahead and run the project.
Okay, one last thing before we dive into the actual stages of dbt life here, the first thing is that this is definitely going to be more art than science. So what I’m presenting here is a composite view of lots and lots of projects blended up and compiled into one actual kind of standard opinion here and in the real world there’s definitely reasons why features might come into play a lot earlier than I present here. And that’s totally dependent on your own data context and totally okay. This is going to be kind of a, your mileage may vary warning here. Similarly, time. I’m not going to make a lot of assumptions about how long this process should take.
So for some teams, this could be days, weeks, months, or even years. It’s less important to think about how quickly you should progress through these stages and think more about what the levels of maturity are and how these features build on each other to kind of build them. Okay. I think we’re ready.
[00:08:58] Level 1 - Infancy #
Dave Connors: I think the stage is set. [00:09:00] Level one: infancy. Baby’s first model. So when we’re new to the dbt world, it’s best to iterate on dbt in parallel with your legacy transformation process, just to start. So to start here, we’re going to point this dbt-transformed version of the same report that we were building in our previous ETL tool at our BI tool, so that we can have an apples-to-apples comparison of each of the respective workflows. So you might see in the repository, we’ve joined all of our data sources from our old report in the model called patient claim summary. So all we’re gonna do is take the SQL out of that file from our legacy tool, pop it into a SQL file in the models subdirectory of our dbt project, type in dbt run, and see what happens. This is actually exactly the way that we focus our rapid onboarding sessions for new enterprise clients. We start with existing logic, we deploy it in the simplest way possible. We build out from there. But what we’re trying to do here is get a [00:10:00] sense for how dbt actually interacts with our SQL and builds objects in our database.
Dave Connors: So what’s the benefit of a one-to-one migration here without any of the dbt bells or whistles? I certainly have no authority to speak on what it means to raise a child. But I understand that a big part of caring for an infant is just dealing with its inputs and its outputs. The same can really be said here about a dbt project at this stage.
The goal here is just to learn the basics of what makes dbt work. We want to feed our project a nice bottle of SQL and get some data objects out of it. The complexity will come later, but we just want to create the model, give it a command, and see the object in our warehouse that we expect. So kind of the bigger feature we’re introducing at this very first stage isn’t necessarily a feature of the tool. It’s actually analytics engineering itself, the modern version-controlled collaborative approach that dbt offers. You don’t have to do a ton more than that first exciting dbt run to understand how this paradigm can be massively impactful for your [00:11:00] analytics team.
If we decide not to do this, we end up missing out on what that dbt workflow has to offer. I’m not going to spend a ton of time kind of advertising why I think that is a good idea, but Andrew is going to drop the link to the dbt viewpoint here, which is a really good resource to kind of explain the benefits of analytics engineering.
Okay. So here’s the matrix that I introduced zoomed way in on level zero and level one. So we just went from level zero, no dbt to level one, our infant project here. Really only two features have come into play. Commands, we now know how to execute a command and make dbt build the thing that we want.
And we have a model here, which really isn’t so different from the stored procedure that we had in our legacy tool. Simply put here, we’re just, like I said, learning the basics, and this is a huge step forward to kind of introducing analytics at our organization, analytics engineering, I should say.
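To make the "feed it SQL, get a data object back" idea concrete, here is a minimal sketch of what running a single model amounts to: the model is just a select statement, and the tool wraps it in the statement that creates a view or a table. It uses Python's built-in sqlite3 as a stand-in warehouse. The `materialize` helper, the `raw_claims` table, and its columns are hypothetical, and this is not how dbt itself is implemented.

```python
import sqlite3


def materialize(conn, name, select_sql, materialized="view"):
    """Wrap a model's select statement in DDL and build it in the database."""
    if materialized not in ("view", "table"):
        raise ValueError(f"unsupported materialization: {materialized}")
    # Drop whatever object currently holds this name, view or table.
    existing = conn.execute(
        "select type from sqlite_master where name = ?", (name,)
    ).fetchone()
    if existing:
        conn.execute(f"drop {existing[0]} {name}")
    conn.execute(f"create {materialized} {name} as {select_sql}")


# Stand-in for raw data already loaded into the warehouse (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("create table raw_claims (claim_id integer, patient_id integer, amount real)")
conn.executemany(
    "insert into raw_claims values (?, ?, ?)",
    [(1, 10, 120.0), (2, 10, 80.0), (3, 11, 45.5)],
)

# The "model": nothing but a select statement lifted from the legacy report.
model_sql = """
select patient_id, count(*) as claim_count, sum(amount) as total_billed
from raw_claims
group by patient_id
"""

materialize(conn, "patient_claim_summary", model_sql)
rows = conn.execute(
    "select * from patient_claim_summary order by patient_id"
).fetchall()
print(rows)
```

Switching `materialized` from `"view"` to `"table"` rebuilds the same select as a physical table, which is the kind of change a later, more mature stage of the project starts to experiment with.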
So to kind of characterize this growth, instead of going into the details that you can find in the repository, I’m going to refer back to the dag here. And so for folks who are not used to [00:12:00] dbt, dag stands for directed acyclic graph. And for the context of dbt, it just represents the flow of data through our project here.
So, because we just have one single model, we have one single node on our dag and there’s tons and tons of room for growth. So we’ll see this kind of change over time here.
[00:12:16] Level 2 - Toddlerhood #
Dave Connors: Okay. Moving on to toddlerhood and we’re starting to put together our googoos and gagas. Level two. So while we were plugging away in dbt here, unfortunately, the legacy tool that we were using at Seeq Wellness has continued to run into some issues.
And we have a bunch of error emails clogging up our inbox. From experience, you know that for every email we see here, we’re going to need to dissect that error message, dive and crawl through all of these nested sub queries in our legacy query here, figure out exactly what went wrong and likely spend hours fixing that problem.
So since we just together read that dbt viewpoint, we know that dbt is supposed to kind of simplify this process by letting us break big monolithic transformations into components, [00:13:00] adding testing and documentation along the way to simplify our debugging process here. A cluttered inbox like this is probably enough motivation for anyone to want to start to build out a bit more of a robust pipeline.
So we’re going to take those first wobbly steps together. Toddlerhood is when dbt’s built-in features that introduce modularity into our project are crucial to adopt. We’re going to start centralizing our business logic into individual SQL models, and then build relationships between those models and the downstream queries that need to use them.
So in the old world at Seeq Wellness, my little report here, patient claim summary would probably have been one of maybe a thousand different queries at Seeq Wellness, where we may have defined what a claim diagnosis is. And some boilerplate SQL to interact with that table and get the information that we need out of it.
Well, we might’ve done that a thousand different ways with a thousand different analysts. Probably not a thousand different teams, but many different teams. Right? So now leveraging dbt’s modularity functionality, we’re going to do it once in its own model. We’re going to test it and [00:14:00] document it. And then everyone who needs to understand that definition of a claim diagnosis can benefit from it thereafter. We’ve also started to show this project to our stakeholders and they responded with the tier of questions that we, as analysts, are very, very used to hearing, like: cool, how do I know this is right? Where did this data come from? So this first tier of kind of building blocks, building this lineage and this modularity, is going to start to help us answer those questions that we get so often.
So we’re going to use refs to build relationships between our models, allow developers to share definitions and centralize that business logic. We’re going to use the source function and declare the locations of our raw data and understand the relationships of the raw data that we have in our warehouse and the models that are built on top of them. We’re also going to add documentation and testing to make assertions about our data, share those definitions with our stakeholders so we can make sure that we don’t have our stakeholders lose trust in our data. If we don’t, we’re still on the same exact stage we were with that legacy tool, right?
Our code is undocumented. Our models are untested, and we only have a monolithic single point of failure in our project. So we want things to be reusable [00:15:00] and durable as we continue to grow here. Okay. So we’re back on the heat map here. You can tell that leveling up from an infant project to a toddler project is a huge jump in feature completeness.
You can see we’ve got four whole squares to the right here. We’ve gone a little bit deeper on commands and models. So now that we have tests and documentation, we know the proper command to interact with those new features. And we’ve also introduced, like I said, this idea of layering our models together.
So stringing these transformations together. But the big difference here is that we’ve now taught our project how to do a lot of stuff at once. We’ve replaced raw data references with the source macro, we’re adding at least a little bit of testing and documentation, probably to the end of our dag, our final report models.
So we can describe what’s happening for our stakeholders and prove that we did it right. And we’re starting to experiment with this new modeling language Jinja. So we just have a couple of macros here, the simple built-in ones that allow us to modularize our code. This might feel like a very small step for dbt but for a single project, this is a giant leap in maturity.[00:16:00]
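The testing idea introduced at this stage boils down to making assertions about data and surfacing the rows that violate them. The sketch below mimics two such assertions, named after dbt's built-in `unique` and `not_null` tests, over plain Python dictionaries. The helper functions and the sample `claim_diagnosis` rows are hypothetical stand-ins for illustration, not dbt's own test implementation.

```python
from collections import Counter


def not_null(rows, column):
    """Return the rows where `column` is missing or null (failures)."""
    return [row for row in rows if row.get(column) is None]


def unique(rows, column):
    """Return the rows whose non-null `column` value appears more than once."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return [row for row in rows if counts.get(row.get(column), 0) > 1]


# Hypothetical sample of the claim diagnosis model's output.
claim_diagnosis = [
    {"claim_id": 1, "diagnosis_code": "J45"},
    {"claim_id": 2, "diagnosis_code": None},
    {"claim_id": 2, "diagnosis_code": "E11"},
]

results = {
    "claim_id is unique": unique(claim_diagnosis, "claim_id"),
    "diagnosis_code is not null": not_null(claim_diagnosis, "diagnosis_code"),
}
for assertion, failing_rows in results.items():
    status = "PASS" if not failing_rows else f"FAIL ({len(failing_rows)} rows)"
    print(f"{assertion}: {status}")
```

A test passes when it finds zero failing rows, which is why each helper returns the offending rows rather than a boolean: the failures themselves are what you want to look at when debugging.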
And we can kind of see this maturity reflected in our dag as well. So we had just a single node in our previous stage as an infant. And now we have these newly declared source nodes in green, kind of tracing that lineage from these raw data sources in our warehouse. We can see that we have claim diagnosis, this intermediate idea that we were kind of talking about centralizing and sharing the definition of.
All of those get joined together in patient claim summary. So what we’re doing here is like teasing out the components that were inside the patient claim summary report. There’s a ton of valuable stuff that was happening in here. What we’re doing is pulling them out and kind of presenting them and treating them as fully visible and bona fide elements of our modeled workflow.
You can kind of see it visually represented here. Anyone out there who has been modeling in dbt for a while can probably tell that this project, this dag more specifically, is far from perfect. The goal for the toddler project is not necessarily to have a perfectly modular set of transformations. It’s really just to understand what the building blocks are and how we can start to put them together.[00:17:00]
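The toddler-stage dag described above can be sketched as a small dependency graph. The example uses Python's standard-library `graphlib` to derive a valid build order and to show what "acyclic" buys us. The node names are assumptions based on the talk (the source names in particular are hypothetical), and this is only an illustration of the concept, not of how dbt resolves its graph internally.

```python
from graphlib import CycleError, TopologicalSorter

# node -> set of upstream nodes it depends on (names are illustrative).
dag = {
    "source.claims": set(),
    "source.patients": set(),
    "source.doctors": set(),
    "claim_diagnosis": {"source.claims"},
    "patient_claim_summary": {"claim_diagnosis", "source.patients", "source.doctors"},
}


def build_order(graph):
    """Return an order in which every node comes after its upstream nodes."""
    return list(TopologicalSorter(graph).static_order())


order = build_order(dag)
print(order)

# A cycle has no valid build order, which is why the graph must be acyclic.
try:
    build_order({"model_a": {"model_b"}, "model_b": {"model_a"}})
except CycleError as exc:
    print("cycle detected:", exc.args[1])
```

Because the final report depends, directly or indirectly, on every other node, it always builds last; the intermediate claim diagnosis node only needs the claims source to exist first.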
[00:17:01] Level 3 - Childhood #
Dave Connors: Okay, we’re getting on the bus. Childhood. We’re going off to school. So so far, this has been a very one-to-one relationship between me, the developer, and dbt, the tool. But our project is still lacking a lot of discipline. In reality, there’s usually many more people who have a stake in this project, and it’s time to open up our project to these other people. Before we’re ready to send our project off to school for the first time, we have to kind of provide a rubric to its teachers for how our project likes to learn.
It’s important to spend the time now to define how this project operates and relates to all of the people involved so we don’t end up with a juvenile delinquent on our hands. So our theme for childhood is to add some structure and some rules for our project’s life. We want to define these guard rails now in order to create a foundation for scalability. I can absolutely tell you from experience that paying back tech debt on a disorganized dbt project can be really, really painful down the line. [00:18:00] So without these rules in place, it’s kind of hard for our project to grow up properly. We’re going to force ourselves to define the minimum requirements for the features that we just introduced in the previous phase, and then, in making those definitions, we’re actually going to force ourselves to go ahead and implement them in a more structured and predictable way. So at this stage, we’re going to start thinking about standardizing how we do that modeling behavior that we introduced last time around, maybe adding corresponding naming conventions to our models.
And we want to define minimum testing and documentation requirements to make sure that every layer of our dag here is proving that it’s doing the thing we expect, and that we can actually share the definition of that transformation. Most of this is going to come through kind of meta documentation here.
So project-level documentation, like a contribution guide or a README detailing things like who owns this project, how you can contribute to the project, and, very notably, how we write SQL in this project. We don’t want to be able to figure out exactly who made which file simply because of some syntactical differences between developers. And, [00:19:00] above all,
we have a PR template that ensures that only code that conforms to these rules actually gets checked in and used in our project in production. So you can see there’s a lot more depth here added to the things that we introduced last time around. You may not have changed the actual, you know, use of some of the features.
So like the macros, for example, and sources, are really the same. But if you’re looking at the repository, you might notice that it’s simply organized and there’s actual rules around how we use those things. You can see some depth has been added to models here. We’ve actually formalized how we’ve layered transformations together.
And we may have started to experiment with some basic materializations here. So changing from views to tables. You may notice that we have a new feature here and it’s deployment. So getting to this level of maturity is generally when we start to think about running this project in production. With these guardrails in place, we can be confident that our little project isn’t going to fall out of the crib at night and bonk itself on the head and hurt itself.
It’s ready to take on a little bit of independence. [00:20:00] So codifying and standardizing the use of our features so far is a massive step forward here in terms of project maturity. It’s a little bit less focused on what, so what we’re doing with dbt, and a lot more focused on the how. And we can see this kind of rule setting reflected in our dag here.
So we now have three distinct layers of models, staging doing some light touch transformations and data cleansing operations. Intermediate, where we start to actually change the grain of those models, perhaps perform aggregations over those things. And a final layer of the corresponding naming convention.
At this point, exactly the same as our previous report was. But our stakeholders now know that, with that prefix, they can feel safe accessing the data that’s inside here. This is a really rich subject, this dag structure idea. So I would definitely encourage folks to go watch Christine Berger’s talk from last Coalesce where she gets really deep into this idea.
It’s an evergreen piece of dbt content. But this is a dag that [00:21:00] we should be really proud of and we should feel comfortable going to production with. Okay, quick checkpoint. I know you’re probably sick of this metaphor, so I’m going to give everyone a quick breather. If you remember from Will’s quote at the beginning of this talk, there’s a rough dividing line between which features are table stakes, and which ones are extra credit. And we are currently at level three and going from three to four is roughly where I would draw that line. Getting your projects to the level three state is something that I think every single project should strive to do. And the specifics of what happens thereafter are more determined by your individual data context than by any sort of set of rules.
Okay, apologies. We’re going right back into the metaphor here.
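The childhood-stage rules (three layers, each with its own naming convention) are the kind of thing a team can check automatically before code is merged. Below is a minimal sketch of such a check. The folder names, the `stg_`, `int_`, `fct_`, and `dim_` prefixes, and the `check_model_path` helper are assumptions for illustration; the talk does not spell out the exact prefixes used in the example repository.

```python
from pathlib import PurePosixPath

# Hypothetical convention: layer folder -> allowed file name prefixes.
LAYER_PREFIXES = {
    "staging": ("stg_",),
    "intermediate": ("int_",),
    "marts": ("fct_", "dim_"),
}


def check_model_path(path):
    """Return an error message if the model breaks the convention, else None."""
    model = PurePosixPath(path)
    layer = next((part for part in model.parts if part in LAYER_PREFIXES), None)
    if layer is None:
        return f"{path}: not inside a known layer folder"
    allowed = LAYER_PREFIXES[layer]
    if not model.stem.startswith(allowed):
        return f"{path}: expected one of the prefixes {allowed} in layer '{layer}'"
    return None


changed_files = [
    "models/staging/stg_claims.sql",
    "models/intermediate/int_claim_diagnosis.sql",
    "models/marts/patient_claim_summary.sql",  # missing the marts prefix
]
for path in changed_files:
    problem = check_model_path(path)
    print(problem or f"{path}: ok")
```

A check like this complements, rather than replaces, the contribution guide and PR template: the documents explain the rules to people, and the script catches the cases that slip through review.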
[00:21:41] Level 4 - Adolescence #
Dave Connors: So we’re at adolescence. Our project is growing up fast. It’s heading off into the world. I didn’t put any flavor text underneath this slide here. If you have a good idea of what dbt adolescence is like, feel free to put a little message in the chat there.
So here we are. So now that our project is out there in production, serving up [00:22:00] insights, it’s powering dashboards, the people want more. So specifically they’re looking for more dimensions and flexibility on the front end of this project. They’re looking for some more speed. It’s their data, and they want it now. They’re also looking for some more context around the information that they’re consuming from our project. So we’re going to try and serve all of those things by growing and optimizing our feature set so we can expand our scope and answer the more frequent and more complex questions that we’re getting from our stakeholders.
In our case, in the case of Seeq Wellness, this is going to come in the form of a broader set of marts to select from. It’s also going to come in the form of model optimizations to make the process of creating these models faster. We’re also going to enrich those insights with a little bit of metadata via the source freshness option.
Our project is really starting to have real responsibilities now. I also want to call out that one major avenue for solving this kind of problem is engagement with the dbt community. I’m honestly very hopeful that this has been a part of our journey all along. But now that we’re more or less past the basics, we’re going to have a lot more energy [00:23:00] to participate in the dbt community.
So, if you want to grow a project that’s best-in-class, it’s best to access the best-in-class knowledge that this incredibly talented group of analytics engineers worldwide have. This is where we can kind of start to focus on those types of problems. So again, we want to increase dbt’s surface area of influence at our org, making sure that we’re able to support more from our stakeholders.
Otherwise we risk that same buy-in problem that we were investigating in the previous stages here. So let’s take a look at what this looks like. Our newest feature here. So our newest addition to feature completeness is packages. That’s kind of how we represent this engagement with the community here. We can reduce our development time by standing on the shoulders of the giants of the dbt community, and literally use code that they’ve developed to solve common problems without a ton of development time on our own hands. We’re also optimizing our models here. You can notice that we’re experimenting with some different materializations.
In this case, in our case, we’ve widened our dag to make our BI tool a little bit more [00:24:00] flexible in the front end. We’re also going to give our project a little taste of metadata by configuring source freshness on our raw data so Jeremy will always know when that claims data was last refreshed.
Everything’s getting a little bit more adult all of a sudden, wouldn’t you say? So this is our dag from our childhood. And you can see the biggest change here is just that we have a richer set of marts here. So instead of locking our stakeholders in to a subset of things that we chose as we kind of built out the patient claims model, we’re going to pull some of those dimensions out and allow those joins to happen on the BI layer.
This is definitely something that’s going to be more specific to individual projects than any sort of set rule here, as I mentioned.
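The source freshness idea from this stage comes down to comparing the age of the most recently loaded record against agreed thresholds. The sketch below shows that comparison in isolation. The `freshness_status` helper, the 12-hour and 24-hour thresholds, and the timestamps are hypothetical; in a real project the thresholds live in the source configuration and dbt performs the check.

```python
from datetime import datetime, timedelta, timezone


def freshness_status(last_loaded_at, now, warn_after, error_after):
    """Classify a source as 'pass', 'warn', or 'error' based on data age."""
    age = now - last_loaded_at
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"


# Hypothetical values: claims data last landed 16 hours before the check.
now = datetime(2021, 12, 7, 12, 0, tzinfo=timezone.utc)
last_claim_load = datetime(2021, 12, 6, 20, 0, tzinfo=timezone.utc)

status = freshness_status(
    last_claim_load,
    now,
    warn_after=timedelta(hours=12),
    error_after=timedelta(hours=24),
)
print(f"claims source freshness: {status}")
```

Having two thresholds instead of one lets the team tell stakeholders "this is a little stale" well before the data is stale enough to block a production run.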
[00:24:41] Level 5 - Adulthood #
Dave Connors: Wow. Can you believe it? We’ve reached adulthood. A life well lived. Now that the project’s rowdy teenage years are behind it, we’re at the point where we can start to answer some of the big questions, right? So what does it mean to be a dbt project in the year 2021? How have I, as a project, been changing? How is our [00:25:00] project relating to its peers? So the goal at this stage is to start thinking about solidifying our project’s place in the world. Right? Our project is now a critical piece of data infrastructure.
It’s time to honor that position as an actual piece of our product, and look for ways to deepen our project’s relationships with the tools around it. It’s also a time for our project to take a little bit of time for itself. Maybe pick up a new hobby in its midlife and focus on self-improvement by continuing to use metadata to learn about itself and figure out ways that it might be able to do better at the things that it already does.
This is going to give us more observability and more control over our project. These things aren’t strictly necessary, but in my opinion, an unexamined dbt project, one that isn’t looking for opportunities to grow, is not one that’s worth running. Obviously that’s a little tongue in cheek. Well, we made it to our final matrix slide here.
As I mentioned, our project focus in adulthood is to focus on the relationships that it has with other tools in our stack. And that kind of manifests [00:26:00] itself in two ways on this grid. First through exposures, which is the most mature way to formalize the relationship between the models that we’re building and the BI tool that’s consuming them.
Secondly, our project is kind of renewing its vows with the warehouse here. dbt and the warehouse have been thick as thieves, and we’re going to reflect that relationship in what we’re trusting dbt to do when it comes to maintaining the objects in the warehouse. So we can add things like hooks and operations for managing permissions on our objects, work that was previously done outside of dbt in a much more ad hoc manner. And we’re also honoring this relationship and commitment in the kinds of macros that we’re going to start writing here. So instead of just doing the templated SQL writing that we maybe introduced in the previous stage, we can start to write more complex operations that rely on introspective querying of our warehouse to solve some higher-order modeling problems here.
Our project really starts to think beyond itself. And as I mentioned, that self-reflection is expressed in some of the measurements that we can actually take about our project here, like test coverage [00:27:00] and documentation coverage. There’s definitely going to be some more conversations about this concept later in the week, so I definitely think you should stay tuned there. Another example of this would be if we’re using dbt Cloud, we may start to use the metadata API, and some of the features built around that, like the model bottlenecks visualization to figure out how we can improve ourselves and our project here.
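To make the idea of test coverage and documentation coverage concrete, here is a minimal Python sketch. It assumes a heavily simplified, manifest-like dictionary in which each node has a `resource_type`, an optional `description`, and, for tests, a `depends_on.nodes` list. The sample nodes and the `coverage` function are hypothetical and only meant to show the arithmetic, not how any particular tool computes these numbers.

```python
def coverage(manifest):
    """Return the fraction of models with at least one test and with a description."""
    nodes = manifest["nodes"]
    models = {uid: n for uid, n in nodes.items() if n["resource_type"] == "model"}
    if not models:
        return {"test_coverage": 0.0, "doc_coverage": 0.0}

    # A model counts as tested if any test node lists it as a dependency.
    tested = set()
    for node in nodes.values():
        if node["resource_type"] == "test":
            tested.update(uid for uid in node["depends_on"]["nodes"] if uid in models)

    # A model counts as documented if its description is non-empty.
    documented = {uid for uid, n in models.items() if n.get("description", "").strip()}

    total = len(models)
    return {
        "test_coverage": len(tested) / total,
        "doc_coverage": len(documented) / total,
    }


# Hypothetical sample: four models, three tested, two documented.
sample_manifest = {
    "nodes": {
        "model.stg_claims": {"resource_type": "model", "description": "Staged claims."},
        "model.stg_patients": {"resource_type": "model", "description": ""},
        "model.patient_claims": {"resource_type": "model", "description": "One row per claim."},
        "model.dim_patients": {"resource_type": "model"},
        "test.unique_stg_claims": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.stg_claims"]},
        },
        "test.relationships_claims_patients": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.patient_claims", "model.stg_patients"]},
        },
    }
}

result = coverage(sample_manifest)
print(result)
```

Tracking these two ratios over time is one simple way for a project to measure itself in the sense described above.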
Dave Connors: All right. Let’s look at that last dag picture here. You can see the big change is we have a new color. We have our exposure. We now have very close to end-to-end visibility into our entire data transformation. That’s huge. That’s an unbelievable achievement here. We now know exactly what data our project depends on.
We know whether it’s fresh, and we know exactly how it’s transformed in a series of well-tested and well-documented dbt models. And finally, we now know exactly where it’s being consumed in our reporting tool. If any issue crops up at any point along this flow, we can now tell exactly which of our reports are impacted and who to go and talk to about that, and we know exactly how to debug this process. [00:28:00] This is an unbelievably big achievement here. I think I just want to make sure people understand how cool that is. Okay.
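The impact analysis described here can be sketched as a walk down the dag. The Python below is an illustration under stated assumptions: the node names, the exposure owners, and the `impacted_exposures` function are all hypothetical, and the dag is represented as a plain dictionary mapping each node to its parents rather than anything dbt produces directly.

```python
from collections import deque


def impacted_exposures(depends_on, exposure_owners, failed_node):
    """Return sorted (exposure, owner) pairs downstream of failed_node."""
    # Invert the child -> parents mapping into parent -> children.
    children = {}
    for node, parents in depends_on.items():
        for parent in parents:
            children.setdefault(parent, []).append(node)

    # Breadth-first walk over everything downstream of the failure.
    seen = {failed_node}
    queue = deque([failed_node])
    hits = []
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child in seen:
                continue
            seen.add(child)
            queue.append(child)
            if child in exposure_owners:
                hits.append((child, exposure_owners[child]))
    return sorted(hits)


# Hypothetical dag: sources -> staging -> marts -> exposures.
depends_on = {
    "stg_claims": ["source.claims"],
    "stg_patients": ["source.patients"],
    "patient_claims": ["stg_claims", "stg_patients"],
    "dim_patients": ["stg_patients"],
    "exposure.claims_dashboard": ["patient_claims"],
    "exposure.patient_roster": ["dim_patients"],
}
exposure_owners = {
    "exposure.claims_dashboard": "claims-team@example.com",
    "exposure.patient_roster": "ops-team@example.com",
}

claims_impact = impacted_exposures(depends_on, exposure_owners, "source.claims")
patients_impact = impacted_exposures(depends_on, exposure_owners, "source.patients")
print(claims_impact)
print(patients_impact)
```

Because each exposure carries an owner, the same walk answers both questions from the talk: which reports are impacted, and who to go and talk to.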
[00:28:08] Before We Part #
Dave Connors: Just about to wrap up here. I think I just have a couple minutes and I know I’ve just covered a dbt lifecycle in less than 30 minutes.
So I probably left you all with a ton of questions. That’s kind of the funny thing about growing up here. There tend to be way more questions than answers, but don’t worry. I’m going to leave you with some resources here. Very first and foremost, as I’ve said several times over, the repository is going to be kind of the living document for this idea of maturity.
Almost everything that I talked about today is already represented in that project on GitHub. I envision this being kind of the source of truth for this composite opinion about project maturity here. As new and cool things get introduced into dbt, we’re going to do our best to maintain that here.
So feel free to open issues, discussions, PRs, whatever you’d like. Secondly, by the end of the day today, we’re going to have a version of this talk on our newly minted dbt developer blog on our docs site. The only reason to revisit this [00:29:00] is if you really want to revisit my tortured jokes here about growing up, building a project as a child.
And lastly, of course we have the Slack chat here that everyone has been contributing to. I hope everyone has been very kind to my dad whose first day in the dbt community is today. But we’ll continue the conversation there. I’ll be sure to share slides and contribute to your questions as well.
So yeah, thank you so much. I’ve been Dave Connors. That’s my Twitter and my email. I want to give a huge, huge thanks to Will Well who was instrumental in helping me develop this talk. And Andrew, Jason, Pat, David, and Winnie were a huge, huge help in editing this content too. So that’s it. I hope everyone has a lot of questions for me.
I look forward to continuing the conversation in the chat here.
Last modified on: Apr 19, 2022