
Operationalizing Column-Name Contracts with dbtplyr

Emily leads a team at Capital One dedicated to the development of sustainable data products (including datamarts, internal analysis toolkits, and self-service business intelligence capabilities) for business partners ranging from entry-level analysts to senior executives.

Complex software systems make performance guarantees through documentation and unit tests, and they communicate these to users with conscientious interface design.

However, published data tables exist in a gray area; they are static enough not to be considered a “service” or “software”, yet too raw to earn attentive user interface design. This ambiguity creates a disconnect between data producers and consumers and poses a risk for analytical correctness and reproducibility.

In this talk, I will explain how controlled vocabularies can be used to form contracts between data producers and data consumers. Explicitly embedding meaning in each component of variable names is a low-tech and low-friction approach which builds a shared understanding of how each field in the dataset is intended to work.

Doing so can lighten the burden on data producers by facilitating automated data validation and metadata management. At the same time, data consumers benefit from a reduction in the cognitive load of remembering names, a deeper understanding of variable encoding, and opportunities to analyze the resulting dataset more efficiently. After discussing the theory of controlled-vocabulary column naming and related workflows, I will illustrate these ideas with a demonstration of the {dbtplyr} dbt package, which helps analytics engineers get the most value from controlled vocabularies by making it easier to exploit column-naming structures while coding.

Follow along in the slides here.

Browse this talk’s Slack archives #

The day-of-talk conversation is archived here in dbt Community Slack.

Not a member of the dbt Community yet? You can join here to view the Coalesce chat archives.

Full transcript #

Elize Papineau: [00:00:00] Hello, and thank you for joining us at Coalesce. My name is Elize Papineau and I am a senior analytics engineer at dbt. I'll be the host for this session. The title of this session is Operationalizing Column-Name Contracts with dbtplyr, and it will be presented by Emily Riederer, who is a senior analytics manager at Capital One.

Before we get started, I just want to point out that all chat conversation for this presentation is taking place in the #coalesce-column-name-contracts channel in the dbt Slack. If you’re not a part of that chat, you have time to join right now. Please visit and then search for the channel #coalesce-column-name-contracts .

When you enter the space — I feel like I'm reading documentation aloud here — we encourage you to [00:01:00] ask other attendees questions, make comments, and react in the channel. We're all here doing this together, interactively. After the session, the speaker will be available in the Slack channel to answer your questions.

However, we encourage you to ask those questions as they come up at any time. Okay. Let's get started. Over to you, Emily.

Emily Riederer: Hi, everyone. Thank you so much for coming today. I'm thrilled to be here at Coalesce to talk about how we can use column names to build more robust data pipelines. We have so many great tools now to tackle the technical challenges of the modern data stack, everything from extracting and loading data all the way through that last mile of testing and documentation.

But there's still some glue that's missing to tie all these pieces together, and I don't just mean an orchestration engine. I would argue [00:02:00] that some of the people challenges are not fully solved. There's a sense of communication that needs to exist and persist throughout our whole data pipeline, or a lot of the work goes to waste — and that communication can take many different forms.

It might be between the data engineers and analytics engineers handling different steps of this process, or between data producers and data users. A different way to think about this is that each step relies on a shared set of expectations of what the input and the output are going to be — or else, no matter how good our pipelines are, no matter how technically correct they are in terms of our understanding of the spec, they'll feel wrong and they'll seem wrong. I suspect everyone here today has probably already been on one side of this dilemma, where either we as users think the data's broken or the pipeline isn't flowing through, simply because some upstream engineer actually has that data loading at a different cadence than we expected, or a metric doesn't [00:03:00] behave as we expect.

Or different business users had different understandings of how a given business concept or variable was supposed to act. So without clear expectations and a clear understanding between users, data quality is in really serious jeopardy. Put a little more simply: we need some sort of contract between all of our data producers and all of our data users to build a really good dataset.

And if I think about what else persists throughout this whole life cycle that we could maybe attach those contracts to, I think about column names. So I came upon this idea: what if column names could be that contract — that thing that carries really rich metadata throughout the whole process and aligns all of the stakeholders in building our data pipeline?

[00:03:51] Column Names are Contracts #

Emily Riederer: And of course, all of us that are here at Coalesce really love the modern data stack, because it's pulled all of these best practices from technology, [00:04:00] software engineering, and DevOps into the process of making data. So naturally, when I started to think about how we form these contracts, I turned to more inspiration from the software engineering world, and I thought of three main ways that contracts get made in software products.

The three that I came up with were through interfaces, through configs, and through code itself. To put a little finer point on that: interfaces really help developers communicate with end users, configs can help developers communicate with other developers, and code helps developers communicate with machines, which are actually the ones that do all of the heavy lifting.

So this got me thinking: how could column names effectively serve all three of these purposes? How can column names act as interfaces, act as configs, and act as code? That is what I want to explore today, with the help, [00:05:00] of course, of both dbt and a package that I've recently developed called dbtplyr.

[00:05:06] Column names are the user interface of our data #

Emily Riederer: So first let's start at the beginning: how can column names act like an interface? To answer this question, we first have to think a little bit about what interfaces are. They're that layer on top of the data, on top of any sort of tool, that users actually see and interact with.

And I think it's pretty clear that's what column names do. They're the things that we type in our code to benefit from the rich underlying functionality of the data that lingers behind those names. And you might question why I say on this slide that data has functionality — after all, isn't it just some static artifact? But I actually think data is a very living and breathing thing, and it's very important to think about the functionality it has that can be communicated through an interface.

[00:05:58] Data has functionality #

Emily Riederer: So for a brief example of this, [00:06:00] we can even think of something as basic as a login flow. If I'm modeling data for a user login to some sort of an app, I can think about the complex process that constitutes login: loading a page, typing a username and password, entering a captcha to prove you aren't a robot, clicking some buttons hoping to get verified, doing multi-factor authentication.

And finally, if all of that works, the user is logged in. But if I want to model one single login event in one single login table, I'm forced as an analytics engineer to make a ton of decisions about how all of that functionality in the data engineering process should show up in my data. There's a very clear who, what, when, where, why of what we are representing: where does it come from? When does it manifest? All of these things are things that we have to keep in mind when producing data, and our users have to keep in mind while consuming it. And it can have massive impacts on their results: simply thinking about which of these 16 [00:07:00] events we actually decide to define as that login event can have massive implications on which records get loaded into an end table.

It determines what's considered a success and a failure, and could even dramatically impact what an analyst might calculate for a metric as simple as a login success rate. So all of this sort of functionality is what we would ideally want our data interface to capture, and I think we can learn more useful lessons by thinking about how interfaces usually capture this sort of function.

[00:07:34] Interfaces make performance contracts #

Emily Riederer: For example, they might use universal symbols to communicate intent — a trash can to delete things or a floppy disk to save items, which is really intuitive, at least if you're old enough, like me, to remember what a floppy disk is. Next, interfaces can use grouping to put like things together.

If I know something's nearby, I have a good chance of thinking that maybe it serves a similar purpose. And [00:08:00] finally we have aesthetics: bright colored buttons and colors can communicate things like places where caution is necessary. And these sorts of symbols can map really closely to the kind of functionality that we might want to hint to our users about in our data.

For example, we might want universal symbols to signal things like "yes, I'm a binary variable"; grouping could help us find all of our binary variables together in one place very easily; and aesthetics and warnings could help encode and warn us when those binary variables might or might not have nulls.

So hopefully some of this has intrigued you as to why our column names might want to act like interfaces. But next let me get a little more tactical about how we might do that. I propose a two-step approach using a controlled vocabulary. First, we can define a simple set of stubs that encodes a lot of those semantics — those who, what, when, where, why questions [00:09:00] that we asked before — and carries along some of these performance contracts.

Second, when we go to name our columns, we can simply compose these very simple stubs, which have very clear, specific meanings, into permutations that express rich and complex concepts and really clearly characterize the data. So I'll walk through a brief example of this, and I don't pretend that you'd have to use the exact same controlled vocabulary I do.

[00:09:24] An example vocabulary #

Emily Riederer: I just want to show the underlying principles at work. So as one example, we could have our first level of stubs be a layer that represents the different types and purposes of the variables — indicator variables as IND, count variables as N, date variables as DT. These are probably prefixes that you've used or seen before, but we could go a step further by defining very clear and specific semantics for each one of these.

For example, an IND variable is a binary zero-one variable, and, [00:10:00] critically, we could specify that the rest of the name always defines the condition that's tagged as one, to eliminate the ambiguity that can result in people sometimes accidentally flipping the sign. But beyond just communicating semantics, these stubs can also communicate performance contracts.

So you might notice that IND and BIN have similar semantics, but we could further define something like IND to also require that the variable is always not null, whereas BIN might be a little more permissive. Similarly, we could promise our users that date variables are always in ISO 8601 format.

And you can imagine this extending to subsequent layers of the controlled vocabulary — next, perhaps, an entity layer that defines concepts like USER and LOGIN. Once again, we could eliminate that confusion of "what on earth is a login" by attaching these stubs to very clear and specific semantics.

And [00:11:00] in addition to contracts, these stubs can also carry consequences along with them. For example, if we decided to define unique users by their IP address, we'd note that in some cases that might or might not be unique. So every time someone encounters the stub USER in the data, they should have that in the back of their mind.

And this can be a very powerful exercise, even just going through the process of defining these stubs with our business stakeholders, because by having these rich discussions about what every single word in our dataset should mean, we also get a very clear sense of what it isn't. We eliminate all of that ambiguity and can drive towards more consistency throughout an entire data ecosystem.
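To make this concrete, a vocabulary along the lines described above might compose stubs into names such as the following. These particular names are illustrative assumptions, not an official standard from the talk:

```sql
-- Layer 1 (type/measure stub) + layer 2 (entity stub) + free-text descriptor.
-- Each stub carries a contract:
--   ind_login_success   -- IND: binary 0/1, never null; condition tagged as 1
--   bin_login_mfa       -- BIN: binary 0/1, nulls permitted
--   n_login_attempt     -- N:   non-negative integer count
--   dt_login_first      -- DT:  date, always ISO 8601
select ind_login_success, n_login_attempt, dt_login_first
from {{ ref('user_login') }}
```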

So putting this all together, you can imagine us defining more subsequent layers of possible stubs, and, when we go to define our column names, permuting these together into really rich and expressive concepts that are immediately intuitive. [00:12:00] And now let's turn back to some of the ways that we thought about building an interface before and see how stubs can serve all of those purposes.

We'll see how universal symbols can fuel programmatic wrangling, how they enable grouping, which aids in data discoverability, and how they act like aesthetics that can communicate warnings and type hints. First, using these universal symbols can make it easier to wrangle data. For example, if I'm a data analyst using data that was set up this way, and I want to take the mean of all indicator variables in my dataset, I can very easily manipulate all of the column names, extract all of the variables that start with IND, and apply a transformation to those. Secondly, by grouping things with similar names, we can make it easier to find the fields we need. It makes them more immediately searchable — or, if you're using a nice IDE that has autocomplete, simply by typing the beginning of that name we can pull up all the variables of a [00:13:00] similar type and easily browse through them to find the one that we need.

And finally, data UIs can increase the caution in a dataset by helping users use fields correctly. If I want to calculate the rate of successful logins in my dataset, and I just have a login field, maybe I don't know it can have nulls; maybe I don't realize that it's only counted as a yes or no once it's passed the captcha.

If I actually just take the mean of the login column, I might think I was calculating the proportion of successful logins when I was actually calculating the proportion of successful logins conditional on completing some earlier step. By not understanding that the data had nulls, I could fundamentally be calculating a different business concept altogether.

But if my variables are more meticulously named — with IND versus BIN — I either know that I can take that average confidently with no transformations required, or, [00:14:00] if the name indicates it's a BIN variable, I know that I may have to apply additional treatments to that variable to get it into the state that I need. With either of those outcomes, it's much more apparent what I, wearing my analyst hat, need to do, because I understand inherently what the analytics engineers already did. So in these ways, column names can be a very powerful interface for developers to communicate how data is intended to work to data consumers. But the benefits don't stop there, because column names can actually also act as configs and make life easier for us as the data developers ourselves.

And we can see this by reflecting first on what a config file is. Of course, all of us that use dbt are very familiar with config files and know that they're a very good way to efficiently collect inputs that are used throughout our project. And hopefully you can already start to see the connection between [00:15:00] config files and defining stubs in a controlled vocabulary. Thinking back to some of the stubs we defined before and their related contracts, they immediately help guide us, as analytics engineers, in how we need to treat that data and what sort of steps we should take when producing it.

For example, if I wanted to code up a bunch of tests using something like Great Expectations to check that all of my contracts in the dataset were met, I wouldn't have to spend a lot of time reinventing the wheel and thinking about what checks I need to do. And the reason for that is that those checks are implicitly tied to the contracts themselves.

For all IND variables, I know I should be checking that they're integers, not null, et cetera — it's pre-specified for me. And these config files are really nice too, because they embed this concept of "input once, use everywhere", and these stubs can have a lot of power to drive standardization throughout an entire data warehouse.

For example, [00:16:00] before we even start the project, they can help us create a shared understanding between data producers and data consumers. They can set correct expectations for when that data will be available and loading. They can reduce the mental overhead of deciding what to name things when we're manipulating the data, because that's pre-specified.

And as we just saw, they can aid in figuring out what tests to run, and they're pretty self-documenting — it's pretty explicitly clear from the names what each variable is doing. But beyond "input once, use everywhere", in my mind an even bigger benefit as a developer is their ability to help us drive consistency by being "change once, update everywhere".

So suppose we're going along once again, building our login data pipeline, and when we reach the testing stage we realize, oh, the not-null test is failing. We check the data logic, and we realize that for some fundamental reason that variable absolutely has to have nulls sometimes. [00:17:00] In an ordinary workflow, you might put some band-aid fix on at that stage and just continue to move forward.

But in fact, by using these names, we're guided through the whole process of thinking of all of the changes we need to make and all of the implications that this change has on our data pipeline. So, for example, on realizing that the data must have nulls, we note that the variable probably needs to have a different name.

And that kind of pulls us back to the beginning of the process: thinking about how we're extracting it differently, where the nulls are coming from, and renaming and massaging the data accordingly. This also immediately helps trigger in our minds whether we need to be applying different transformations to the data than we were previously.

And similarly, different tests: we obviously don't want to run a not-null test on a variable where we're explicitly allowing nulls to happen. And finally, because of the self-documenting nature, we know that once we've changed the name, [00:18:00] that variable has a different meaning — it has different intrinsic properties that should be communicated through any data dictionaries and metadata.

So when column names act as config files, they have this very powerful ability to help analytics engineers immediately understand the steps in the process they need to enact to build high-quality data. But even better than column names being a dictator that tells us what to do would be if they were more of a true partner in the process and actually helped do the heavy lifting themselves. And this is where we reach the point of thinking about column names as code, because so far we've just talked about intent. There are a lot of benefits, but there are also a lot of inherent risks. For example, bad contracts can really be worse than no contracts at all.

[00:18:55] Bad contracts are worse than no contracts #

Emily Riederer: If we've made users these promises but don't deliver, we can create [00:19:00] inconsistency in names, with a lot of typos that make it harder for users to actually write simple queries. We can cause infidelity, where we've promised users that we're upholding certain data quality checks but we actually aren't. Or we can hide problems through too much automation.

[00:19:15] Code turns theory into practice #

Emily Riederer: Too much automation can actually evade detection of errors instead of actually solving for them. But regarding column names as code can help us turn this theory methodically into practice, and this is particularly aided with the help of dbt. For example, we can use Jinja templates to drive consistency and avoid typos.

We can increase the fidelity of our data by using both custom macros and dbtplyr. And finally, we can improve accuracy within the pipeline. So for the rest of this talk, I want to get a little more tactical about how to put these ideas into practice. But first, to do that, I want to say a few words [00:20:00] about my package, which is intended to help you do this.

[00:20:04] dbtplyr #

Emily Riederer: dbtplyr is intended to help maximize the benefit of column-name contracts by providing some nice helper functions that allow you to quickly select columns based on certain properties of the stubs that appear in their names — such as column names that start with, end with, or contain certain key phrases — and then apply transformations over that set of column names, either in a select clause or even in a where clause, with analogous syntax. A lot of the underlying philosophy of this approach is heavily inspired by the excellent R package dplyr, which is really a case study in great design and has very much been an inspiration for me in this work.

So to give a little clearer view of how this actually works, I'll show just one quick example. Here we can use dbtplyr to both [00:21:00] get all of the column names out of the dataset that we intend to work with, and subset that list down to only the column names in that table that start with IND.

And then we could write a query using the across function to apply an average transformation to all of my indicator variables. And when that query is rendered, we get one nice statement — one nice row in our code — taking the average of every single indicator variable we found in the previous step. And similarly, if we wanted to, we could apply similar sorts of transformations, similar sorts of iterations, in the where clause as well.
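A minimal sketch of the pattern described here, following the helper macros documented in the dbtplyr README (`get_column_names`, `starts_with`, and `across` with its `{var}` placeholder); the model name `logins` and the column stubs are hypothetical, so check the package documentation for exact signatures:

```sql
-- models/login_summary.sql (hypothetical model)
{% set cols = dbtplyr.get_column_names(ref('logins')) %}
{% set cols_ind = dbtplyr.starts_with('ind', cols) %}

select
  user_id,
  -- renders one avg(...) per ind_ column found above, e.g.
  -- avg(ind_login) as avg_ind_login, avg(ind_mfa) as avg_ind_mfa, ...
  {{ dbtplyr.across(cols_ind, "avg({var}) as avg_{var}") }}
from {{ ref('logins') }}
group by 1
```

The same column list can be reused to build conditions in a where clause; the package README also describes helpers in the spirit of dplyr's `if_any`/`if_all` for that purpose.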

So thinking then about both dbt and dbtplyr, we'll see how we can put some of these ideas into practice. The first situation we might want to avoid is breaking our contracts in ways that really frustrate users. Like I mentioned, we could have those dreaded typos where we either put stubs in the wrong order or [00:22:00] leave a letter out of one stub, and make it really hard to remember which of those variable names is a little bit different when we go to write a query.

To remediate this, ideally we'd have some perfect SQL script where every single one of those transformations is written completely and correctly, and spelled correctly. But the more of these transformations we add to our data, the higher the risk that we do something wrong at some step along the way.

So to remediate this, what we can do instead is define any conditions that we want to iterate over and use a Jinja for loop to create that consistency. Our logic is written only once, so we can't mess up either on the logic or on the naming conventions themselves, and we dynamically create those variables.
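That for-loop pattern might look like the following sketch, where the event list and source model are hypothetical — the naming convention is applied by the template, so a stub can't be misspelled in one spot and not another:

```sql
-- Define the conditions once; Jinja generates one consistently named
-- ind_ column per event.
{% set events = ['page_load', 'captcha_pass', 'mfa_pass', 'success'] %}

select
  user_id,
  {% for event in events %}
  max(case when event_type = '{{ event }}' then 1 else 0 end)
    as ind_{{ event }}{{ "," if not loop.last }}
  {% endfor %}
from {{ ref('login_events') }}
group by 1
```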

Of course, even worse than frustrating users is lying to them, and that's what we do if we break the contracts we've promised. Suppose I have a [00:23:00] variable like dt_login, and I want to use it to figure out how many users logged in on a certain date. I filter that data as if it is a date variable, but it's actually a timestamp.

[00:23:12] Custom macros + dbtplyr #

Emily Riederer: The kind of horrifying thing that could happen is that I end up with zero results — not at all what I was looking for — and that would essentially be helping our data lie to our users. So clearly what we'd want to do is ensure that, always, every time we're using one of these variables, we're casting it to the correct data type that we promised to users.

And we could do this manually, but if we later add a new column or field to our dataset, we might forget to add it to that step of the pipeline; there's a risk that we don't keep things totally consistent. But once again, this is where both custom macros and dbtplyr can enforce this consistency.

We can write logic that extracts all of the column names, again finds the fields with the date stub in the name, [00:24:00] and applies the transformation casting them to date, in a way that's flexible enough to adjust when new fields are added to the pipeline. Of course, you can imagine cases where this might actually break down.
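A sketch of that macro-driven cast (model and column names are hypothetical; the loop re-derives the dt_ column list on every run, so newly added date fields are picked up automatically):

```sql
{% set cols = dbtplyr.get_column_names(ref('logins')) %}
{% set cols_dt = dbtplyr.starts_with('dt', cols) %}

select
  user_id,
  {% for col in cols_dt %}
  -- uphold the DT contract: every dt_ field is cast to a true date
  cast({{ col }} as date) as {{ col }}{{ "," if not loop.last }}
  {% endfor %}
from {{ ref('logins') }}
```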

Casting might be great for a date field that could handle timestamps, but think for a minute about what happens if we just blindly cast fields to integers. If our data looks like the data in column A on the slide — values that are genuinely integers and simply haven't had their decimal zeros chopped off — this transformation works great.

However, if our data is like the data in column B, where it feels like something went wrong earlier in our pipeline, then by casting it we're not actually solving that problem, we're just hiding it — and that's not good either. So that really motivates the need for testing any unenforceable contracts, and this is another great use for dbt's internal testing functionality.[00:25:00]

Once again, ideally for a case like this, we would test — for every single IND variable, before we do that cast to integer — that the absolute difference between the cast and the uncast versions was pretty much zero. But once again, if we add a new column to the dataset, there are ways that this test could become a little obsolete or not sufficiently robust.

But if instead we wrote this test using dbt and dbtplyr, we can once again make it more dynamic: pull out all of the variables with IND at the start of the name and automate this test, so it will always apply to all of our IND variables and stay flexible. Finally, the other exciting thing we can do is test metadata properties of our data, and not just the data itself.
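Such a dynamic check might be sketched as a dbt singular test (the file name, model, and tolerance are illustrative): it returns any rows where casting an IND column to integer would change its value, re-enumerating the IND columns on each run so new columns are covered automatically.

```sql
-- tests/assert_ind_columns_are_integral.sql (hypothetical singular test)
{% set cols = dbtplyr.get_column_names(ref('logins')) %}
{% set cols_ind = dbtplyr.starts_with('ind', cols) %}

{% for col in cols_ind %}
select
  '{{ col }}' as column_name,
  {{ col }} as offending_value
from {{ ref('logins') }}
where abs({{ col }} - cast({{ col }} as integer)) > 0.000001
{{ "union all" if not loop.last }}
{% endfor %}
```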

For example, another way this could become really annoying for users is if we're consistent but incorrect. Say I define in my controlled vocabulary that all of my count fields should start with [00:26:00] N, but in reality all of them start with NUM. That's going to make it really annoying to query. So the final type of test that we can do helps us test the names themselves and not just the values. By querying our data's information schema and breaking apart the name components, we can check that only allowed stubs appear in our data and easily identify any invalid name components with very little dbt testing. Similarly, we can also test for the correct mapping between stubs and data types.

Once again, we can query the information schema, break apart the components, and filter to make sure we're only allowing correct combinations of stubs and data types in the dataset. So by using the power of dbt, we can transform this theory into practice in a really [00:27:00] rich way, to automate a lot of the busywork of building robust pipelines that are consistent.
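A name-level check along these lines could be sketched as another singular test against the warehouse's information schema. Here the schema/table names, the allowed-stub list, and the `split_part` function (Postgres/Snowflake/Redshift dialects) are all assumptions:

```sql
-- tests/assert_column_stubs_allowed.sql (hypothetical)
-- Fails on any column whose leading name component is not an allowed stub.
{% set allowed_stubs = ['ind', 'bin', 'n', 'amt', 'dt', 'id', 'cat'] %}

select column_name
from information_schema.columns
where table_schema = 'analytics'
  and table_name = 'logins'
  and split_part(column_name, '_', 1) not in (
    {% for stub in allowed_stubs %}'{{ stub }}'{{ "," if not loop.last }}{% endfor %}
  )
```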

They're very faithful to our intent and highly accurate. And this completes our journey of how column names can act, very richly, as interfaces, configs, and code as we build a data pipeline. So, in summary: building a good data pipeline requires excellent communication between all stakeholders and between the many different steps and tools we have in the stack. And because we need that communication to persist throughout the whole process, we need to embed it in something that lasts throughout every step — and column names can be a very effective way to communicate throughout the data life cycle. For these reasons, column names can be very effective contracts, with a little bit of discipline and a little bit of foresight, adding a ton of value in building very [00:28:00] communicative, very intuitive, and very robust pipelines.

Thank you.

[00:28:08] Q&A #

Elize Papineau: Wow. Thank you so much, Emily. That was inspirational, and I definitely leave it wanting to implement dbtplyr on a few of my current projects. That was really exciting for me personally. We do have a couple minutes for questions here, and we have a couple of questions from Slack.

I'm going to read them aloud; please forgive me if I mispronounce any names. So the first question is from Josh: what's the role of column metadata in this type of system? For example, binary is a data type — what's the reason for also including it in the column name?

Emily Riederer: I think that’s a really interesting question.

And I think it actually relates to a concept called Hungarian notation that, way, way back, in the earlier days of C and C++ programming, was a thing. I think the distinction I make [00:29:00] is that I agree it would be silly to include in column names things that are purely just type information, because, like you say, you can get that in other places — from metadata, from information schemas. But in my mind, part of the distinction is when that kind of type information crosses over: when it's not just about the type — "yes, this is an integer" — but when it starts to hint a little more at the actual functionality and semantic meaning of that dataset. So I don't necessarily care that something's an integer because it's an integer; I care because I want to know that it's telling me that certain information is a count. Or I don't necessarily care that something's a binary field in terms of its encoding, but I care that it's always going to be zero and one, because that means I can take the average and calculate a proportion, if maybe I want to do that. So I think it's a [00:30:00] mix — it does get close to encoding that type information.

But typically I try to push stubs towards combining, I guess, both that technical information and a sense of the semantic information as well.

Elize Papineau: Okay, next question is from Bingu, who asks: how do column names as contracts handle semantic changes? For example, a nullable column becomes not-null only in the future.

Does that mean the column should be renamed?

Emily Riederer: As this framework exists today, I think my gut reaction would be yes. You can certainly imagine that there are downsides to that approach — but let me say it this way: I think the downsides and the upsides to that approach are exactly the same.

On one hand, it could be annoying for users to have to go back and change that column name. But at the same time, [00:31:00] if we are changing something about the data as fundamental as allowing it to have nulls where it didn't before, users will probably carry their previous understanding of what was allowed in that data with them, unless they are made aware and forced to confront it. So by changing the column name, I think there's almost a benefit of failing noisily and just forcing people to confront that change, accept it, and internalize what it actually should mean for them and for their work.

Elize Papineau: That’s a great answer. I see a lot of analogy there, but just between like dbt tests as a whole, when, there’s a miscommunication in your data source to your, your, the data you’re building and how that test can really tell you when those underlying changes happen, that you’re unaware of.

That really resonates with me. We have another question from the audience, who says: I love the perspective brought to SQL via dplyr. As a dplyr stan myself who was forced to learn SQL, switching to SQL [00:32:00] and dbt was a breath of fresh air to me. What dplyr conventions and experiences do you love that are still missing from Jinja and SQL?

Emily Riederer: That is definitely a great question, because I’ve spent quite a lot of time in the R community myself, and I’m a huge fan of their workflow and style. I think that’s definitely something that attracted me to the dbt community, because I see a lot of similarities in the workflow of making things more modular and reproducible. dbt has now probably extended a lot of the functionality that I would have felt was missing from pure SQL.

It’s getting into a little more of the tidyr space, making it easier to pivot data, which isn’t available in all SQL syntaxes, and to do a lot of those things. I think probably the biggest thing that I, as an R user, love [00:33:00] that isn’t really easy to do in SQL, though it’s probably more of a database thing than strictly a dbt thing, is list columns: I’m a huge fan of list columns and nesting data within a dataset.

Of course, from SQL you can do similar things by putting a lot of JSON in a column or something, but that’s one feature really native to the R workflow that I really love and really miss when using a different language.

Elize Papineau: Okay, I’m going to go ahead and read this next question, based on the personal bias that I also had this question roughly typed out myself. A lot of the magic of dbt is that it’s “just SQL.” As you wrap more macros into your code, that becomes less and less true, and it can make it harder to onboard people.

How do you think about those tradeoffs? Is it worth it because the team is more effective and can ship more reliably? [00:34:00] And do you wind up internalizing the syntax eventually?

Emily Riederer: That is definitely, I feel, the perpetual question between simplicity and expediency in some ways.

No matter what language I’m working in, there’s a part of me that loves to resist using packages and see how far I can get on just the core language itself. I think there definitely is a cost when things are just slightly prettier semantics;

I tend to resist those. The kinds of macros that really excite me, and make me willing to start adding dependencies into my code, are for things that I think truly cannot be done with the core language itself.

For example, [00:35:00] something I really like with dbtplyr, and that I find very handy, is just being able to say: take the sum of all of these variables, and have it figure out which variables those are, rather than manually typing things out line by line. Because at some point I reach the other side of the trade-off.

There’s risk from tools and dependencies, but there’s also risk from human error. And I guess that’s maybe the fundamental trade-off that I weigh: if there’s something I can brute-force relatively simply with no dependencies, that’s probably the safer approach. But if there’s something where I’m at more risk of messing up myself, making a typo, doing something wrong, forgetting about a variable, or I just don’t want to

type a hundred different column names, that’s maybe where the risk of the machine becomes less than the risk of the human.
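The “sum all of these variables” pattern described above looks roughly like this with dbtplyr. This is a sketch based on my reading of the dbtplyr README (the macro names come from that package, but the model name and the N_ naming stub are hypothetical; check the package docs before relying on exact signatures):

```sql
-- Hypothetical dbt model using dbtplyr macros
{% set cols = dbtplyr.get_column_names(ref('customers')) %}
{% set n_cols = dbtplyr.starts_with('N_', cols) %}

select
  -- across() renders "sum(N_X) as N_X_tot" for every matched column,
  -- so no column name is ever typed by hand
  {{ dbtplyr.across(n_cols, "sum({{var}}) as {{var}}_tot") }}
from {{ ref('customers') }}
```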

Last modified on: Sep 21, 2023
