dbt Core v1.0 Reveal ✨
It’s been five years; it’s time to cut the ribbon. Jeremy will offer a highlight reel of the biggest changes included in dbt v1.0, as well as honorable mentions for some neat ideas that ended up on the cutting-room floor.
Major Version One means major stability. We’ll discuss the commitments we’re making to every dbt user, and how upgrading versions can be a joyful experience.
We’ll also start answering the question on everyone’s mind: What comes after one-point-oh?
Browse this talk’s Slack archives
The day-of-talk conversation is archived here in dbt Community Slack.
Not a member of the dbt Community yet? You can join here to view the Coalesce chat archives.
Full transcript
[00:00:00] Barr Yaron: Hello, and thank you for joining us at Coalesce. It’s been an amazing day so far. I’m still recovering from that keynote. My name is Barr. I’m a product manager here at dbt Labs. I’ll be the host of this session: dbt v1.0 reveal, presented by my good friend and colleague Jeremy Cohen, lovingly known as Jerco.
First, some housekeeping: all the chat conversation is taking place in the #coalesce-dbt-reveal channel of dbt Slack. If you’re not part of the chat, you have time to join right now. Visit our Slack community and search for #coalesce-dbt-reveal when you enter the space. So why is this a reveal? It’s been five years, and Jeremy is going to offer a highlight reel of the biggest changes included in the launch of dbt v1.
Jeremy has been at dbt Labs since January 2018 as the second debatable [00:01:00] junior data analyst. He discovered the power of using dbt to rapidly stand up analytics projects, refactor legacy code, and replace outdated infrastructure. dbt Labs has gone through some major transformations, pun intended, from being a consulting shop called Fishtown Analytics to now a product serving some of the largest and most complex organizations.
As dbt Labs transitioned into its current incarnation, Jeremy also transitioned from client-facing analytics engineer to a community-facing product manager, maintaining dbt Core and charting the course for its future. Now dbt is turning 1.0 today, and Jerco is turning one year older tomorrow. If you have any wishes for Jeremy and his new year of life, you can throw them in the chat.
We would not be here today without the incredible community. So many of you are to thank for your contributions to dbt Core and to each other. Many of you have also interacted with [00:02:00] Jerco (hi, Jerco!), and you know how thoughtful he is. I’ll often stumble upon a GitHub issue to see Jeremy’s incredibly thoughtful, nuanced, and future-facing multi-paragraph comments.
He has a way with words and languages, not just English and GitHub issues: YAML, ancient Greek, Latin, or French, all the regular stuff. He’s dialing in from his new home in France. I’m excited for him to lift the curtain on major V1. It means major stability, commitments to every dbt user, and easier, joyful upgrades.
And I’m excited to hear Jerco talk about what’s going to make the cut, what didn’t, and what comes next after one-point-oh. After the session, Jerco will be available in the Slack channel to answer your questions. And just as a reminder, this is not like an in-person event. You’re not supposed to stay quiet.
You can ask other attendees questions, ask Jeremy about France, make comments, or react at any point in the Slack channel. I’m going to stop talking. I can’t wait to get it started, [00:03:00] and I’m passing it over to you, Jeremy.
[00:03:02] Jeremy Cohen: Thank you so much, Barr. Welcome, everyone, to the dbt Core 1.0 reveal. I’ll be your host, your guide, your master of ceremonies.
But really this is your celebration. Barr said enough. I’ll just say I’m Jeremy, Jerco in Slack, if you’ve seen me there. And I want to take us through the last five years; what exactly we mean when we say V1 of dbt Core; a little bit of what’s in this release; looking ahead in the short term, and maybe even the longer term, no promises; and finally a chance for all of us to cut the ribbon and mark this momentous ceremonial occasion.
So the last five [00:04:00] plus years, this is one way of looking at them. This is a chart that maybe some of you have seen before. It’s weekly active projects for dbt running everywhere. It started way back in 2016 with a handful, maybe a couple dozen, and just the other day we passed 8,000. This is one way of looking at that metric, if you will.
In this view it looks monolithic. It looks inevitable, but really it is a bunch of different dbts running over that period of time, more than 20. And with each dbt along the way, there have been new features, new bug fixes, new approaches to the same underlying principle problem, the same viewpoint, the same approach, which now we call analytics engineering.[00:05:00]
[00:05:00] Ancient dbt
[00:05:00] Jeremy Cohen: I want to take us on a little bit of a stroll through memory lane, if you’ll indulge me, even though some of this is ancient history. Apologies in advance to Mila and to any of my former classmates or professors who stumble upon this on the public internet later on. With each of the years, each of the periods, each of the eras that we’re about to go through, I want you to shout out, call out, comment with what you remember from that time.
[00:05:27] 2016 Age of heroes
[00:05:27] Jeremy Cohen: Is that when you first came across dbt? Is that when you first typed dbt run into your command line? Let’s do a little remembering together. Way back in 2016, it was the age of heroes, the age of myth. Not much was written down at this point in time, but a lot has been passed down to us through the oral tradition, and some of the most important stories, the most important topos and mythos that still exist in dbt today, this is their origin. [00:06:00] dbt worked with Postgres and Redshift. We had models. We had tests, we had seeds, we had archives, we had multiple threads. We had just come out with incremental models. There were just under 30 weekly active projects. There wasn’t a project with more than a hundred models.
[00:06:16] 2017 Archaic period
[00:06:16] Jeremy Cohen: And there were really four people contributing to the dbt Core code. That brings us to 2017, the archaic period. This thing we have here, is it real? Is it going to stick? This is, again, quieter in the historical record, but we have some remnants through material culture: the addition of Snowflake, the addition of BigQuery, some rework, maybe some innovation, around macros, materializations, packages, even custom schemas for models.
Something that still trips up many folks as they’re first getting started with dbt, but also one of its most powerful features: this configurability that it offers. That brings us to 2018, the classical period, [00:07:00] coming into our own. This is also when I arrived on the scene. We’re starting to tell more compelling, more complex stories with dbt, the birth of tragedy and of comedy.
[00:07:11] 2018 The classical period
[00:07:11] Jeremy Cohen: This is also the year when, like our predecessors in Classical Athens, who had an eponymous archon, and like the Romans, who named their years after the consuls, we started naming our dbt versions after famous Philadelphians. So we had Betsy Ross, Isaac Asimov, Lucretia Mott, Guion Bluford. The addition of hub.getdbt.com.
[00:07:36] 2019 The early republic
[00:07:36] Jeremy Cohen: The addition of the docs site, real science fiction. The addition of the adapter cache to speed up dbt projects, that first big foray into performance at scale, because we now started seeing projects with hundreds and hundreds of models. 2019 was a building year, the early republic. There’s work to be done.
Grace Kelly, Stephen Girard, Wilt Chamberlain, [00:08:00] Louisa May Alcott. Faster DAGs, sources, the splitting apart of adapter plugins, initially Presto and Spark. A first RPC server, a first cut at partial parsing, a first cut at structured logging. We hit over 1,000 weekly active projects. Last year, 2020, a golden age of dbt literature, flexing some muscle.
[00:08:25] 2020 A golden age
[00:08:25] Jeremy Cohen: What can’t we do? Some things that are near and dear to my heart: the addition of the meta property by Taylor Murphy, a community contributor who deserves a shout-out; the addition of Jinja expressions everywhere (if you didn’t know, you didn’t use to be able to do that); and, especially close to home for me,
node selection, Slim CI, and exposures, all coming in 0.18, Marian Anderson. Last, more than 3,000 weekly active projects, more than a hundred unique contributors to the dbt Core open source codebase. And we started to see some [00:09:00] very big projects, unprecedentedly big, and we started to see interest in dbt that was also a bit unprecedented, which made 2021 a year of stable foundations, pax dbtana.
[00:09:15] 2021 pax dbtana
[00:09:15] Jeremy Cohen: An era ends, and another one begins, as we see weekly active projects go beyond 8,000, projects go beyond 2,500, 3,000 models strong, and the number of contributors go far into the triple digits. Most recently, this looked like the release of version 1.0, W.E.B. Du Bois, but following on the tail of work around metadata artifacts at the beginning of this year, tests and project parsing,
the dbt build command, and configs. All of this has built up to its final conclusion. We can truly say that we found dbt a CLI of brick and left it a framework of marble. [00:10:00] All right. That’s enough of ancient history, but I do want to test you, to give you a quick quiz, as with any quick history. So there are 8,000-plus projects today that run dbt every single week.
[00:10:14] Quick quiz
[00:10:14] Jeremy Cohen: And the very oldest have been running since 2016. Half of those projects first ran dbt in this month or later. Now this is just for fun. Feel free to throw out your guesses. But I actually do want to say, whether you started using dbt five years ago or one week ago, it is very good to have you here.
You’re welcome. I want to borrow from what my colleague Jason Ganz said yesterday in his amazing presentation about analytics engineering being everywhere very soon: we are still in the good early days, the good old days of analytics engineering. dbt 1.0 is an invitation to everyone. Thanks to those of you who have been along the [00:11:00] way, all along, and to those of you just joining.
Answer: What is May 2021? Yes, earlier this very year. Not that many months ago at all. I do hope you answered in the form of a question; otherwise, unfortunately, I won’t be able to count it. Another question: V1 of what, exactly, are we talking about? Let’s start with V1. For those who aren’t familiar, dbt Core is versioned following the semantic versioning specification, or SemVer for people who like to be cool and abbreviate things.
[00:11:34] Semantic Versioning Specification
[00:11:34] Jeremy Cohen: Major version zero. That’s what dbt Core has been all this time. It’s for initial development. Anything may change at any time. The public API should not be considered stable. Boy, we have big organizations, complex companies, lots and lots of folks banking their data stacks and their professional careers on dbt.
So it was high time to release 1.0. Semantic versioning after [00:12:00] 1.0 is a lot more relieving: once you’re here, you’ve made it. There are some things that we are going to keep adding in new minor versions of dbt, and some very specific contracts, very specific interfaces, that can have really clearly communicated and documented changes.
But the code in your project, when you upgrade, is going to keep working. So for minor releases, but especially for patch releases with bug fixes: don’t think, just upgrade. All right. We answered the V1 part, but V1 of what, exactly? That gets us into: what do we mean? What do I mean when I say dbt V1, dbt Core V1, dbt, dash, lowercase-c core V1?
I hope you’ll bear with me. I think it’s a subtle but important distinction. At the risk of doing one of these, let’s get started. This is dbt. This is the platform undergirding the modern [00:13:00] data stack, the practice of analytics engineering. I hope I’m not being too hyperbolic. It is everything around metadata, data transformation, testing, documentation, development, deployment, all of the above.
This is a distinction that might be familiar to folks who have been coming to events like Staging over the last several months. This is basically the same slide that I showed back in February: dbt Core, dbt Cloud. There are pieces of dbt that are, and always will be, open source, because we think it’s so important that all of your business logic lives in a framework that you can take with you,
that you can see, that you can look at, that you can participate in. So compilation and execution, fully featured, deployment agnostic: dbt Core will always be open source. Then at the same time, there are incredibly compelling things that we want to do for the full platform, for the slide I just showed you, that want to happen, need to happen, inside of a stateful, [00:14:00] identity-aware, browser-based user experience, or I should say everywhere-based. That’s dbt Cloud: the most stable, reliable, and collaborative way to develop and deploy dbt projects.
So if this distinction sits well with you, maybe this one is a little more mind-bending. What do I mean when I say there is dbt Core and dbt Core? dbt Core, the Python package, is this: it’s a codebase. It’s a pretty specific codebase meant to accomplish a pretty specific thing. So in the broadest sense, if you’ll bear with me for this, another metaphor.
Yes, one more. If analytics engineering is this healthy, growing young tree in the midst of the data ecosystem, and dbt is at its very heart, at its very trunk, if you will, where does dbt Core fit in here? Where does dbt Core, the Python package, fit? Take a cross section. We’ve got dbt [00:15:00] Core in the very middle; dbt, the whole picture, all around the outside; and between these circumscribed circles,
there’s one more dbt Core: the participatory possibility of everything that is open source code maintained, developed, upgraded, and put out with love by community members. So I think of dbt, space, capital-C Core: dbt Core, the realm of possibility. It looks a lot like this. It’s got dbt Core, the Python package, right in the middle, but there’s a whole lot else going on all around.
There’s adapter plugins for databases that are maintained by dbt Labs, and lots of ones that are maintained by community members, by partner companies, even some new ones just in the past.
There’s packages on the dbt Hub maintained by us and by community members and by partners and other companies. There’s [00:16:00] all of the metadata integration, starting from dbt docs, all the way even to projects like OpenLineage. There’s new tools built on top of dbt, like Lightdash. There’s all of the workflow optimizations that folks out there have put out, out of the goodness of their heart and because they’ve been part of this open source community, they’ve wanted to give back.
But this is dbt Core with the Python package at its center. And I think for dbt to be all that it can be, and that in the broadest sense possible,
we need something from dbt Core, the Python package: we need V1, and we need it to be fast. We need it to be stable. We need it to be intuitive, extensible, and maintainable.
Here’s all of those same words on a pyramid. I hear people really like pyramids. This is a hierarchy of needs, if you will. And I think it’s in [00:17:00] the right order, because if it’s not fast, folks won’t use it. If it’s not stable, folks won’t trust it and rely on it. If it’s not intuitive, folks won’t be able to get started and keep going.
If it’s not extensible, dbt won’t be able to scale to meet the needs of some of the most complex organizations. And if it’s not maintainable, we won’t be able to scale to meet the needs of dbt. These were our priorities this year leading up to one-point-oh. It means that while it was very tempting a year ago, even a year and a half ago, to ask the question of what should be part of dbt Core 1.0, it was very tempting to start here, with some of the coolest, most exciting, most mind-bending ideas
that an open source community could come up with. These are all still amazing ideas. These are still things that I really want to do. They’re not the things that we prioritized for 1.0. That was things like [00:18:00] this, because we needed stability. We needed maintainability. We needed things to be intuitive and fast and extensible.
So what actually went into the release? What do you get right now by upgrading to 1.0? I hope you are with me when I say this is as much about the ceremony, about marking the occasion, as it is about the specific contents of the release. This is a mark of maturity; it’s a coming of age. It took a long time to get there.
A lot of work leading up to it. I promised that I wasn’t going to talk about my own bar mitzvah here, but much like that, we had to study. We had to work. We had to make incremental steps on the way, all the way from 0.19 through 0.21. And then in the final V1 release, we prioritized foundational improvements.
So those same foundations I showed you earlier: dbt V1 is fast by default [00:19:00] for everyone. Partial parsing is on and ready to go. The experimental parser is now our static parser, and it’s on and ready to go. This translates to a 100-times speedup in development, especially for large projects, compared to January 1st of this year.
[00:19:15] Foundations
[00:19:15] Jeremy Cohen: It’s stable. We totally reworked structured logging from the ground up so that it is ready to be the necessary input for applications that want to be powered by dbt Core in real time: real-time metadata and things like the new server that Drew mentioned earlier. It’s intuitive. We finally renamed tests.
They’re not schema or data anymore. They’re definitely not bespoke. And configs work in the places you expect them to work. It’s extensible. Global macros have been reorganized. They’ve also been made easier to reimplement; that, I think, has saved some folks a couple of thousand lines of code, which is not nothing.
And it’s [00:20:00] maintainable. We split apart plugins for database adapters. The RPC server, the new server, will also live separately from dbt Core. We want core to be focused, and we want to lower the barriers to contribution in all of those other repositories. You shouldn’t need a working BigQuery connection to contribute to the Redshift adapter.
And in fact, we want to make contribution going forward as easy and as welcoming as possible. So that’s foundations. That’s great. Nice job, Jeremy. Boring stuff. No, there is some new stuff too. There’s metrics. You just got to hear a whole lot about them. There’s also extensions on existing functionality, like result-based selection: being able to take a run that just failed, with a couple of models with errors or tests with failures, and pick just those to rerun.
So, artifact-based selection. dbt init has a whole new look and feel. I call out both of those with asterisks because they were community and colleague contributions by folks who do not work on dbt Core for their day job. [00:21:00] Metrics I want to call out because we are going to be finding the balance in the future between this stable shared foundation and continuing to find new ways to innovate.
I think a lot of that innovation is going to come in the ecosystem. It’s going to come in that biggest concentric circle that is dbt. Just in the ways Drew was showing you, it’s going to come from other tools, open source and otherwise, that integrate with metrics now that they’ve been defined in dbt Core. I’m excited for tools like Lightdash, like metriql, like all of the others that have already been built on top of dbt, to make the most of what defining metrics in dbt code is going to.
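Editor’s illustration: the result-based selection idea described above (take a run that just failed and pick only the errored models or failing tests to rerun) can be sketched in a few lines of Python. This is a minimal sketch, not dbt’s actual implementation. It assumes a simplified run-results structure with a results list whose entries carry a unique_id and a status; the project and node names are hypothetical.

```python
from typing import Any, Dict, List

# Statuses treated as "needs a rerun" in this sketch:
# models that errored and tests that failed.
RERUN_STATUSES = {"error", "fail"}


def nodes_to_rerun(run_results: Dict[str, Any]) -> List[str]:
    """Return the unique_ids of nodes whose previous status was an error or failure.

    `run_results` is assumed to look like {"results": [{"unique_id": ..., "status": ...}, ...]}.
    """
    return [
        result["unique_id"]
        for result in run_results.get("results", [])
        if result.get("status") in RERUN_STATUSES
    ]


# Hypothetical artifact from a previous run (names are made up for illustration).
example = {
    "results": [
        {"unique_id": "model.my_project.orders", "status": "success"},
        {"unique_id": "model.my_project.customers", "status": "error"},
        {"unique_id": "test.my_project.not_null_orders_id", "status": "fail"},
    ]
}

print(nodes_to_rerun(example))
```

The point of the sketch is only that the previous run’s artifact, rather than the project code, drives which nodes get selected the next time around.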
So that’s what’s in V1. What’s in the 1.0.1, the 1.1? Going forward, patch releases are going to be bug fixes only. You don’t even have to think about [00:22:00] it. It’s just 1.0, latest fixes included. Minor releases are going to be, let’s say, every six to 12 weeks. You can expect an official support period for minor versions that lasts for one year from the day of their release.
So 1.0 was released on December 3rd, 2021. The end of its official support will be December 3rd, 2022. And we are going to keep putting out patch releases for critical fixes, security bugs, regressions, anything, throughout the entirety of that time.
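Editor’s illustration: the release policy described above can be written down as a small Python sketch. It assumes plain MAJOR.MINOR.PATCH version strings (no pre-release suffixes) and the one-year support window stated in the talk; the function names are hypothetical and are not part of dbt.

```python
from datetime import date
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def upgrade_kind(current: str, target: str) -> str:
    """Classify an upgrade as 'major', 'minor', 'patch', or 'none'."""
    cur, new = parse_version(current), parse_version(target)
    if new <= cur:
        return "none"  # same version or a downgrade
    if new[0] != cur[0]:
        return "major"  # breaking changes are allowed here
    if new[1] != cur[1]:
        return "minor"  # new functionality, existing project code keeps working
    return "patch"  # bug fixes only


def end_of_support(release_day: date) -> date:
    """One year after the release day, per the support window described in the talk."""
    try:
        return release_day.replace(year=release_day.year + 1)
    except ValueError:
        # Released on Feb 29: fall back to Feb 28 of the following year.
        return release_day.replace(year=release_day.year + 1, day=28)


print(upgrade_kind("1.0.0", "1.0.1"))
print(end_of_support(date(2021, 12, 3)))
```

Under this reading, a patch upgrade is the “don’t think, just upgrade” case, a minor upgrade adds functionality without breaking project code, and only a major upgrade is allowed to break things.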
I want upgrading to feel the way it should feel: good. Your code still works. The packages you need are available and compatible. You always get the latest and greatest in dbt Cloud. As I mentioned on the previous slide, it’s just going to look like v1.0 latest. Get the fixes for free, automatically. [00:23:00] Now I’d be remiss if I didn’t include a slide that had the words dbt V2 on it.
I don’t know. We’ve got some big ideas and there are going to be things along the way that we need to rework. So in the same way that we’re going to do a lot of net new stuff, we’re going to keep innovating. We are also going to rework a lot of things behind the scenes, under the hood to always give you the most stable foundation to give analytics engineering and the modern data ecosystem, the stable foundation that it deserves.
[00:23:42] When is dbt-core v2?
[00:23:42] Jeremy Cohen: So when’s dbt Core V2? This is process of elimination. Don’t take my word for it, absolutely. I don’t think it’s gonna be next year. I don’t think it’s gonna be another five years. So two to four years. And until then, there’s plenty of good stuff to work on. [00:24:00] There’s all the ideas that I showed earlier in the screenshot of most commented, most reacted issues.
I would especially encourage you to take a look at some of what’s on this slide, some of what is in the link that I think Andrew is going to drop in the chat, which are to some GitHub Discussions, a thing we’re trying for the very first time. I want dbt to be able to do all of these things, but in some sense, this is still thinking pretty small.
[00:24:31] Dream with me
[00:24:31] Jeremy Cohen: This is a pretty linear step from dbt as it works today.
Let’s think big. And to do that, we have to get in the right mood. You can close your eyes if you’d like. Listen to me throw some ideas out there. What if, in dbt V2, dbt SQL had all the same [00:25:00] capabilities as it does today, with no Jinja? What if the docs were always ready? No separate command, docs generate, just the metadata available in all the ways you would expect.
What if dbt could run across many databases and query engines? Not just Redshift, not just BigQuery, but Redshift and Spark and Presto, and who knows what else? What if you could define your own tasks for the dbt DAG? Another way to put this is: what if your run operations could do concurrency over threads, could use node selection syntax?
I’m not saying any of these things are going to be. I’m not saying this is what dbt V2 is going to look like. There’s a tremendous amount of possibility in our near and far future. With no further ado, except one very important thing, [00:26:00] which is to give a massive shout-out and thank-you to the team of people who made this happen.
This is the core team at dbt Labs, the engineering team that rebuilt dbt Core from the ground up. When I say it is now a framework of marble, it’s because of the capable and able hands of these people. We’ve got Kyle on bass guitar. We’ve got Gerda on drums, Nate on the keys, Ian on vocals, Emily on second guitar, maybe Matt on the,
I’m running out of instruments that I really know, cajón. And then Leah, tour manager, the person who made it all happen over the last several months. Thank you. Thank you. Thank you to every single one of you, and to all of our colleagues at dbt Labs, and to any person, any community member who contributed a PR, opened an issue, asked a question, answered a question, wrote about your experience, helped some [00:27:00] other person, some other lucky soul, get started with dbt for the very first time.
That is to say, every single one. Thank you. This is your celebration too.
Last modified on: Nov 22, 2023