
How we think about dbt Core and dbt Cloud

Jason Ganz, Jeremy Cohen

Apr 02, 2024


Regular readers of this blog will know that dbt Labs maintains two distinct and complementary pieces of software:

  1. dbt Core is the open source framework that is essential to the work we do, and to our mission of enabling data practitioners to create and disseminate organizational knowledge.
  2. dbt Cloud is a commercial product that extends, operationalizes, and simplifies dbt at scale for thousands of companies. It is state-aware and metadata-rich, powering differentiated experiences across an accessible UI, an extensible CLI, and enterprise-grade APIs.

As these two projects have progressed over time, we (Jeremy and Jason) have heard from community members who want to better understand what features go where and why. Our hope in writing this post is to offer stable long-term guidance — to the dbt Community, to open source maintainers and contributors, and to practitioners considering how they will deploy dbt — about our criteria for deciding what functionality goes into dbt Core and what into dbt Cloud. We want you to feel confident building your analytics stack, your everyday workflows, and your career around dbt.

This post is not the announcement of any change. It’s the distillation of a strategy that we’ve been pursuing for 4+ years.

First things first: Our commitment to open source is not changing. dbt Core is and will remain licensed under Apache 2.0. Many more people today work full-time contributing open source code to dbt Core than in its early days, and far more people are using dbt Core today than ever before. dbt has become a standard for the industry, and we’re not done extending that standard in important ways.

We’re also going to continue investing in making dbt Cloud a world-class product. dbt Cloud allows us to take the vision of dbt so much further, and to so many more people, than would be possible otherwise.

It’s worth stating clearly: Building a commercial business around dbt is essential to the long-term sustainability of dbt Labs, and to the sustainability of dbt Core as a well-maintained open source standard.

What goes where?

dbt is the standard for data transformation on cloud-first data platforms. We are serious about treating it as a standard: We believe everyone should work in the way that dbt makes possible. We believe that the work gets easier as more people are doing it, as they engage with the dbt Community and share their hard-won wisdom with the world.

To that end, whenever there is an opportunity to standardize how people are defining, executing, testing, and describing their transformations in a data platform, we believe this functionality belongs in open-source dbt Core. This is the basic workflow of analytics engineering. As we have built out the standard over the better part of a decade, we’ve put a lot of that functionality into dbt Core.

At the same time, we believe dbt Cloud should be the easiest way to get started using dbt, and the best way to deploy dbt at scale. It should power differentiated experiences that are best built and delivered in stateful, scalable, cloud-first ways. Over the past year, dbt Cloud has delivered on these promises with step-change improvements to development, discovery, and collaboration.

At this point, it is our explicit goal to build a superior end-to-end experience for customers of dbt Cloud, while preserving dbt Core’s place as the definitive standard for data transformation. Finding that balance, and finding ways to communicate it clearly, has required us to develop heuristics and to test them in practice.

dbt Core (OSS) | dbt Cloud (Commercial)
Language specification and basic workflows. Includes standardized specification across data platforms | Fully managed SaaS platform, backed by cloud architecture
Maintained by dbt Labs, licensed permissively, with community contributions | Built and supported by dbt Labs with SLAs
Develop via local file system + CLI | Development backed by remote execution, via local CLI, in-browser IDE, or other future experiences
Stateless + static | State-aware, with up-to-date metadata
Supports individuals, smaller teams, and less-complex deployments | Scales seamlessly to larger teams and enterprise complexity

These heuristics are not new, though this is our first public post describing them in detail. They’re not carved in stone, but they have remained stable over 4+ years of internal discussions.

Paraphrasing our pal Benn, a company shouldn’t just talk about what its values are; it ought to show them in the decisions that count. We present three “case studies” from the past year that show how we think about our open source and commercial offerings.

Case study 1: dbt Explorer

Last October, we launched dbt Explorer as a new platform for discovering, understanding, and optimizing your organization’s data assets, across one project or many.

Explorer contains a lot of things, and we’re constantly adding more:

  • Properties and descriptions of production dbt assets (models, sources, etc)
  • Opinionated recommendations about dbt best practices, such as description + test coverage
  • Aggregations over historical runs, to analyze model execution timing + test failure rates
  • Lineage visualizations at multiple levels:
    • Node-level
    • Column-level
    • Project-level (including role-based awareness of who can see what)

Explorer is powered by many metadata inputs — including the model and column descriptions that you define within your dbt projects.

What’s going on here?

There are now many products — built by members of the community, by dbt Labs, and by other companies — that leverage and extend the metadata defined in dbt projects, following the standard spec in dbt Core, in order to deliver an enhanced experience. dbt Explorer is one such product.

Where do we draw the line between “standard spec” and “enhanced experience”? We strongly believe that everyone should define descriptions on their dbt models right alongside their transformation logic, in version-controlled code. This means analytics code is documented in one place, and that documentation is updated along with the code it’s documenting, as part of the same code-review flow. This fulfills a key tenet of the original dbt viewpoint.

On the other hand, the mechanism for viewing, navigating, and accessing this metadata is not a thing that must be standardized. Put another way, it would be bad if you had to define a dbt model’s description in multiple places — but we expect and encourage multiple ways and places to interact with that metadata. Define once, access everywhere.
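
As a minimal sketch of the "define once" half of that idea, a description can live in a YAML properties file right next to the model's SQL. The file path, model name, and column names below are hypothetical, chosen only for illustration.

```yaml
# models/marts/customers.yml (hypothetical path and names)
version: 2

models:
  - name: customers
    description: "One row per customer, with lifetime order totals."
    columns:
      - name: customer_id
        description: "Primary key. Unique per customer."
      - name: lifetime_value
        description: "Sum of all completed order amounts, in USD."
```

Any tool that reads project metadata, whether dbt Docs, dbt Explorer, or something built by the community, can then surface these same descriptions without anyone maintaining a second copy.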

Back when we added description to the dbt standard, we also released dbt Docs, under an open source license, as a lightweight way to visualize that project metadata. dbt Docs fell a bit outside the core (pun intended) functionality of dbt, and with its limited functionality, it left a lot to be desired. At the same time, it has motivated tens of thousands of people to describe their dbt models, to visually consider their DAGs, and to share the fruits of their labor with countless more people.

When we set out to build a next-generation experience, we decided it would need a scalable UI powered by highly available APIs. We needed to rebuild this on real cloud architecture, not static-website-plus-big-JSON-file architecture. That architecture would need the flexibility to blend historical and up-to-date metadata, logical and applied state. It needed to support end-to-end lineage, across multiple projects, with role-based access baked in from the start — that is to say, enterprise complexity.

For all these reasons, dbt Cloud was the right place to build dbt Explorer.

Case study 2: dbt Mesh

dbt Mesh is a pattern for collaborating across projects and teams, enabled by a suite of new “model governance” constructs (groups, access, contracts, versions) and the ability to resolve references across dbt projects.

We chose to build a fast and scalable service for resolving cross-project references as a feature of dbt Cloud, while the rest are capabilities in OSS dbt Core.

What’s going on here?

Model contracts, versions, groups, owners, and access levels are entirely new constructs of the core language. They empower teams of any size to treat their models as stable interfaces, and the mechanism for defining those interfaces is part of the dbt standard (one that works across data platforms).
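
To make those constructs concrete, here is a rough sketch of how a group, an access level, a contract, and a version might be declared together for one model. The group, owner, model, and column names are all hypothetical, and the exact properties available may vary by dbt Core version.

```yaml
# models/marts/finance.yml (hypothetical path and names)
groups:
  - name: finance
    owner:
      name: Finance Analytics          # hypothetical owning team
      email: finance-data@example.com

models:
  - name: dim_customers
    group: finance                     # who owns this model
    access: public                     # other groups and projects may ref it
    latest_version: 1
    config:
      contract:
        enforced: true                 # build fails if the shape drifts
    columns:
      - name: customer_id
        data_type: int
        constraints:
          - type: not_null
      - name: customer_name
        data_type: string
    versions:
      - v: 1
```

The point of the sketch is that the interface is declared in version-controlled code, using the same spec regardless of where the project is deployed.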

So what about cross-project ref? Resolving references to models in other projects has long been supported in open source dbt Core, by installing upstream projects as packages. We also improved dbt Core’s mechanisms for model, variable, and macro namespacing, to make this approach more scalable. What our customers really needed, and what we built, was a strictly better mechanism for resolving cross-project model references. It is powered by a state-aware metadata service within dbt Cloud, which enables developers in downstream projects to load up just the needed context about public models in upstream projects.

We made that strictly-better mechanism part of the commercial offering, to support organizations with multiple teams collaborating on dbt. This service has enabled the dbt Mesh pattern for our largest customers, and it’s one of many metadata-rich services in the dbt Cloud platform that scale to enterprise complexity.
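
For illustration only, the two approaches might be declared in a downstream project roughly as follows. The project name and relative path are hypothetical, and a real project would pick one option rather than both.

```yaml
# Option A, open source dbt Core: install the upstream project as a package,
# which pulls its full source into the downstream project.
# packages.yml (hypothetical local path)
packages:
  - local: ../jaffle_finance

# Option B, dbt Cloud: declare the upstream project as a project dependency,
# so only metadata about its public models is loaded.
# dependencies.yml (hypothetical project name)
projects:
  - name: jaffle_finance

# In either case, a downstream model can use the two-argument form of ref:
#   select * from {{ ref('jaffle_finance', 'dim_customers') }}
```

The reference in the model's SQL looks the same either way. What differs is how much of the upstream project the downstream developer has to load in order to resolve it.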

Case study 3: Unit testing

This is a big, juicy, much-discussed feature that we anticipate will drive a ton of value for data teams. And unit testing is coming to dbt Core this spring.

What’s going on here?

Let’s return to the principle above: “Whenever there is an opportunity to standardize how people are defining, testing, and describing their data transformations, we believe this functionality belongs in open-source dbt Core”.

We came to the conclusion that unit testing is an important part of the standard for testing data transformations because:

  1. Unit testing is an important component of software testing best practices. It has long been dbt’s viewpoint that, whenever possible, analytics workflows should take inspiration from software engineering best practices.
  2. We have heard many times over the years from the dbt Community, at organizations large and small, across industries and use cases, that this is something dbt should support.
  3. There are numerous independent Community implementations of unit testing, from practitioners interested in using this themselves. The need isn’t to make it possible, it’s to make it standard.

Three factors came together here: strong overlap with software engineering best practices, durable interest from the Community, and multiple independent practitioner implementations of unit testing in dbt. All of this sits in an area of data work (testing) that we have long committed to as a crucial part of the dbt standard. Together, that made unit testing a strong candidate to be added to the open source offering.
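
As a hedged sketch of what a standardized unit test can look like, the fragment below mocks a model's input rows and states the rows expected out. The shape follows the unit testing spec as we understand it from the dbt Core v1.8 release and may differ in other versions. The file path, test name, model names, and columns are all hypothetical.

```yaml
# models/marts/unit_tests.yml (hypothetical path and names)
unit_tests:
  - name: test_email_validation_flags_bad_addresses
    description: "Rows with malformed emails should be flagged as invalid."
    model: dim_customers               # the model whose logic is under test
    given:
      - input: ref('stg_customers')    # mocked upstream input
        rows:
          - {customer_id: 1, email: "cool@example.com"}
          - {customer_id: 2, email: "not-an-email"}
    expect:
      rows:
        - {customer_id: 1, is_valid_email: true}
        - {customer_id: 2, is_valid_email: false}
```

Because the inputs are mocked, a test like this checks the transformation logic itself rather than the data currently sitting in the warehouse.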

We look forward to working with the Community to find more areas like this: big, important problem spaces that are prime candidates for addition to the dbt standard, and to the functionality of open source dbt Core.

How we’ll keep building it

A big new feature is exciting, but even more important is our ongoing commitment to the day-to-day maintenance work of dbt Core. We take its position as a standard seriously. We will continue triaging issues, resolving bugs, and tackling the “paper cuts” that won’t make the marquee but mean better quality of life for the people who use dbt every day. Over the past six months, we’ve also prioritized substantial behind-the-scenes work to decouple and solidify the interfaces between dbt-core and data warehouse adapters — making both easier to develop, test, and maintain going forward.

A commitment to “maintainership” means being intentional about what dbt Core is and ought to be; it is not the same as a commitment to “more.” Going forward, you will see us closing issues and pull requests as “out of scope” for OSS contribution, if they fall outside the purview of dbt Core. Our goal is to communicate as quickly and openly as possible what is out of scope.

Every single member of the open source community stands to benefit from the investments we’re making in dbt Core, which ensure the stability and rigor of the dbt standard, as well as a clearer definition of what’s included in that standard. At the same time, we are making real and substantial investments in dbt Cloud, with experiences that enhance the analytics engineering workflow and make it accessible to more people than ever.

The journey of a mature open source company is figuring out how to make two things true at the same time: supporting a vibrant user community with an ongoing open source roadmap, while also delivering a compelling commercial product that can support the growth of the business. Many companies and open source projects never reach that point; we feel lucky to have the chance to try.

We hope this post gives you more clarity on how we’ve been navigating that journey at dbt Labs — and how we’ll keep doing it, for all that’s still to come.

Last modified on: Apr 03, 2024
