What is open data infrastructure? How is it different from the modern data stack?

Tristan Handy

on Oct 13, 2025

Today, we announced that we are merging with Fivetran. In my announcement post, I shared the motivation behind the merger and our shared vision for building an open data infrastructure. But what exactly does that term mean…and why does it matter?

In this post, I’ll dive deeper into what an open data infrastructure is, why it’s the right evolution of the “modern data stack” in the age of Iceberg and AI, and how dbt Labs and Fivetran together plan to make that vision real for data teams everywhere.

The modern data stack solved some problems, and created new ones

I hate neologisms for the sake of neologisms. No one needs a tech company to introduce new terms of art purely for marketing. If we’re going to use this phrase, it’s because it actually means something. So let’s put it to the test. And let’s start with the term “modern data stack” (MDS): what it meant and how it’s failing to scale in the new era.

Data practitioners are widely familiar with the term “modern data stack” at this point. The cloud, and the MDS, transformed data over the past decade. Previously, data had been slow, clunky, and expensive. Its success stories made for great headlines, but in the trenches things moved very slowly, cost too much, and working in the field was not particularly…fun.

The MDS changed this. This practitioner-focused tooling was lightweight but production-grade. It allowed users to move fast, with SQL as the primary standard, and bring the best practices of software engineering and DevOps into data at scale for the first time. A once-sleepy profession, the data engineer, and a brand new one, the analytics engineer, became the focus of innovation for a fast-moving ecosystem.

As the MDS gathered steam, vendors popped up to solve every conceivable problem, and customers wrestled with constructing end-to-end solutions from a dozen or more tools. More time was spent debating which tools to use and how to integrate them than on actually working toward business goals. While standards helped with interoperability, they couldn’t solve everything. In particular, complex problems like data quality, governance, and metadata management never seemed to quite get solved.

The rise of “all-in-one” data platforms

As a result, customers became frustrated with the tool-integration challenges and the inability to solve the larger, cross-domain problems. Customers began demanding more integrated solutions—asking their existing vendors to “do more” and leave in-house teams to solve fewer integration challenges themselves. Vendors saw this as an opportunity to grow into new areas and extend their footprints into new categories. This is neither inherently good nor bad. End-to-end solutions can drive cleaner integration, better user experience, and lower cost. But they can also limit user choice, create vendor lock-in, and drive up costs. The devil is in the details.

In particular, the data industry has, during the cloud era, been dominated by five huge players, each with well over $1 billion in annual revenue: Databricks, Snowflake, Google Cloud, AWS, and Microsoft Azure. Each of these five players started out by building an analytical compute engine, storage, and a metadata catalog. But over the last five years as the MDS story has played out, each of their customers has asked them to “do more.” And they have responded. Each of these five players now includes solutions across the entire stack: ingestion, transformation, notebooks and BI, orchestration, and more. They have now effectively become “all-in-one data platforms”—bring data, and do everything within their ecosystem.

These platforms often do a decent job of delivering on the integrated vision—there is typically less duct tape required than in a traditional do-it-yourself MDS solution. But there are tradeoffs:

  • Costs can be high and are difficult to control because all negotiations are with a single vendor who is both authoring workloads and charging for compute.
  • Customer choice is restricted. Different compute engines are actually good at different things, and going “off platform” is hard when you’ve made the intentional decision to bring everything to one of these tools.
  • Internal collaboration is harmed. Data organizations become walled gardens, where some teams are on Platform A while some are on Platform B and they have a hard time collaborating.

So: customers are faced with a hard choice. The fragmentation and duct tape that often existed with the MDS? Or the high cost and restricted choice that come with the “all-in-one data platforms”?

Open data infrastructure as the path forward

This is the context in which we introduce the term “open data infrastructure”. Open data infrastructure describes an infrastructure that is pluggable, relies on integration via standards, does not assume the usage of any one particular compute engine, and does not assume that solutions will be duct taped together from many individual products and vendors.
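To make “pluggable, integrated via standards” concrete, here is a minimal illustrative sketch (not any vendor’s actual API) of business logic written once against a shared engine contract, with the compute engine chosen at runtime. `QueryEngine`, `DuckEngine`, and `WarehouseEngine` are hypothetical names for illustration:

```python
from typing import Protocol


class QueryEngine(Protocol):
    """Any engine that can run SQL requests; integration happens through
    this shared contract rather than an engine-specific API."""
    def run(self, sql: str) -> list[dict]: ...


class DuckEngine:
    """Stand-in for a local/embedded analytical engine."""
    def run(self, sql: str) -> list[dict]:
        return [{"engine": "duck", "sql": sql}]


class WarehouseEngine:
    """Stand-in for a cloud warehouse engine."""
    def run(self, sql: str) -> list[dict]:
        return [{"engine": "warehouse", "sql": sql}]


def daily_revenue(engine: QueryEngine) -> list[dict]:
    # Business logic is written once against the open interface;
    # swapping compute engines requires no change here.
    return engine.run("select day, sum(amount) from orders group by day")


# The same pipeline runs unchanged on either engine.
local = daily_revenue(DuckEngine())
cloud = daily_revenue(WarehouseEngine())
```

The point of the sketch is the shape, not the stubs: when storage, metadata, and query semantics follow open standards, the engine behind the interface becomes a swappable detail rather than a lock-in point.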

In practice, that means every layer of the stack, from ingestion through transformation to analytical compute, plugs in through open standards and can be swapped independently.

dbt and Fivetran together to define the future of data

At dbt Labs and at Fivetran, we’ve both been building towards this vision for a long time. But delivering a complete open data infrastructure will require both of us, together. This is what we have heard customers ask us for, over and over again, over the past several years:

  • Help make it easy for me to deliver best-in-class data infrastructure.
  • Help me save costs on my compute bills.
  • Deliver end-to-end capabilities like governance that are hard to do piecemeal.
  • Help me preserve optionality with my choice of analytical compute.

These are the promises of open data infrastructure. Much of this we can deliver today, but of course, there is work still to do.

The growth of AI over the last few years makes delivering on open data infrastructure only more critical. For customers building new AI capabilities:

  • You need reliable, high-quality data, but also centralized, high-quality metadata that describes everything about it.
  • You need to be able to access your data directly from any model you choose, without your access to your own data being gated by whichever partnerships your all-in-one data platform vendor happens to have at the moment.
  • You need to have your data exposed by open standards like MCP to quickly integrate with the fast-moving AI tooling ecosystem.
  • You need the ability to seamlessly use AI-native analytical compute engines that deliver the types of latency and concurrency that AI systems require.

Open data infrastructure accomplishes all of this without asking customers to do it all themselves. This is why we’re excited about it, and why we hope you will be too.

