The era of open data infrastructure

Ryan Segar

on Oct 13, 2025

For years, we have treated data like a promise that somehow never quite arrives. We built warehouses, lakes, and then a “modern data stack,” each step giving us new power but also exposing new seams. Then came the big platforms, offering simplicity but introducing lock-in. The net result is an uncomfortable truth: enterprises still put only around a third of their data to work, while roughly half remains totally dark: collected, stored, and never used. That is not a rounding error. It is the ceiling on what your models can know, the drag on every digital initiative, and the reason AI pilots stall before they scale.

Two forces make this the right moment to break that ceiling. First, AI has moved from novelty to necessity. It now sits in decision paths where error is costly and transparency is non-negotiable. Second, Apache Iceberg has matured from a great idea into the neutral substrate enterprises needed all along. Iceberg lets data live once and work everywhere by decoupling storage from compute, enabling safe schema evolution and ACID reliability across engines and catalogs. This is not hypothetical alignment: Snowflake open-sourced the Polaris Catalog for Iceberg under Apache 2.0, and Databricks announced full Iceberg support governed in Unity Catalog. The industry has, in effect, agreed on the table standard AI can trust.
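To make the mechanics concrete, here is a minimal sketch using the open-source PyIceberg client; the catalog name, URI, and table are illustrative placeholders, not a prescribed setup.

```python
# A minimal sketch with PyIceberg; the catalog name, URI, and table
# are illustrative placeholders.
from pyiceberg.catalog import load_catalog
from pyiceberg.types import StringType

# An Iceberg REST catalog (Polaris is one implementation) serves the same
# table metadata to every engine that wants to read or write it.
catalog = load_catalog("analytics", uri="https://catalog.example.com/iceberg")
table = catalog.load_table("sales.orders")

# Safe schema evolution: adding an optional column is a metadata-only
# commit; no existing data files are rewritten.
with table.update_schema() as update:
    update.add_column("promo_code", StringType())

# Every commit produces an immutable snapshot, so readers always see a
# consistent (ACID-isolated) view, whichever engine they query from.
latest = table.scan(limit=10).to_arrow()
```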

Why this is the moment for open data infrastructure

This is the context for the merger of dbt Labs and Fivetran. We are not combining to create another all-in-one platform. We are unifying the backbone of a new category: open data infrastructure. The mandate is straightforward: movement, transformation, metadata, and activation must operate as a managed continuum. The fabric must be open by design, across engines, catalogs, BI tools, and LLMs, and it must deliver reliability you can measure. When that fabric is built on Iceberg, AI finally sits on a governed, portable foundation.

Consider what changes, concretely, when you put these pieces together. A global retailer uses Fivetran to land operational events from hundreds of SaaS apps and OLTP systems into Iceberg tables with service-level guarantees on delivery and uptime. dbt orchestrates contracts, tests, and transformations that compile into governed models and a semantic layer. The same definitions feed Tableau and Power BI, but also AI copilots that answer “What is net revenue by cohort for the last two weeks?” and use lineage to show exactly which inputs were read and when. If a source schema shifts at noon, Fivetran’s CDC and state-aware syncs update only what changed, dbt’s tests fail fast, and the copilot offers a patch PR with a clear diff. The answer at 12:05 is both fresh and explainable.

None of this requires a re-platform when you decide to run the same workloads on Snowflake today and Databricks tomorrow, or to share a governed slice via Delta Sharing with a partner while your internal teams query the same Iceberg tables from Trino. This is portability without penalty, and it rewires what “time to trustworthy answer” means inside any organization.
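As an illustration of that fail-fast behavior, here is a conceptual sketch in plain Python, not dbt’s actual contract mechanism: a check that distinguishes breaking drift from additive drift, with illustrative column names and types.

```python
# A conceptual sketch of contract-style schema checks, not dbt's actual
# implementation. Column names and types are illustrative.
EXPECTED_CONTRACT = {
    "order_id": "int64",
    "customer_id": "int64",
    "net_revenue": "float64",
    "ordered_at": "timestamp",
}

def check_contract(incoming: dict[str, str]) -> None:
    missing = EXPECTED_CONTRACT.keys() - incoming.keys()
    retyped = {c for c in EXPECTED_CONTRACT.keys() & incoming.keys()
               if EXPECTED_CONTRACT[c] != incoming[c]}
    added = incoming.keys() - EXPECTED_CONTRACT.keys()
    if missing or retyped:
        # Breaking drift: stop before bad data reaches a governed metric.
        raise ValueError(f"contract violation: missing={missing}, retyped={retyped}")
    if added:
        # Additive drift: safe to sync, worth surfacing as a patch PR.
        print(f"non-breaking new columns: {added}")

# The noon schema shift: an upstream rename breaks the contract, fails fast.
try:
    check_contract({"order_id": "int64", "customer_id": "int64",
                    "revenue_net": "float64", "ordered_at": "timestamp"})
except ValueError as err:
    print(err)  # contract violation: missing={'net_revenue'}, retyped=set()
```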

Or take a bank that must prove how a risk model was trained. Iceberg snapshots give you time-consistent data slices, dbt’s lineage and documentation preserve feature provenance, and activation from Fivetran (Census) pushes validated aggregates to downstream apps. When an auditor asks “Which version of the ‘exposure’ metric did the model ingest on March 31?” the system can reproduce it, because governance travels with the data. Unity and Polaris interoperate with the same Iceberg tables so the bank’s mixed compute estate is a feature, not a liability. The same architecture supports low-latency agents that need concurrency spikes, because you can scale compute independently of storage while keeping semantics stable.
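Here is a minimal sketch of that reproducibility with PyIceberg, assuming an Iceberg table of model features; the catalog, table, and audit date are illustrative.

```python
# Reproduce the table exactly as it stood on an audit date. Catalog URI
# and table name are placeholders.
from datetime import datetime, timezone
from pyiceberg.catalog import load_catalog

catalog = load_catalog("risk", uri="https://catalog.example.com/iceberg")
table = catalog.load_table("features.exposure")

# Iceberg records every commit as a snapshot with a timestamp and an
# immutable id; find the last snapshot on or before the audit cutoff.
cutoff_ms = int(datetime(2025, 3, 31, 23, 59, 59,
                         tzinfo=timezone.utc).timestamp() * 1000)
as_of = max((s for s in table.snapshots() if s.timestamp_ms <= cutoff_ms),
            key=lambda s: s.timestamp_ms)

# Read exactly what the model ingested on March 31, byte for byte.
march_31_view = table.scan(snapshot_id=as_of.snapshot_id).to_arrow()
```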

This is also a moment of standards. AI needs a clean, universal way to connect to governed data, tools, and workflows. The Model Context Protocol (MCP) is emerging as a practical standard for exactly that, a “USB-C for AI apps,” and it is already gathering real traction in the ecosystem. A semantic layer that exports contract-backed metrics and lineage into MCP gives agents the context to be right and the paper trail to be trusted.
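To show how little ceremony that requires, here is a hedged sketch using the open-source MCP Python SDK’s FastMCP server; the tool body is a stand-in for a real semantic-layer lookup, and every value in it is illustrative.

```python
# A sketch of exposing a governed metric over MCP. The metric definition
# and lineage below are hardcoded stand-ins for a semantic-layer API call.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("semantic-layer")

@mcp.tool()
def get_metric(metric: str, grain: str = "day") -> dict:
    """Return a governed metric definition plus its lineage, so an agent
    can both answer correctly and show which inputs it relied on."""
    return {
        "metric": metric,
        "grain": grain,
        "definition": "sum(order_total) - sum(refunds)",
        "lineage": ["raw.orders", "raw.refunds", "stg_orders", "fct_revenue"],
    }

if __name__ == "__main__":
    mcp.run()  # serves the tool over MCP's default stdio transport
```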

By combining dbt’s and Fivetran’s complementary strengths, the transaction will create a more complete solution for data workflows, one that better serves customer needs. Fivetran created the standard for automated data movement at scale, now with a published 99.9% uptime SLA for core services and data delivery. It pairs high-volume database replication methods, including log-based CDC from its HVR heritage, with hundreds of managed connectors, so operational change is captured with low toil and predictable freshness. dbt created the standard for analytics engineering and made transformation reproducible, testable, and explainable; its semantic layer turns metric drift from an inevitability into an anti-pattern. Together, after closing, these will become one fabric that is provably reliable and measurably portable.

Open by design

Because clarity matters during transitions, let me state our commitments in unambiguous terms. dbt Core remains open source under its current license and will continue to be supported indefinitely. The dbt Fusion engine remains source-available under its current license and will continue to be supported indefinitely. We published these commitments publicly, and we are standing by them. These technologies are not just components in our architecture; they are flagships of the open movement and essential to the portability customers demand.

Why insist on openness when a single platform could make the diagrams look neat? Because the market math is unforgiving. AI evolves faster than any one vendor’s roadmap. Your workloads will span warehouses, lakehouses, and specialized engines. Your models will include hosted LLMs and open-weight models that you fine-tune. Your governance surface will extend across Unity, Polaris, Glue, and whatever comes next. Open data infrastructure assumes this heterogeneity and turns it into a strength. Iceberg makes the storage layer common. Catalogs and sharing protocols make discovery and access universal. Movement, transformation, and semantics make the data verifiably right, no matter which engine reads it.

If you want simple yardsticks to hold us to, use these. How long from source change to governed, queryable metric or feature, with lineage intact? How often are freshness SLOs met without brute-force recompute? How consistently does a metric return the same value in BI and in an AI copilot? How easily can you validate the same Iceberg tables across two engines and two catalogs without copying data? When these numbers move in the right direction, your 32 percent data utilization becomes 40, then 60, and the dark half of your estate starts to light up.
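The last check is easy to script. A sketch under stated assumptions: PyIceberg and DuckDB’s iceberg extension both read the same table’s files directly, with no copies, and the counts should agree; the catalog URI and table name are placeholders.

```python
# Cross-engine validation sketch; catalog URI and table name are placeholders.
import duckdb
from pyiceberg.catalog import load_catalog

catalog = load_catalog("analytics", uri="https://catalog.example.com/iceberg")
table = catalog.load_table("sales.orders")
pyiceberg_rows = table.scan().to_arrow().num_rows

con = duckdb.connect()
con.execute("INSTALL iceberg;")
con.execute("LOAD iceberg;")
duckdb_rows = con.execute(
    f"SELECT count(*) FROM iceberg_scan('{table.metadata_location}')"
).fetchone()[0]

# Same files, two engines, zero copies: the numbers should match exactly.
assert pyiceberg_rows == duckdb_rows, "engines disagree on the same table"
```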

Some will ask whether we are replacing one center of gravity with another. The answer is no. We are providing the roads: a fully managed, open, reliable infrastructure that makes choice practical and safe. Until our transaction closes, it is business as usual for customers, with no changes to your contracts or support, and the work of unifying this fabric will proceed in the open, with the community that got us here. After close, our responsibility is to keep the roads smooth, the standards open, and the service levels high so that AI can scale on something worthy of your ambitions.

The era of open data infrastructure has arrived. AI finally has the foundation it needs. Iceberg is the neutral substrate. dbt and Fivetran are the living system on top that moves, shapes, explains, and activates your data with the reliability an enterprise can sign its name to. Most of the world’s data has been asleep. It is time to wake it, and to do it in a way that you control.
