From dbt Core to dbt Cloud: Why Warner Brothers Discovery made the switch
Nov 27, 2024
Warner Brothers Discovery moves data. A lot of data. And they’ve used dbt Core to manage it for years. However, sudden spikes in data demand revealed multiple weak points in their architecture. Here’s why the company moved from dbt Core to dbt Cloud, how they did it, and the benefits they reaped from the move.
An end-to-end first party data platform
Warner Brothers Discovery already knew the value of its data. To put that data to work, the company created a common data platform: a place where all of its businesses warehouse their data.
In addition to this common platform, some of their businesses also had their own data platforms. One of these is CNN’s data platform, Zion, lovingly named after the city in The Matrix, which serves as a first-party news analytics platform.
Zion is a three-layered platform containing a collection layer, an information layer, and a service layer. The collection layer is effectively a complex data ingestion pipeline that runs in AWS. The service layer comprises the actual consumers of the data and associated products, built using a combination of dbt Core and Airflow.
This platform is fully featured, sporting everything from an SDK to a dashboard, and is designed for resilience and scale. And when they say scale, they mean scale: Zion moves around 100TB of data in a day, can quickly scale 15x, and ingests about a billion events every 24 hours.
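To put those figures in per-second terms, here is a small back-of-the-envelope calculation. It assumes decimal terabytes, traffic spread evenly across the day, and that the 15x scaling factor applies to the event rate. None of these assumptions comes from WBD, so treat the results as rough orders of magnitude.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

events_per_day = 1_000_000_000       # "about a billion events every 24 hours"
bytes_moved_per_day = 100 * 10**12   # "around 100TB of data in a day" (decimal TB assumed)
burst_factor = 15                    # "can quickly scale 15x"


def per_second(daily_total: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return daily_total / SECONDS_PER_DAY


avg_events_per_sec = per_second(events_per_day)
avg_bytes_per_sec = per_second(bytes_moved_per_day)
# Assumption: the 15x factor is applied to the average event rate.
burst_events_per_sec = avg_events_per_sec * burst_factor

print(f"average events/s:      {avg_events_per_sec:,.0f}")
print(f"average throughput:    {avg_bytes_per_sec / 10**9:.2f} GB/s")
print(f"burst events/s at 15x: {burst_events_per_sec:,.0f}")
```

Under these assumptions, the platform averages roughly 11,600 events and 1.16 GB of data movement per second, with bursts on the order of 170,000 events per second.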
A platform of this size presents a lot of problems. Since it’s a news analytics platform, it’s designed to handle breaking news. That means it has to support identity resolution and stitching, behavioral events, profiling, and content metadata. Plus, it has to present data in a way that makes sense to its businesses.
Why dbt?
The two main things Zion powers are WBD’s content analytics and its Machine Learning (ML) recommendation system. The company chose dbt Core to build this system because dbt is modular and automated. dbt Core enables insight into what is being processed, how, and from where. This transparency has enabled WBD to test and validate the decisions they’re making.
Moreover, dbt readily integrated with common engineering practices like WBD’s Continuous Integration and Continuous Delivery (CI/CD) pipelines, further smoothing adoption. The magnitude of their data was also no problem: the team needed no esoteric changes in Snowflake to dynamically change a test model’s provisioning. A simple configuration change was enough.
Last but certainly not least, dbt Core is very cost-effective. The tool itself costs essentially nothing to use, and it did not cause cost spikes in, for example, the team’s SQL queries against their data warehouse.
WBD’s journey with dbt
Although they started creating Zion in 2017, WBD adopted dbt Core only in 2021. The company then quickly scaled up its use for CNN+. In 2024, it adopted dbt Cloud to help empower its data stakeholders. In 2025, it plans to use dbt Cloud to let those stakeholders work with the tools and cloud platforms they prefer.
With dbt Core, WBD is running 450 models, with about half of these covered by 1,600 tests. Most of those models run every hour. Even under anomalously high load on a high-traffic day, such as when a tanker hits a bridge and people keep rewinding and rewatching the moment of impact, they see only about 10 minutes of variance in model run time.
WBD’s architecture is a typical medallion-like architecture, with multiple DAGs supporting multiple time aggregations. This way, each team can have data processed for them in different functional areas.
Why dbt Cloud?
The description above already seems like a big win. If WBD was already so successful with dbt Core, why move to dbt Cloud?
The answer is that, over time, WBD’s data needs grew, often with sudden, huge spikes in demand from unforeseen events such as Queen Elizabeth II’s funeral. With each of these spikes, the company realized its dbt Core architecture faced multiple challenges.
These challenges included:
Inconsistencies in job performance
Sometimes jobs resulted in unpredictable, unreliable outcomes that also increased costs.
Infrastructure management and scaling
Difficulties implementing distributed processing, load balancing, or horizontal scaling.
Direct dependency on engineering teams for business analytics
WBD had no support for a data mesh architecture, an approach to data engineering in which data domain teams can create, launch, and manage their own data sets as data products. This hobbled grass-roots data efforts. Stakeholders required continuous engineering support to launch their own data products.
How dbt Cloud helped
Moving to dbt Cloud enabled WBD to solve these issues in a variety of ways.
The monolithic nature of their 450 models in dbt Core not only impeded the independence of their stakeholders but also made it hard to find concepts or semantics. Breaking the monolith into smaller conceptual chunks using dbt Mesh immediately improved discoverability and made it easier to manage dependencies between projects.
Decentralizing data ownership and management allowed for better scalability by incorporating distributed processing, load balancing, and horizontal scaling. While this improved reliability under higher demand, WBD also implemented fault-tolerance mechanisms such as automated retries and error handling. All in all, the more efficient resource allocation and utilization of cloud services also resulted in lower costs.
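The source does not describe how WBD implemented its automated retries, so the following is only a generic sketch of the idea: retry a task on transient failures with exponential backoff, and surface the error once the attempts run out. `TransientError`, `run_with_retries`, and the `flaky_job` stand-in are hypothetical names invented for this illustration.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a timeout)."""


def run_with_retries(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying on TransientError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # waits 1s, 2s, 4s, ...
    raise AssertionError("unreachable")


# Demo: a stand-in job that fails twice, then succeeds.
calls = {"n": 0}


def flaky_job() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("simulated warehouse timeout")
    return "ok"


delays: List[float] = []
# Record the delays instead of sleeping so the demo finishes instantly.
result = run_with_retries(flaky_job, max_attempts=5, sleep=delays.append)
print(result, delays)
```

Only errors explicitly marked as transient are retried here. Any other exception propagates immediately, which keeps genuine bugs from being masked by the retry loop.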
The move also helped WBD with several challenges at an organizational level:
Increasing stakeholder adoption
The move to dbt Cloud enabled them to provide self-managed projects that are built from a centrally managed platform without the need for constant engineering support.
Enhancing developer onboarding
WBD improved the flow of onboarding engineers and the experience of their analysts by creating robust deployment tooling and project scanning using dbt Cloud’s scheduler, environments, and environment variables.
Supporting developers through tooling
While easy deployment is important, ongoing support is also a common source of stress for developers. By reducing the redundancy in their code, WBD could create centrally controlled projects with common macros to solve common problems.
Standardizing observability and alerting
By utilizing dbt Cloud’s webhooks, the team could send Slack notifications directly to the engineering team to enable them to mitigate problems as soon as possible.
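As a rough illustration of the webhook-to-Slack pattern, the sketch below validates an incoming webhook request and turns a failed-run event into a Slack message payload. It is not WBD’s implementation. The event type `job.run.errored`, the `data` field names, and the HMAC-SHA256 check against the `Authorization` header are assumptions based on dbt Cloud’s publicly documented webhook format, so verify them against the current documentation. The secret and sample event are made up.

```python
import hashlib
import hmac
import json
from typing import Optional


def is_valid_signature(raw_body: bytes, auth_header: str, secret: str) -> bool:
    """Compare the HMAC-SHA256 hex digest of the raw body with the header value."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, auth_header)


def build_slack_message(event: dict) -> Optional[dict]:
    """Return a Slack incoming-webhook payload for failed runs, else None."""
    # Assumed event type for a failed run; other events are ignored.
    if event.get("eventType") != "job.run.errored":
        return None
    data = event.get("data", {})
    text = (
        f":rotating_light: dbt Cloud job '{data.get('jobName', 'unknown')}' "
        f"failed (run {data.get('runId', 'unknown')}, "
        f"environment {data.get('environmentName', 'unknown')})."
    )
    return {"text": text}


# Demo with a made-up secret and a made-up event.
secret = "example-secret"
body = json.dumps(
    {
        "eventType": "job.run.errored",
        "data": {
            "jobName": "hourly_build",
            "runId": "12345",
            "environmentName": "Production",
        },
    }
).encode("utf-8")
header = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()

if is_valid_signature(body, header, secret):
    message = build_slack_message(json.loads(body))
    print(message)
```

The returned dictionary is what a receiving service would POST as JSON to a Slack incoming webhook URL. The network call is left out so the example stays self-contained.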
How WBD migrated to dbt Cloud
Zion is a large and important platform. WBD aimed to minimize the impact on the existing platform while getting the most out of the migration.
To that end, the team performed the migration in steps:
- Create a POC with something small and easy to understand, with minimal impact.
- Isolate that process or pipeline from the Zion ecosystem.
- Examine what did and didn’t work.
- Begin building out macros and project tooling.
- Pick the next smallest and easiest-to-understand piece.
- Keep some relation to the first isolated piece so the mesh can be tested, then build up and standardize for observability and maintainability.
- Finish moving the rest of the processes and pipelines.
- Lastly, instrument for observability and maintainability.
Since simplifying the architecture of Zion was also a primary goal of this move to dbt Cloud, WBD took the opportunity to improve code legibility. At the same time, it made the infrastructure easier to use and maintain by implementing Infrastructure as Code (IaC). Using tools like Terraform, they streamlined things with single-command, push-button project deployments.
Another benefit of this architecture is that the data engineering team could swap out the underlying data sources without impacting their stakeholders. They can now run cost metrics and projections and switch to whatever resources are most optimal for a domain team’s use case.
In other words, the domain team doesn’t need to worry about making those decisions themselves. Domain experts can focus on launching products using their domain expertise while the engineering team optimizes their data performance under the hood.
Choosing the right orchestration
WBD found that there were instances where it made sense to continue to use Airflow for orchestration, while there were other instances where dbt Cloud jobs excelled. dbt Cloud jobs worked best for:
Last mile delivery
Models that are primarily concerned with refreshing dashboards or providing endpoints.
Complex run environments
In particular, projects requiring dynamic incremental loads or dynamic variable settings.
Non-standard incremental processes
Processing done over fixed incremental windows.
Backfilling
I.e., processes where data needs to be replayed or rerun for specific periods.
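To make the fixed-window and backfill cases more concrete, here is a minimal sketch that splits a replay period into fixed windows and renders one `dbt build` invocation per window. The selector `hourly_sessions` and the variable names `start_ts` and `end_ts` are hypothetical; the models being rebuilt would need to read them with `var()`. How WBD actually parameterizes its jobs is not described in the source.

```python
import json
import shlex
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def fixed_windows(
    start: datetime, end: datetime, size: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield half-open [window_start, window_end) intervals covering [start, end)."""
    if size <= timedelta(0):
        raise ValueError("size must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + size, end)  # the last window may be shorter
        yield cursor, window_end
        cursor = window_end


def backfill_command(selector: str, window_start: datetime, window_end: datetime) -> str:
    """Render a dbt build command that passes the window bounds as --vars."""
    # Hypothetical variable names; JSON is accepted because it is valid YAML.
    variables = {
        "start_ts": window_start.isoformat(),
        "end_ts": window_end.isoformat(),
    }
    return (
        f"dbt build --select {shlex.quote(selector)} "
        f"--vars {shlex.quote(json.dumps(variables))}"
    )


# Demo: replay three hours of data in one-hour windows.
windows = list(
    fixed_windows(datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 3), timedelta(hours=1))
)
commands = [backfill_command("hourly_sessions", s, e) for s, e in windows]
for command in commands:
    print(command)
```

Half-open windows avoid processing a boundary timestamp twice, and generating one command per window means a failed window can be rerun on its own.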
Additionally, WBD used Terraform, Terragrunt, and AWS CodeBuild to reduce the number of deployment scripts and automate tasks to eliminate manual intervention.
Migration benefits
With the move to Infrastructure as Code using dbt Cloud, Warner Brothers Discovery was able to:
- Launch new projects on dbt Cloud more quickly using quick-start templates
- Automatically set up alerting and observability
- Integrate new projects with the existing Zion ecosystem using dbt Mesh
- Create fully automated deployments with Terraform, Terragrunt, and AWS CodeBuild
From the overall transition to dbt Cloud, WBD saw the following positive impacts:
- A multi-project framework built on a data mesh architecture, which improved data organization and made the entire system easier to understand
- Adoption of the platform by their Data Analytics Research and Testing (DART) team, which uses it to explore lineage and develop its own data products
- Improved run times and cost efficiency, in some cases by up to 75%
- Enhanced monitoring and alerting systems (Slack and switchboard warnings)
- Visibility into migration progress and refined roadmaps
- Improved documentation and lineage maps
As WBD looks to the future, it aims to support multi-compute. Whether it’s Databricks or Snowflake, they want their platform to enable platform-agnostic functionality out of the box. Using dbt Mesh, they can finally accomplish that goal.
Empowering stakeholders, streamlining systems
Moving from dbt Core to dbt Cloud and dbt Mesh enabled Warner Brothers Discovery to reduce costs while increasing the number of domain teams who could create and manage data products without engineering assistance. The result was better stakeholder adoption, higher reliability, and less time spent babysitting data deployments.
Find out how dbt Cloud can accelerate your digital transformation and streamline data operations—contact us for a demo today.
Watch Warner Brothers Discovery’s session at Coalesce to learn more about how they moved from dbt Core to dbt Cloud.
Last modified on: Nov 27, 2024