Condé Nast serves up multimedia content on a global scale
This is the story of how Condé Nast unifies global teams on Databricks Lakehouse and dbt Cloud
30% increase in self-service for data warehouse engineers
85 dbt models built on the Evergreen platform across various domains
16 hours saved per data integration sprint project
Multimedia offerings delight users around the globe
Condé Nast strives to create the highest quality content across its digital, social, video, and print channels. The company has 1 billion fans, 435 million social followers, and 75 million monthly print readers. This activity has generated 3.6 petabytes of data — but Condé Nast is just getting started. The company recently aimed to intensify its focus on digital channels and extend its global reach but realized its data architecture wasn’t up to the task.
Like many large enterprises, Condé Nast stored its data in siloed systems. Five different data sources integrated with Presto, the company’s query engine at the time. Data engineers ran ETL jobs on Databricks Lakehouse and stored the output in Amazon S3. They also created tables in Databricks that pointed to that S3 storage layer. The data warehousing team used Informatica to build data models, stored the results in S3, and worked with data engineers to point those data sets back to Presto so that teams could query them.
“We could see that our data architecture was far too complex,” explained Nana Essuman, Senior Director of Data Engineering at Condé Nast. “It prevented collaboration and led to an overall mistrust of data across our data engineering, data science, and data warehouse teams. Even more significantly, it was preventing us from scaling to the degree that we needed as we prepared to become a truly global organization.”
Simplified data architecture reduces reliance on data engineers
Seeking to increase collaboration among its data teams and simplify its data architecture, Condé Nast tested dbt by installing it on a few company laptops and connecting it to Databricks Lakehouse. Within one week, Essuman was certain he had found the ideal solution.
“It was a very easy pitch to convince my VP that using dbt Cloud alongside Databricks Lakehouse was the right move for us,” recalled Essuman. “We could see how simple it was to integrate dbt into our other systems. And dbt had just launched dbt Cloud, which fit perfectly into our organizational strategy toward greater scalability.”
dbt Labs worked closely with Condé Nast to ensure a smooth integration process and help the company’s data teams develop more quickly on the new platform. Today, teams across the company’s three geographic regions access data on Evergreen, a platform built on Databricks and AWS. All data consumers now work from the same Silver (analytics-ready) and Gold (business-ready) data sets — built with dbt — which they access through Databricks Lakehouse.
Condé Nast enhanced teams’ access to data by using Databricks to build reusable data ingestion frameworks for the company’s four main data sources. Seamless integration with dbt Cloud enables data warehouse engineers to build data models quickly for analytics, machine learning applications, and reporting. Data scientists can pull data transformed with dbt to build better machine learning models for personalizing Condé Nast’s advertising, consumer experiences, and content recommendations. They can then store this data in the lakehouse, where it remains available to the entire enterprise.
“With dbt and Databricks, our data scientists who build personalization models and churn models are finally using the same data sets that our marketers and analysts use for activation and business insights,” reported Essuman. “This has dramatically increased our productivity while decreasing dependency on data engineers. It’s also much easier to monitor and control the costs of our entire data infrastructure because it’s all running on one platform.”
Self-service increases by 30% among data warehousing engineers
Since going live on Evergreen, Condé Nast has built 85 dbt models and counting across domains such as subscriptions, consumer, content, and commerce. The company has saved 16 hours per sprint on data integration between the data warehousing and data engineering teams. The new platform has enabled a 30% increase in self-service among data warehousing engineers.
“During our data integration sprints in the past, a data warehouse engineer would have to build a data set, store it in S3, and then work with a data engineer to make that table available in Presto,” explained Essuman. “Evergreen (Databricks and AWS) plus dbt Cloud has eliminated this work and is saving us 16 hours per project.”
Condé Nast has saved even more time by building reusable, centralized macros across its dbt instance. The company can also trace the lineage of its data models from its Silver-level tables all the way to the BI tools that consume the data. Because that lineage spans so many stages, the company tests its data at various points using dbt tests to improve the quality of, and trust in, its data sets.
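As a rough illustration of what the dbt tests mentioned above check, the Python sketch below mimics the logic of dbt's built-in `not_null` and `unique` tests on a handful of in-memory rows. In dbt itself, these tests are declared in a model's YAML file and run as SQL against the warehouse. The rows, column names, and helper functions here are hypothetical and are not drawn from Condé Nast's project.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def failing_not_null(rows: List[Row], column: str) -> List[Row]:
    """Return rows where `column` is missing or None (what a not_null test flags)."""
    return [row for row in rows if row.get(column) is None]


def failing_unique(rows: List[Row], column: str) -> List[Any]:
    """Return non-null values of `column` that appear more than once
    (what a unique test flags)."""
    seen = set()
    duplicates: List[Any] = []
    for row in rows:
        value = row.get(column)
        if value is None:
            continue  # nulls are left to the not_null check
        if value in seen and value not in duplicates:
            duplicates.append(value)
        seen.add(value)
    return duplicates


# Hypothetical sample rows standing in for a Silver-level table.
subscriptions = [
    {"subscription_id": 1, "customer_id": "a"},
    {"subscription_id": 2, "customer_id": None},
    {"subscription_id": 2, "customer_id": "c"},
]

null_failures = failing_not_null(subscriptions, "customer_id")
duplicate_ids = failing_unique(subscriptions, "subscription_id")
```

A test passes when its check returns no failing rows. Running checks like these at each layer is one way problems can be caught before they reach a BI tool.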
“Now that we have an end-to-end view of our data models, we can catch problems before they reach a business user and we hear complaints,” concluded Essuman. “These business users now have a much greater trust in the integrity of our data. None of this would have been possible without the solutions from dbt Labs and Databricks and the talent and hard work of our data engineering and data warehouse teams here at Condé Nast.”