How data transformation platforms reduce data chaos

In large and complex data systems, it’s easy to lose track of what is going on. Many teams struggle to find data, understand its purpose, or figure out who changed what. If that sounds like your team, you can benefit from a data transformation platform. In this article, we’ll look at what a data transformation platform is, how it benefits data teams, and how to get started building one.

What is a data transformation platform?

A data transformation platform is a system that handles the movement and structure of data within a data warehouse. It loads data from databases into useful tables, transforms them into new tables, and connects the resulting objects to tools that drive data use cases. The connections between tables and transformations are documented on the data platform so that the structure and dependencies of pipelines are accessible and transparent.
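To make the idea of documented dependencies concrete, here is a minimal sketch of how a platform might record which tables are built from which, then derive a safe build order and the set of tables affected by a change. The table names and the `build_order` and `downstream_of` helpers are hypothetical and exist only for illustration; a real platform stores and resolves this graph in its own way.

```python
from graphlib import TopologicalSorter

# Hypothetical tables: each key is a table, each value is the set of tables it is built from.
DEPENDENCIES = {
    "raw_orders": set(),
    "raw_customers": set(),
    "stg_orders": {"raw_orders"},
    "stg_customers": {"raw_customers"},
    "customer_orders": {"stg_orders", "stg_customers"},
}


def build_order(dependencies):
    """Return an order in which every table comes after the tables it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


def downstream_of(table, dependencies):
    """Return every table that directly or indirectly depends on `table`."""
    affected = set()
    frontier = [table]
    while frontier:
        current = frontier.pop()
        for name, parents in dependencies.items():
            if current in parents and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


if __name__ == "__main__":
    print(build_order(DEPENDENCIES))
    print(sorted(downstream_of("raw_orders", DEPENDENCIES)))
```

With the graph written down in one place, the question "what breaks if I change this table?" becomes a lookup rather than an investigation.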

Data transformation platforms sit between data storage and users in an organization. Analytics engineers work on the platform, managing and optimizing queries for tables, cleaning data sets, preparing meaningful data objects, and more. The platform is like a data hosting service that also organizes and automates the transformations that form the backbone of data systems.

Problems with the modern data stack

The modern data stack is a cloud-based architecture that emerged in response to the problems organizations run into as their data grows. Small organizations begin with ad hoc data pipelines to manage the limited amounts of data they produce and access. Analysts download raw data into CSVs and develop analytical reports with spreadsheet tools.

A spreadsheet-based data pipeline can work well enough in the beginning. However, as data volume grows, the process's limitations become apparent. To get past these limitations, many organizations move their data operations to the cloud. Cloud storage is centralized, easily scalable, and manages network usage with connection pooling. Alongside this shift in storage, new tools emerged to streamline cloud-based data development.

The network of cloud-based data development tools – Looker, Redshift, Fivetran, etc. – that emerged is the “modern data stack.” These analytics products are designed to take advantage of cloud computing and connect with each other via SQL queries.

However, the modern data stack comes with its own host of problems. Data is hosted but not organized or discoverable due to a lack of data governance. Query structures are developed and deployed, but documentation is poor, dependencies are opaque, and updates are chaotic due to a lack of version control. The stack is valuable, but complexity at scale gets in the way.

Top three benefits of data transformation platforms

Data transformation platforms support the modern data stack by organizing and automating data development. The transformations that form the backbone of data systems happen on the platform rather than being attached to individual projects. The tools that utilize the developed transformations connect to the platform, where they process and store meaningful data.

This centralized platform standardizes data operations since everyone works from the same baseline of well-developed datasets. Standardization aligns metric definitions, preventing conflicts between different definitions, usages, and calculations of similarly named metrics. Governance systems can hook into a single platform, maintaining privacy and security standards across the organization.

Data transformation platforms also manage query pipelines, enforcing access rules and tracking dependencies. Data engineers and analytics engineers can schedule queries so that developed data sets are always kept up to date. They can also automate tests for data quality, consistency, and formatting, catching problems early while streamlining development timelines.
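As an illustration of what automated data quality tests can look like, the sketch below runs three common checks (no missing keys, no duplicate keys, only expected status values) against a small in-memory data set. The function names, column names, and sample rows are all hypothetical. A real platform would run equivalent checks against warehouse tables on a schedule.

```python
def not_null(rows, column):
    """Return the rows where `column` is missing."""
    return [row for row in rows if row.get(column) is None]


def unique(rows, column):
    """Return the non-null values of `column` that appear more than once."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row.get(column)
        if value is None:
            continue  # missing values are reported by not_null instead
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return duplicates


def accepted_values(rows, column, allowed):
    """Return the rows whose `column` holds a value outside `allowed`."""
    return [row for row in rows if row.get(column) not in allowed]


def run_checks(rows):
    """Run every check; an empty result for a check means it passed."""
    return {
        "order_id_not_null": not_null(rows, "order_id"),
        "order_id_unique": unique(rows, "order_id"),
        "status_accepted": accepted_values(
            rows, "status", {"placed", "shipped", "returned"}
        ),
    }


# Hypothetical sample data with one problem of each kind.
ORDERS = [
    {"order_id": 1, "status": "placed"},
    {"order_id": 2, "status": "shipped"},
    {"order_id": 2, "status": "lost"},
    {"order_id": None, "status": "returned"},
]

if __name__ == "__main__":
    for name, failures in run_checks(ORDERS).items():
        print(name, "PASS" if not failures else f"FAIL: {failures}")
```

Because each check returns the offending rows or values rather than a bare pass or fail, a scheduled run can report exactly what needs fixing.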

This approach yields numerous benefits:

  1. Having a data transformation platform saves time on repetitive and routine tasks, opening up data teams' schedules for more value-driven projects.
  2. Having a standardized, documented baseline of transformations and tables improves trust, allowing for effective collaboration and avoiding blame games when problems arise.
  3. Accessible and prepared datasets allow non-data teams to self-serve answers to their most burning data-driven questions, enabling data teams to focus on building out useful infrastructure rather than struggling under a backlog of data pipeline requests.

How dbt does data transformation

dbt Cloud is a data transformation platform built for cloud systems, designed to enable safe, trusted collaboration. Over 100,000 data engineers use dbt to support their organizations' data development.

dbt allows transformation logic to be defined as code, enabling more readable, modular logic. When pipelines are built as codebases, teams can manage documentation and version control through standard systems like GitHub. This lets teams apply to data the same rigorous controls that software engineering teams have applied to software for years.

dbt Cloud connects dbt pipelines to powerful cloud-based tools like dbt Mesh for project organization, a semantic layer for standardized metrics and data access, and much more.

dbt’s design philosophy allows organizations to treat data as a product, accelerating engineering cycles with support for CI/CD in data development. With dbt’s support, data teams can keep up with the accelerated product cycles of cloud-based software development. Data teams become proactive contributors to projects rather than bottlenecks in update cycles.

How two enterprise companies transformed their data with dbt

Condé Nast

Like many large enterprises, media company Condé Nast stored its data in siloed systems. Its complex data architecture hindered collaboration, bred mistrust of data across teams, and kept the organization from scaling. Seeking to increase collaboration among its data teams and simplify its data architecture, Condé Nast tested dbt by installing it on a few company laptops and connecting it to Databricks Lakehouse. Within one week, they were certain they had found the ideal solution.

Condé Nast’s teams now access data on Evergreen, a platform built on Databricks and AWS. All data consumers work from the same Silver (analytics-ready) and Gold (business-ready) data sets — built with dbt — which they access through Databricks Lakehouse. Seamless integration with dbt Cloud enables data warehouse engineers to build data models quickly for analytics, machine learning applications, and reporting. Data scientists can pull data transformed with dbt to build better machine learning use cases for personalizing Condé Nast’s products in advertising, consumer experiences, and content recommendations. They can then store this data in the lakehouse, where it remains available to the entire enterprise.

Since going live on Evergreen, Condé Nast has built 85 dbt models and counting across domains such as subscriptions, consumer, content, and commerce. The company has saved 16 hours per sprint on data integration between the data warehousing and data engineering teams. The new platform has enabled a 30% increase in self-service among data warehousing engineers.

“Now that we have an end-to-end view of our data models, we can catch problems before they reach a business user and we hear complaints,” explained Nana Essuman, Senior Director of Data Engineering at Condé Nast. “These business users now have a much greater trust in the integrity of our data.”

JetBlue

JetBlue’s data infrastructure is managed by a central data team. As data volumes grew exponentially, the centralized data team structure began to reach its limits. The team needed a setup that eliminated the data engineering bottleneck and gave data analysts and consumers at JetBlue a bigger role to play in owning their own data sets.

Enter dbt and Snowflake, the perfect fit for helping JetBlue adopt a more modern, collaborative data workflow. Together, these tools enabled the team to overcome obstacles familiar to any data team, as well as issues specific to a large airline.

“Airlines can succeed or fail based on how they respond in a snowstorm,” said Ben Singleton, Director of Data Science & Analytics at JetBlue. “JetBlue’s data infrastructure not only needs to help business leaders make long-term decisions, it needs to help crewmembers in system operations decide whether or not to keep the doors open an extra few minutes to accommodate a delayed connecting flight. These are high-pressure situations that require real-time data.”

“People might think of dbt as a tool that only supports batch workflows which update a few times a day or once daily, but dbt can be used for any kind of data transformation in your warehouse, including real-time use cases,” Ben said. To pull this off, the team used dbt to implement lambda views, which union historical data with the most current view of the data in Snowflake.

Lambda views provide real-time access to flight information, bookings, ticketing, and check-ins, among several other data sources: the data that ultimately makes or breaks the customer experience. “Delays in operational data result in suboptimal decisions, and one suboptimal decision across a network of a thousand flights a day can disproportionally affect the customer experience and be expensive,” Ben said. “Traveling can be stressful, we want to be really smart about using data to improve that experience.”

Creating a data transformation platform

Data transformation platforms are a natural evolution of the so-called modern data stack, which itself emerged as a natural extension of small-scale data operations. They connect data sources and data products, manage transformation logic, and govern data access while streamlining and organizing development processes. With a data transformation platform, your organization’s data development can proactively contribute to project value.

dbt Cloud is a premier data transformation platform used by hundreds of thousands of developers worldwide. Its cutting-edge features let you treat data as a product, enabling data teams to keep up with modern development cycles.

Start building with dbt Cloud

Streamline your data transformation process, reduce manual errors, and increase productivity with dbt Cloud. Sign up today and take your data transformation workflow to the next level.