DataOps aims to do for the Analytics Development Lifecycle (ADLC) what DevOps has done for the Software Development Lifecycle (SDLC). However, while similar, the two processes differ in key ways in their personas and operations. Here’s what they share, how they diverge, and how to implement DataOps effectively.
What is DevOps?
DevOps is short for Development and Operations. The process was created to tear down a wall that historically existed between software engineers who create software (development) and the IT personnel who deploy, monitor, and maintain new releases (operations).
In the past, development would spend months creating new software releases in isolation from operations. At the end of the cycle, they’d take what they built and “throw it over the wall” to the operations team.
The result was chaos. Developers would build against library versions, operating system features, and memory and compute assumptions with little to no knowledge of what was actually running in production. That forced operations to spend weeks in a painful back-and-forth with development. Releases were often delayed and bug-ridden.
By contrast, in a typical DevOps lifecycle, deployment and release are viewed as two halves of the same release cycle, with all personas in both Dev and Ops working together as one team. Release cycles are shorter—typically one to two weeks—to enable rapid development, testing, and issue resolution.
DevOps teams rely heavily on version control and automation to facilitate smooth and high-quality releases. All changes are committed to source control so they can be tracked, reviewed, and rolled back if necessary.
The primary vehicle for deployment is a Continuous Integration and Continuous Delivery (CI/CD) pipeline, which deploys and tests software changes from a source control check-in before releasing it to production. Everyone on a DevOps team shares responsibility for keeping the DevOps pipeline healthy and happy.
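The gating logic a CI/CD pipeline applies can be sketched in a few lines: a change ships only if every automated check passes. This is a minimal illustration, not a real pipeline; the check names and the shape of the change record are hypothetical.

```python
# Minimal sketch of a CI/CD gate: run every check against a change,
# and mark it deployable only if all checks pass.

def run_pipeline(change, checks):
    """Run each check against the change; deploy only if all pass."""
    results = {check.__name__: check(change) for check in checks}
    return {"results": results, "deployed": all(results.values())}

# Illustrative checks (stand-ins for real test and lint stages).
def unit_tests(change):
    return change.get("tests_pass", False)

def lint(change):
    return change.get("lint_clean", False)

outcome = run_pipeline({"tests_pass": True, "lint_clean": True}, [unit_tests, lint])
print(outcome["deployed"])  # True: every check passed, so the change ships
```

In a real pipeline these checks would be stages defined in your CI system's configuration, but the principle is the same: no green checks, no release.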
The DevOps lifecycle
For each deployable change, a DevOps lifecycle iterates rapidly over the following stages:
Plan: Decide which features to implement and how to measure success.
Code: Author the features and all related tests. Obtain a code review from another team member before kicking off the deployment process.
Build: Assemble the components of the system into a deployable package.
Test: Run both automated and manual tests in multiple release environments (dev, test, staging, prod) to assess the quality of the change before release.
Release and deploy: Make the feature available to customers.
Operate and monitor: Observe metrics and logs to ensure smooth operations, firing a notification or alert if critical system values drop below an acceptable threshold.
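The operate-and-monitor stage above can be sketched as a simple threshold check: compare current metric values against acceptable minimums and surface any that have dropped below them. The metric names and thresholds here are hypothetical.

```python
# Sketch of the "operate and monitor" stage: flag any critical metric
# whose current value has fallen below its acceptable threshold.

THRESHOLDS = {"requests_per_sec": 100.0, "success_rate": 0.99}

def check_metrics(metrics, thresholds=THRESHOLDS):
    """Return the names of metrics that fell below their minimum."""
    return [name for name, minimum in thresholds.items()
            if metrics.get(name, 0.0) < minimum]

alerts = check_metrics({"requests_per_sec": 250.0, "success_rate": 0.97})
print(alerts)  # ['success_rate'] — success rate dropped below 99%
```

A production system would feed this kind of check from a metrics store and route the result to an alerting tool, but the core decision is this comparison.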
DevOps personas
A DevOps lifecycle involves, at a minimum, the following personas:
Developer. Software engineers responsible for the technical design, development, and testing of software.
IT administrator/system administrator. Technical personnel responsible for the installation, configuration, monitoring, upkeep, and backup/recovery of technical assets. These include servers, storage (databases, object storage, data lakes, etc.), networks, load balancers, etc.
Site Reliability Engineer (SRE). A role pioneered by Google in the early 2000s, SREs focus on creating features and automated solutions that enhance the reliability of a product.
Business stakeholder. Business representatives who represent the voice of the customer, providing guidance on which features to develop next.
Benefits of DevOps
Done well, DevOps has numerous benefits. Most of these can be quantified as metrics that organizations can use to track improvements in the SDLC.
Increased deployment frequency. By focusing on smaller work units, teams can ensure that a new feature works—and works well—before moving on to the next one. The tight cooperation between Dev and Ops enables better coordination on each release, leading to smoother releases with fewer obstacles.
Increased deployment quality. By leveraging techniques such as automated testing in CI/CD pipelines and deploying to multiple environments (dev, test, stage, prod, etc.), a DevOps team can find and resolve critical issues before release. That results in higher-quality releases and less system downtime.
Increased scale of deployments. Using automation to improve quality and velocity means that teams can build larger, more complex software systems without drowning themselves in a sea of defects.
What is DataOps?
DataOps, inspired by DevOps, takes a similar approach: it integrates data work—data acquisition, transformation, and deployment of new changes—with operations—observation, metrics and logging, and discovery and analysis.
Releasing new data products means merging and transforming data, usually from multiple sources. So, as in DevOps, DataOps uses version control to track changes to data transformation code and CI/CD automation to ship these changes to production.
The DataOps lifecycle
In DataOps, all personas who work with data work on the same team at each stage of the Analytics Development Lifecycle. The ADLC resembles the SDLC with a few minor changes:
Plan: Decide which new data products to create or how to change an existing data product (e.g., adding a new field to a table) and define your KPIs and success factors.
Develop: Create data pipelines and data transformation models to create the new data set from one or more trusted sources. Submit model changes for code review.
Test: Write and run unit, data, and integration tests that verify your transformations are running correctly.
Deploy: Move the changes from development through production using an automated CI/CD process.
Operate and Observe: Ensure data changes remain in a steady state by testing data in production and recovering quickly from failure to maintain always-on access to data. Ensure access to data is controlled by role-based access control (RBAC) and that sensitive data—e.g., customers’ Personally Identifiable Information (PII)—is restricted and audited.
Discover and Analyze: Enable data stakeholders to find data products and standardized metrics so they can use them to answer questions and drive business decisions.
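The Test stage above can be illustrated with two of the most common data tests—uniqueness and not-null checks, in the spirit of dbt’s built-in tests. This is a minimal sketch; the table and column names are hypothetical.

```python
# Sketch of basic data tests run against a transformed data set:
# a uniqueness check and a not-null check on a given column.

def assert_unique(rows, column):
    """True if no two rows share a value in the given column."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))

def assert_not_null(rows, column):
    """True if every row has a non-null value in the given column."""
    return all(row.get(column) is not None for row in rows)

orders = [
    {"order_id": 1, "customer_id": "a"},
    {"order_id": 2, "customer_id": "b"},
]
print(assert_unique(orders, "order_id"))       # True
print(assert_not_null(orders, "customer_id"))  # True
```

In practice these checks run inside the CI/CD pipeline against dev and staging data, so a failing test blocks the change before it reaches production.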
DataOps personas
Just as the process of DataOps differs from DevOps, so do the personas. The following personas aren’t fixed roles but rather hats that multiple people can wear at different times.
The engineer: Creates reusable data assets—pipelines, models, metrics, etc.
The analyst: Performs analysis on data sets that drive business decisions.
The decision-maker: Takes the output from the engineer and the analyst and translates them into actions for the business.
Benefits of DataOps
DataOps shares many of the same benefits as DevOps. It also brings a few of its own:
Focuses data teams on business outcomes. Historically, data engineers and other technical roles have driven the definition of new data products. That’s led to new features being driven more by technical capabilities than by the needs of the business. By involving analysts and decision-makers throughout the ADLC, DataOps aligns all data changes with business KPIs and OKRs.
Democratizes access to data. Up to 75% of data in a company may be “dark data”: data sitting unused in undiscoverable silos. Thanks to its flexible personas and emphasis on data discovery, DataOps reduces dark data, driving additional business revenue with value-added data products.
DataOps vs. DevOps breakdown
Multiple similarities and differences likely jumped out at you while reading the descriptions above. Here’s a summary of some key points where both processes meet—and where they diverge.
How DataOps and DevOps are similar
Agile. Short development lifecycles with all team members participating at each step. Inclusion of business stakeholders to ensure alignment with business objectives and customer needs.
Emphasis on quality. Use of version control, automated testing, code reviews, and multiple deployment environments to increase the quality of each release.
Automation. Reliance on CI/CD pipelines to eliminate human error from the release process and increase release cadences.
How DataOps and DevOps are different
Flexible personas. In the data world, one person can be an engineer, an analyst, and a decision-maker on different projects. DataOps recognizes this reality and doesn’t associate personas with job titles.
Discoverability. Good data doesn’t have value if people can’t find it. As such, DataOps devotes part of its process to ensuring new data products are published in a centralized repository for easy discovery.
Security and access control. In DevOps, security is more about issues such as supply chain control and user application access. In DataOps, ADLC participants have to consider compliance with industry standards as well as national laws governing data privacy, such as GDPR.
Driving DataOps with dbt Cloud
Because they’re so heavily dependent on automation, DevOps and DataOps require great tooling to implement. Traditionally, this has meant teams spending weeks or months building their own CI/CD pipelines from scratch.
But not anymore. There’s a better way to do DataOps.
With dbt Cloud as your data control plane, your data teams have a standardized and cost-efficient way to build, test, deploy, and discover analytics code. Meanwhile, data consumers have purpose-built interfaces and integrations to self-serve data that is governed and actionable.
dbt Cloud makes implementing DataOps easy:
- Represent all of your data transformation pipelines as dbt models in SQL or Python, enabling anyone to develop data pipelines
- Develop data tests
- Store all changes in version control to facilitate code reviews, versioned releases, and rollback
- Kick off a CI/CD pipeline to test and push changes from dev to stage to prod
- Create and publish standardized metrics with dbt Semantic Layer
- Find data products and metrics using dbt Explorer
Learn more about how dbt Cloud can bring DataOps to your organization—schedule a demo today.
Last modified on: Nov 18, 2024