How to reduce your data pipeline tech support burden

Kathryn Chubb

on Sep 15, 2025

Deriving business value from data isn’t a simple task. It requires combining and transforming raw data from multiple sources to create high-quality data products.

Doing this at scale requires more than just writing a one-off data transformation script and setting up a cron job. It requires creating scalable and robust automated data pipelines that can continuously validate and publish new data as it arrives.
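
For contrast, the “one-off script plus cron job” approach often looks something like the sketch below. It is purely illustrative - the database, table names, and schedule are placeholder assumptions - but it highlights what’s missing: nothing validates the incoming data, retries on failure, or alerts anyone when something breaks.

```python
# one_off_transform.py - a deliberately naive, standalone transformation script.
# Scheduled with a cron entry such as:
#   0 2 * * * /usr/bin/python3 /opt/jobs/one_off_transform.py
# The database file and table names below are hypothetical placeholders.
import sqlite3

def run() -> None:
    conn = sqlite3.connect("warehouse.db")  # stand-in for a real warehouse connection
    try:
        # Rebuild a daily revenue summary from the raw orders table in one shot.
        conn.execute("DROP TABLE IF EXISTS daily_revenue")
        conn.execute("""
            CREATE TABLE daily_revenue AS
            SELECT order_date, SUM(amount) AS revenue
            FROM raw_orders
            GROUP BY order_date
        """)
        conn.commit()
    finally:
        conn.close()
    # No validation, no retries, no alerting, no lineage - if raw_orders arrives
    # late or malformed, downstream consumers find out the hard way.

if __name__ == "__main__":
    run()
```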

Many data teams start off trying to build this functionality themselves. It doesn’t take long before they find themselves in tech support hell. Pretty soon, it seems all their available time is spent supporting users and fixing pipeline errors instead of investing in next-generation solutions.

The components of data pipeline automation

To prepare data for commercial use, data engineering teams need to ensure that, at a minimum, all data is:

  • Processed using high-quality data transformation code that’s been thoroughly tested
  • Packaged and deployed into easily discoverable data products
  • Continuously monitored for data quality and governance issues

To facilitate this, most spin up some form of data pipeline automation, turning a manual, error-prone, and inconsistent process into one that’s standardized, centralized, and well-governed. These pipelines typically consist of several open-source and commercial data tools, which shepherd data through a multi-stage process of ingestion, transformation, testing, deployment, and ongoing monitoring.
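
To make this more concrete, here is a minimal sketch of what a single automated stage might look like, extending the earlier one-off script with a basic data quality gate before publishing. The checks and table names are illustrative assumptions, not a prescribed design.

```python
# pipeline_stage.py - illustrative sketch of a single automated pipeline stage:
# run basic quality checks on the raw data, then publish the transformed output.
# Table names and checks are hypothetical placeholders.
import sqlite3

def validate(conn: sqlite3.Connection) -> None:
    # Minimal quality gates: data actually arrived, and no NULL order dates.
    rows = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
    assert rows > 0, "raw_orders is empty"
    nulls = conn.execute(
        "SELECT COUNT(*) FROM raw_orders WHERE order_date IS NULL"
    ).fetchone()[0]
    assert nulls == 0, f"{nulls} rows have a NULL order_date"

def publish(conn: sqlite3.Connection) -> None:
    conn.execute("DROP TABLE IF EXISTS daily_revenue")
    conn.execute("""
        CREATE TABLE daily_revenue AS
        SELECT order_date, SUM(amount) AS revenue
        FROM raw_orders
        GROUP BY order_date
    """)
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect("warehouse.db")
    try:
        validate(conn)  # stop the stage before bad data reaches consumers
        publish(conn)
    finally:
        conn.close()
```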

The challenges with a DIY data pipeline

To be sure, there are benefits to rolling your own data pipelines. There are a number of fantastic open-source data tools on the market, and teams can mix and match them to meet their exact needs, often with minimal licensing costs.

Unfortunately, as most teams quickly realize, the tech support burden of a DIY approach escalates quickly. A few of the problems that rear their ugly heads include:

  • Local installation is a headache
  • Git-based workflows are complicated for many users
  • Maintaining reliable CI/CD infrastructure is resource-intensive
  • Little to no support for local testing
  • Lack of self-service tools

Local installation is a headache

Most DIY systems require anyone who wants to contribute to data pipelines to download, install, and configure a vast array of tools - data warehouse connectors and CLIs, Git, dbt, sqlfluff, etc. Less technical users might struggle to install all of these dependencies correctly - especially if one dependency conflicts with another package already on their machine.

Data engineering team members are the ones on call to troubleshoot and resolve these issues. The more time they spend debugging tooling, the less time they have for more fundamental work, such as improving the company’s overall data architecture.

Git-based workflows are complicated for many users

Git is the version control system that powers nearly all automation in the software and data engineering worlds. Using Git repositories, teams can easily collaborate on data analytics code, tracking and reviewing all changes. Pull requests to a repository act as a trigger to kick off an automated testing and deployment pipeline.

While Git is ridiculously useful, it’s also complicated. Even engineers struggle with grokking it for the first time. It’s so easy to paint yourself into a corner with Git that entire websites exist just to explain how to get out of one. (As one popular site put it, “Git is hard: screwing up is easy, and figuring out how to fix your mistakes is f*****g impossible.”)

To make matters worse, not everyone using data is a technical user. A mature Analytics Development Lifecycle (ADLC) involves data engineers, analytics engineers, analysts, business stakeholders, and other roles that overlap or fall in between these catch-all job descriptions.

Even for those who are SQL experts and may be able to contribute to data pipeline code, working with Git creates a high barrier to entry. And supporting their issues can become a full-time job in and of itself.

Maintaining a CI/CD infrastructure is resource-intensive

Automated data pipelines are the lifeblood of a data-driven organization. If they go down, business grinds to a halt. Keeping them running is a Herculean effort that requires:

  • Scaling up to handle incoming data spikes - and scaling back down to avoid unnecessary cloud spend
  • Detecting and fixing issues - invalid or unexpected data from an upstream source, data drift, cloud computing platform issues, etc. - that can result in pipeline stoppage
  • Continuously monitoring and improving performance as data workloads grow over time

Given the effort involved, it’s no wonder many large companies need a small operations team just to keep their data pipelines running smoothly.
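
As one small illustration of that effort, the second item above - detecting and fixing issues that stop the pipeline - often ends up as hand-rolled retry-and-alert code like the hedged sketch below. The webhook URL and the step callable are hypothetical placeholders.

```python
# retry_and_alert.py - sketch of hand-rolled resilience around a pipeline step:
# retry transient failures with a backoff, and alert the on-call channel if the
# step still fails. The webhook URL and step callable are hypothetical.
import json
import time
import urllib.request

ALERT_WEBHOOK = "https://example.com/hooks/data-oncall"  # placeholder URL

def alert(message: str) -> None:
    body = json.dumps({"text": message}).encode()
    req = urllib.request.Request(
        ALERT_WEBHOOK, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req, timeout=10)

def run_with_retries(step, name: str, attempts: int = 3, backoff_s: float = 30.0) -> None:
    for attempt in range(1, attempts + 1):
        try:
            step()
            return
        except Exception as exc:  # e.g. transient warehouse or network errors
            if attempt == attempts:
                alert(f"Pipeline step '{name}' failed after {attempts} attempts: {exc}")
                raise
            time.sleep(backoff_s * attempt)  # linear backoff between retries
```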

Little to no support for local testing

A good data pipeline will run automated tests on any code changes. This verifies that the changes are error-free before the code runs in production.

However, this involves a time-consuming round-trip. Data engineers have to check in a change to a data pipeline. Then, they have to wait for the pipeline to trigger (which may take a while if others are ahead of them in the build queue) and the test suite to run.

If a test fails, they have to sift through the logs, find the root cause, make a fix, and check in another change. That starts the whole time-consuming process all over again.

It would be much faster if data engineers could thoroughly test and debug changes on their dev boxes before checking in code. Usually, this involves creating a separate dev data warehouse, with isolated environments for each engineer.

This is time-consuming to create. It’s even harder to keep synced with the current state of production. It also significantly increases data development costs.
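
In practice, the fastest feedback comes from tests that don’t need a warehouse at all. As a hedged illustration, the sketch below runs the transformation SQL against an in-memory database seeded with a few known rows; it can’t reproduce warehouse-specific dialects, but it catches logic errors before a commit ever reaches the build queue.

```python
# test_daily_revenue.py - sketch of a fast local test: run the transformation SQL
# against an in-memory database seeded with known rows. This only approximates a
# real warehouse and won't catch dialect-specific issues.
import sqlite3

DAILY_REVENUE_SQL = """
    SELECT order_date, SUM(amount) AS revenue
    FROM raw_orders
    GROUP BY order_date
"""

def test_daily_revenue_sums_per_day() -> None:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (order_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?)",
        [("2025-09-01", 10.0), ("2025-09-01", 5.0), ("2025-09-02", 7.5)],
    )
    results = dict(conn.execute(DAILY_REVENUE_SQL).fetchall())
    assert results == {"2025-09-01": 15.0, "2025-09-02": 7.5}

if __name__ == "__main__":
    test_daily_revenue_sums_per_day()
    print("local transformation test passed")
```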

Lack of self-service tools

Data isn’t worth anything unless the people who need it can find it. One of the most frequent drains on engineering teams is answering basic questions about data, such as:

  • Where do I find data for [x]?
  • Where did the data come from?
  • When was it last updated?
  • Can I trust it?

Most homegrown data pipelines don’t provide a way for users to answer such basic questions for themselves. That results in a large queue of support tickets that fall into the data engineering team’s lap.
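
Without a shared catalog, the usual stopgap is a pile of ad hoc helper scripts. The hedged sketch below shows the kind of one-off freshness check teams end up writing to answer “when was it last updated?”; the load_audit table, its columns, and the timestamp format are all assumptions for illustration.

```python
# freshness_check.py - sketch of an ad hoc "is this table fresh?" helper.
# Assumes a hypothetical load_audit table whose loaded_at column stores
# ISO-8601 timestamps with a UTC offset.
import sqlite3
from datetime import datetime, timedelta, timezone

def last_updated(conn: sqlite3.Connection, table: str) -> datetime:
    row = conn.execute(
        "SELECT loaded_at FROM load_audit "
        "WHERE table_name = ? ORDER BY loaded_at DESC LIMIT 1",
        (table,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no load recorded for {table}")
    return datetime.fromisoformat(row[0])

def is_fresh(conn: sqlite3.Connection, table: str, max_age: timedelta) -> bool:
    return datetime.now(timezone.utc) - last_updated(conn, table) <= max_age
```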

Building low-overhead data pipelines with dbt

All of this tech support burden means that data engineering teams spend less time building useful new data products.

The good news is that it doesn’t have to be this way. dbt is a data control plane that teams can use to build, deploy, and monitor data transformation pipelines at a fraction of the time and cost of a DIY solution. What’s more, dbt provides tooling that enables all participants in the data lifecycle to find, understand, and utilize analytics code - not just data engineers.

Create scalable data pipelines with a few clicks

Using dbt, data engineers can set up full CI/CD promotion pipelines for their analytics workflows with just a few clicks.

All changes to analytics code are checked into Git source control, where other team members can review them before approving for deployment. The CI/CD pipeline can then run all associated data tests in a pre-production environment, and deploy changes only if all tests succeed.
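
Conceptually, the gate this automates resembles the hedged sketch below (not dbt’s actual implementation): build and test the project against a pre-production target, and block deployment on any failure. The “ci” target name is an assumption; targets are defined in a project’s profiles.yml.

```python
# ci_gate.py - conceptual sketch of a CI gate for a dbt project.
# `dbt build` runs models, tests, snapshots, and seeds in dependency order and
# exits non-zero if any of them fail; the "ci" target name is an assumption.
import subprocess
import sys

def main() -> int:
    result = subprocess.run(["dbt", "build", "--target", "ci"])
    return result.returncode

if __name__ == "__main__":
    sys.exit(main())  # a non-zero exit here blocks promotion to production
```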

This automated release process means deploying analytics code changes isn’t an error-prone manual process that pulls the data engineering team away from more critical work. It also enforces a series of gates that improve the quality of all code shipped to production, reducing the time that engineering spends diagnosing and fixing data issues.

Analytics that are accessible to everyone

Not everyone who touches analytics code wants to write their own SQL or memorize Git commands. That’s why dbt offers multiple ways to create or revise analytics code:

  • dbt Studio, a fully integrated cloud IDE for all personas that simplifies both editing and checking in code
  • dbt Canvas, a tool for analysts and business stakeholders that provides a visual editing experience for data transformations
  • The dbt extension for Visual Studio Code, for seasoned engineers who are most comfortable with traditional programmer IDEs

dbt also helps teams produce data products that are easy to discover, understand, and use:

  • Built-in support for documentation and data lineage means stakeholders can more easily understand the origin and purpose of data
  • All data products are propagated to dbt Catalog, where stakeholders can find answers to their most common questions about data on their own
  • Stakeholders can also verify the health and quality of data using data health tiles

Debug and test locally, as you write

The dbt Fusion engine is a rewrite of the dbt engine that speeds up development by implementing a full SQL compiler locally. This enables Fusion to parse an entire project and understand its data dependencies, giving contributors immediate feedback on data model errors as they code.

The dbt Fusion engine understands and emulates all major data warehouses, so engineers can test code locally - no need to set up dev instances of the warehouse. It’s written in Rust and ships as a single binary, making it simple to install locally. You can power your dbt projects with the Fusion engine in the dbt platform.

Conclusion

DIY data pipeline solutions start with the best of intentions. Ultimately, however, the attendant tech support burden means they fail to scale with your business.

By migrating your data pipelines to dbt, you can ship more analytics code to production in less time and at less cost. Reduced infrastructure costs, self-service features, and AI-powered productivity tools mean your data engineering teams can spend less time on tech support and more time building tomorrow’s solutions.

Try it for yourself today by signing up for a free dbt account.
