Implementing DPICT: Preventing Analytics Reporting Breakage through Data Pipeline Contracts at Xometry from Coalesce 2023
Jisan Zaman, data engineering manager at Xometry, discusses using data pipeline contracts to prevent breakage in analytics reporting.
“When things do break…it's about 36 times faster [to fix], especially if it's on the upstream changes that developers are making.”
- Jisan Zaman, Data Engineering Manager at Xometry
Jisan discusses the problems they faced with their data ingestion system, the solutions they implemented, and the results of those solutions.
The use of a data pipeline contract to prevent breakage in analytics reporting
Jisan states, "Our pricing models, quoting models, [and] shipping models that we used are directly dependent on the data that we ingest into our data warehouse." He highlights that any upstream changes in their main databases or microservices could cause delays and problems with their reporting.
“We needed a way to automate the process so that developers would know when they're publishing the data. They would know what it is they're affecting, and if they're changing anything, deprecating anything, deleting any columns, they would know what they were doing and what the impacts of it would be.”
To tackle this, they established DPICT (short for “data pipeline contract”), a clear and repeatable process that lets engineering teams own their database data. It ensures that changes made by developers do not adversely impact downstream reports. Jisan emphasizes, "DPICT gives you column-level lineage and prevents breakage at the source."
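The talk summary does not show how DPICT is implemented, so the following is only a minimal sketch of the general idea behind a pipeline contract: compare a proposed schema against the published one and flag removed or retyped columns before the change ships. The function `check_contract`, the `ContractViolation` class, and the `quotes` table with its columns are hypothetical names chosen for illustration, not part of Xometry's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractViolation:
    """One breaking change found in a proposed schema (hypothetical type)."""
    table: str
    column: str
    reason: str


def check_contract(contract, proposed):
    """Compare a proposed schema against the published contract.

    Both arguments map table name -> {column name: type string}.
    Adding tables or columns is allowed; removing or retyping a
    contracted column is reported as a violation.
    """
    violations = []
    for table, columns in contract.items():
        new_columns = proposed.get(table)
        if new_columns is None:
            violations.append(ContractViolation(table, "*", "table removed"))
            continue
        for column, col_type in columns.items():
            if column not in new_columns:
                violations.append(
                    ContractViolation(table, column, "column removed")
                )
            elif new_columns[column] != col_type:
                reason = f"type changed from {col_type} to {new_columns[column]}"
                violations.append(ContractViolation(table, column, reason))
    return violations


# Illustrative data only: an assumed contract and an assumed proposed change.
contract = {
    "quotes": {"id": "bigint", "price": "numeric", "created_at": "timestamp"}
}
proposed = {
    "quotes": {"id": "bigint", "price": "text", "notes": "text"}
}

violations = check_contract(contract, proposed)
for v in violations:
    print(f"{v.table}.{v.column}: {v.reason}")
```

A check like this could run in a developer's CI pipeline and fail the build when `violations` is non-empty, which is one way to surface the impact of a change at the source rather than in a broken report.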
The importance of accurate and comprehensible data lineage tools
Jisan emphasizes the need for accurate and comprehensible data lineage tools in managing and preventing breakages in analytics reporting. He highlights their use of a tool called SelectStar to trace issues back to their root cause.
Jisan explains, "SelectStar is a lineage tool that gives you insight, not only of your dbt layers…it gives you insight from your data warehouse to any downstream report that's in Looker, Tableau, [or] whatever you use. It gives you full visibility into your data lineage and it gives you column-level lineage." He also notes that this tool helped them to automatically detect and display accurate column-level lineage, mitigating issues and providing developers with context for any error messages.
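To make "column-level lineage" concrete, here is a small sketch of how a lineage graph can answer the question "which reports depend on this column?". It does not use SelectStar's API, which the talk summary does not describe; the `LINEAGE` mapping, the column names, and the `downstream_of` helper are all assumptions made up for illustration.

```python
from collections import deque

# Hypothetical lineage edges: each column maps to the columns or report
# fields that are built directly from it.
LINEAGE = {
    "orders_db.quotes.price": ["staging.stg_quotes.price"],
    "staging.stg_quotes.price": ["marts.fct_quotes.quoted_price"],
    "marts.fct_quotes.quoted_price": [
        "looker.pricing_dashboard.avg_quote",
        "tableau.margin_report.quote_total",
    ],
    "orders_db.quotes.notes": [],
}


def downstream_of(column, lineage):
    """Return every node reachable from `column`, in breadth-first order."""
    seen = {column}
    order = []
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


impacted = downstream_of("orders_db.quotes.price", LINEAGE)
# Keep only the nodes that live in a reporting tool (assumed naming scheme).
reports = [node for node in impacted if node.startswith(("looker.", "tableau."))]
print(reports)
```

With a traversal like this, an error message shown to a developer can name the specific downstream reports that a column change would affect, instead of only saying that something broke.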
The importance of comprehensive documentation and developer engagement
Jisan highlights the importance of clear, comprehensive documentation and deeper engagement with developers for the smooth functioning of data pipeline contracts. He stresses that developers are more likely to follow guidelines that are clearly laid out and specific.
"None of this works well without really great documentation," says Jisan. "Developers are comfortable following clear and specific documentation, and then in here, it has a step-by-step guideline of how to automatically keep your staging layers up to date."
He also shares that their approach has led to significant time savings and increased efficiency: "200 plus hours saved for the data engineering team, per year. Lowering the surface area problems, especially on the microservices layer where we had some control, really helps with that. And it's about 36 times faster."
Jisan's key insights
- Xometry faced problems with its data ingestion system: upstream changes that developers made to databases or microservices caused delays and problems in reports
- The company implemented a process called DPICT, which allowed developers to be the owners of their own database data
- They also used a node module to allow users to easily add tables to the schema
- The implemented solutions have led to significant time savings, faster problem-solving, and a decrease in resources required to add new data sources
- The solutions have also increased transparency, reliability, and speed in Xometry's data engineering processes