Rebtel's data product value enhancement: Migrating to a modern data stack with dbt Cloud and Snowflake from Coalesce 2023
Quentin Coviaux, Data Engineer at Rebtel, describes the company's journey of migrating its data pipeline.
"We have happy data engineers now right, so we're not really stressed going to work anymore.”
Quentin Coviaux, Data Engineer at Rebtel, describes the company's journey of migrating its data pipeline and transforming its data processes. Quentin also explains the issues he faced with Rebtel's legacy stack, the steps he took to overcome these challenges, and the results of these efforts.
The migration from Matillion to dbt for data transformations
Quentin shares his team's experience of migrating their data transformations from Matillion, a GUI-based tool, to dbt, a SQL-based tool. They found the transition seamless and beneficial, ultimately allowing them to consolidate all their transformations in one place, reducing redundancy and increasing efficiency.
"Over the years, our data stack has changed a lot. We started using dbt around 2019 because we were just more comfortable writing code directly rather than just having a graphical component," Quentin explains. "We decided to consolidate everything into dbt, right? That just makes sense, and you know we had documentation that was embedded in the product… same for data testing, and a lot of other features, so it just made sense to consolidate everything into dbt."
He highlights the importance of being patient and communicating throughout the migration process. "We continuously brought up those issues with upper management and just resurfaced all of these problems continuously, so it would not be hidden in some obscure data issues," he explains.
Implementing real-time data points
Quentin details how his team worked to reduce data latency. They partnered with the backend engineering team to generate real-time events, which were sent to a Kinesis stream and then pushed into Snowflake, cutting latency significantly.
"Because we were not really satisfied with this delay in the data, we wanted to change this… We had some events sent in real time to our Kinesis stream, and then with a Snowpipe injection, we sent that to Snowflake," Quentin elaborates.
This shift to real-time data points not only made the data more timely and relevant but also freed the team to focus on higher-impact tasks.
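The talk does not spell out the exact wiring, but a common pattern matching this description is to have the Kinesis stream deliver event files to S3 and let Snowpipe auto-ingest them into a raw table. The sketch below assumes that pattern; every object name, bucket path, and storage integration is hypothetical.

```sql
-- Hypothetical Snowflake objects; names and the S3 path are illustrative only.
-- Assumes events from the Kinesis stream land in S3 (e.g. via Kinesis Data
-- Firehose), from where Snowpipe auto-ingests each new file.

create or replace stage raw.kinesis_events_stage
  url = 's3://example-bucket/app-events/'        -- assumed delivery location
  storage_integration = s3_events_integration    -- assumed pre-configured integration
  file_format = (type = json);

-- Landing table: one VARIANT column holding the raw JSON event.
create or replace table raw.app_events (event_payload variant);

-- Snowpipe: loads new files as soon as S3 event notifications arrive.
create or replace pipe raw.app_events_pipe
  auto_ingest = true
as
  copy into raw.app_events
  from @raw.kinesis_events_stage;
```

Downstream dbt models can then read from the landing table, so the near-real-time data flows through the same consolidated transformation layer.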
Importance of stakeholder involvement in data validation
Quentin emphasizes the importance of involving stakeholders in data validation. As his team migrated their data transformations, they continuously checked with stakeholders to ensure that the metrics matched their expectations, enabling them to catch and correct any discrepancies early.
"When we were fairly confident, the metrics were usually different. Not significantly different, but there were some differences here and there…we would involve stakeholders," Quentin notes.
This joint effort in data validation not only ensured the accuracy of the migrated data but also built trust between the data team and the business stakeholders.
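One way to run this kind of side-by-side check, sketched here with hypothetical table names (reusing the `fct_daily_revenue` example from above), is a reconciliation query that compares the same metric from the legacy output and the migrated dbt model and surfaces only the days that differ.

```sql
-- Hypothetical reconciliation query: compares daily revenue produced by the
-- legacy pipeline against the migrated dbt model, so any discrepancies can be
-- reviewed with stakeholders before cutting over.
with legacy as (
    select revenue_date, sum(gross_revenue) as gross_revenue
    from legacy_mart.daily_revenue           -- assumed legacy output table
    group by revenue_date
),
migrated as (
    select revenue_date, sum(gross_revenue) as gross_revenue
    from analytics.fct_daily_revenue         -- assumed dbt-built table
    group by revenue_date
)
select
    revenue_date,
    l.gross_revenue                                              as legacy_revenue,
    m.gross_revenue                                              as migrated_revenue,
    coalesce(m.gross_revenue, 0) - coalesce(l.gross_revenue, 0)  as diff
from legacy l
full outer join migrated m using (revenue_date)
where coalesce(m.gross_revenue, 0) <> coalesce(l.gross_revenue, 0)
order by revenue_date;
```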
Quentin’s key insights
- The company faced several issues with its legacy data stack, including redundancy in its pipeline, infrastructure overhead, multiple points of failure, and high latency in data retrieval
- They decided to consolidate all their transformations into dbt and move over to dbt Cloud, which greatly reduced their infrastructure work
- They also made changes to their ingestion process, including generating real-time event streams sent to a Kinesis stream and automating their ingestion into Snowflake
- The migration process was a lengthy one, requiring continuous communication with management and stakeholders, and careful planning and validation
- The migration resulted in significant cost reduction, near real-time data points, zero time spent on infrastructure work, and happier data engineers