Reliable and scalable data transformation at Fanduel from Coalesce 2023
"This isn't just another talk about why you should choose dbt. This is a talk about how we implemented dbt at a mature organization, both from a technical and collaborative point of view..."
- Phillip Tan, Technical Program Manager, Fanduel
Fanduel team members Phillip Tan, Michael Lee, and Harry Williams describe the challenges, strategies, and lessons learned from migrating a mature organization to dbt. They also share how they implemented dbt and how it changed the way the company uses data.
The growth of Fanduel required scalable data solutions
Fanduel, a leader in the online sports betting industry, saw significant growth which led to an increase in data processed daily. Because of this growth, the company realized that they needed scalable solutions for their data platform. "With more data being processed each day, we realize that the tech stack that got us this number one position won't be the same one that keeps us there," explains Phillip, their Technical Program Manager.
The team at Fanduel acknowledged that they process diverse data points to provide their customers with a world-class experience, with their bet transactions table processing over a quarter billion records per day. "This data can power things like our marketing decisions, our betting lines, and even our responsible gaming program which helps protect our customers," Phillip adds.
However, the growth also brought challenges. "Our pipelines were very complex. We have a huge amount of dependencies that interact with each other. We also have a huge amount of SQL scripts that were executing during these pipelines," says Harry, one of the team’s Senior Data Engineers. As such, the team at Fanduel had to come up with a strategy to manage and utilize their data effectively.
Shifting from Redshift to Databricks
Fanduel's team ran into limitations when scaling on Redshift, so they began to explore alternatives. "Knowing that Redshift would be a limitation, we ran proof of concepts on other data platforms, especially Databricks," says Michael, a fellow Senior Data Engineer at Fanduel.
Following successful proofs of concept, the company committed to a plan to migrate onto Databricks. The phased approach involved infrastructure setup, governance policy refactoring, developing new data pipelines, and porting Redshift dbt models. Michael adds, "Setting up dbt on Databricks was fairly streamlined, and there was a 20% to 30% increase in performance against similarly sized Databricks SQL warehouses."
Fanduel's strategy for data migration involved training, validation, and careful rollout
Fanduel's complete migration plan involved several key steps, including training engineers, migrating models, validating the migrated models, and carefully rolling out the changes. "Firstly, we had to train our engineers. A lot of our engineers [didn’t have] dbt experience...," Harry states. Once the engineers were comfortable with the new system, they began migrating models.
Validation was a crucial step in the process. The company wanted to ensure that the migrated data was accurate and consistent. "Validation was probably the key thing here because we didn't want to disrupt our data customers. We wanted to make sure that the data that we were getting out of these models was matching what we currently had in production," Harry mentions.
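The talk does not describe how this comparison was performed. As a rough illustration only, the check Harry describes (migrated output matching production) could be sketched as a row-level diff between two result sets. The function name `compare_outputs`, the column names, and the sample rows below are hypothetical, and a real check at Fanduel's volumes would more likely run as SQL inside the warehouse than in Python.

```python
from collections import Counter
from typing import Any, Iterable, Mapping, Sequence

Row = Mapping[str, Any]


def compare_outputs(
    legacy_rows: Iterable[Row],
    migrated_rows: Iterable[Row],
    columns: Sequence[str],
) -> dict:
    """Compare two result sets as multisets of rows (hypothetical helper).

    Duplicates are counted, so a row that appears twice in production
    but once in the migrated model is reported as missing once.
    """
    legacy = Counter(tuple(r[c] for c in columns) for r in legacy_rows)
    migrated = Counter(tuple(r[c] for c in columns) for r in migrated_rows)

    missing = legacy - migrated      # in production, absent from migrated model
    unexpected = migrated - legacy   # in migrated model, absent from production

    return {
        "legacy_count": sum(legacy.values()),
        "migrated_count": sum(migrated.values()),
        "missing": list(missing.elements()),
        "unexpected": list(unexpected.elements()),
        "match": not missing and not unexpected,
    }


# Illustrative data only; these rows are invented for the example.
production = [
    {"bet_id": 1, "stake": 10.0},
    {"bet_id": 2, "stake": 5.0},
]
migrated_model = [
    {"bet_id": 1, "stake": 10.0},
    {"bet_id": 2, "stake": 5.5},  # deliberately different stake
]

report = compare_outputs(production, migrated_model, ["bet_id", "stake"])
print(report["match"], report["missing"], report["unexpected"])
```

Comparing rows as multisets rather than by row count alone catches cases where the totals agree but individual values have drifted, which is the kind of mismatch that would disrupt downstream data customers.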
With validation complete, the team could roll out the changes. They tracked migration progress on a Miro board, which showed which models had been completed, which were still in progress, and who was working on each one.
The implementation of dbt significantly improved data management and reduced issues
The implementation of dbt at Fanduel led to several improvements in data management. The team reported a 133% decrease in data support tickets, which they attribute to better data quality. They also reported a 28% improvement in resolution time for data support tickets, reflecting faster development turnaround. Furthermore, the company met 100% of its SLAs over the previous four months.
Phillip notes that the difficulty lay less in picking up the tool than in using it well: "dbt is fairly straightforward to learn, but we believe it's much harder to get good at. The learning curve is following best practices."
Through the implementation of dbt, Fanduel was able to manage its increasingly complex data and provide a better service to its customers. The process wasn't without its challenges, but the company was able to adapt and improve its data management significantly.
Michael, Harry, and Phillip’s key insights
- Fanduel faced various challenges during migration, including data duplication, data access issues, inefficient SQL queries, and quality degradation due to large amounts of data
- They adopted dbt to tackle modern data issues, improve data quality, shorten development turnaround time, and reduce SLA failures
- Fanduel's migration process involved three main steps: migrating to their internal Airflow DAG factory, shifting from an ETL to an ELT paradigm, and transforming data pipelines into dbt models
- Fanduel’s team partnered with Brooklyn Data to implement dbt models, implement data quality testing, and solidify their working model
- Fanduel shifted to Databricks from Redshift due to its better performance, cost-effectiveness, and improved query management