This is the story of how Pepperstone uses dbt to create data decision-makers can rely on
A Major Forex Player
Founded in 2010, Pepperstone has rapidly grown to become one of the world’s leading foreign exchange (forex) brokers. The business now operates from offices across the globe, has more than 300,000 traders on its books, and handles more than US$12.5 billion worth of exchanges on an average day.
However, with the impressive growth of the business came an increase in the amount of data Pepperstone’s data team needed to process. As the number of incoming sources began to stack up, the team found that their existing systems weren’t able to keep up with the growth.
At the time, Pepperstone’s data team was operating with an analytics schema in Redshift, created from a collection of sources in their data warehouse. The BI-reporting schema read from all the other source schemas in Redshift, and the team would build out tables and views to then ingest into Tableau.
“It was created when the team was a bit smaller and the company was a bit smaller,” explained Sam Ellett, Lead Data Scientist at Pepperstone. “At that point in time, it suited our needs, but as the business grew, we quickly hit the ceiling as new sources and new team members came in. As a result, we couldn’t scale our data sets effectively, and we started to lose a sense of lineage as the number of datasets blew out.”
As the growing business added more and more data sources, it became difficult for the data team to have complete confidence in its output. “We would start to lose track of how an upstream change would impact a downstream data set that would be exposed in Tableau,” said Sam. “We would often have the same metric across various dashboards being different.”
These inconsistencies presented a significant issue for the Pepperstone team. Of course, every industry relies on accurate and trustworthy data, but there are few where reliability is more important than forex.
Sam explained: “Pepperstone works in an industry where price matters and price can change every millisecond. Small inaccuracies can blow up to very large reporting issues.”
These small inaccuracies showed up as the same metrics, such as revenues and retention numbers, appearing as different numbers across dashboards. These discrepancies began to undermine the data team’s ability to deliver accurate, clear reporting to the business.
“Our business stakeholders lost confidence in our numbers,” said Sam, explaining that as a result, other teams within the organization tried to perform their own analytics without involving the data team—pulling numbers from older spreadsheets and potentially making decisions that weren’t backed by reliable data.
“That confidence is really hard to build…and it’s really easy to lose,” he added.
This lack of confidence had consequences across the team. It became harder, for example, to interrogate unexpected data when it appeared on a dashboard. Was an unusual data point a sign of something interesting buried in the numbers? Or was it the result of an inconsistency in the system?
“It’s good to be surprised if your confidence in your model is high because that’s an insight,” explained Sam. “It’s bad to be surprised if you have no confidence in your model…that’s just a search for the mistake.”
A strong lineage
Sam began working with dbt Cloud after a recommendation from a former colleague. After experimenting with a small project, he quickly recognized the potential benefits.
“I realized dbt was good for us as soon as I saw the lineage,” he explained. “It was easy to educate others because it was visual. You could just read it; you could actually see it.”
One of the team’s main goals in switching to dbt Cloud was to slash the number of inconsistencies in their reports, with an aim of limiting them to only a handful of cases each year. In addition, Sam and his team wanted to set up comprehensive test coverage of their most important reporting. Any mistakes needed to be caught before the reports were sent out: “it’s better for us to delay a report rather than send out an inaccurate one,” he noted.
The DAG and documentation that clarified Pepperstone’s data lineage have not only reduced errors by 80% but also allowed the team to scale its data sources. “We get new data sources a lot of the time,” said Sam. “We’ve been able to keep pace with the changes now because we have a really easy way to add or update documentation.”
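To make the lineage idea concrete, here is a minimal Python sketch of how the downstream impact of an upstream change can be traced through a DAG. It assumes a mapping from each node to its direct children, similar in shape to the `child_map` found in dbt’s `manifest.json` artifact. The node names are hypothetical and are not Pepperstone’s models.

```python
from collections import deque


def downstream_of(child_map, start):
    """Return every node reachable from `start`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque(child_map.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already reached through another path
        seen.add(node)
        order.append(node)
        queue.extend(child_map.get(node, []))
    return order


# Hypothetical lineage: one raw source feeding two metrics and a dashboard model.
example_child_map = {
    "source.trades": ["model.stg_trades"],
    "model.stg_trades": ["model.revenue", "model.retention"],
    "model.revenue": ["model.exec_dashboard"],
    "model.retention": ["model.exec_dashboard"],
    "model.exec_dashboard": [],
}

# Everything that could be affected by a change to the raw trades source.
impacted = downstream_of(example_child_map, "source.trades")
```

With a map like this, a change to one source can be checked against every model that depends on it before a dashboard is refreshed.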
Realizing the benefits of dbt Cloud
Since introducing dbt Cloud, the data team at Pepperstone has been able to keep up with the business’s rapid growth.
Onboarding new team members with ease
One of the issues the Pepperstone team encountered was that their legacy systems made it difficult to bring on new talent. New hires had to absorb fragmented context, which was time-intensive to teach and learn. “We’ve got a growing team worldwide and needed a better way to onboard people,” said Sam. “Before, we were very much at capacity.” Onboarding new data team members the old way slowed analytical capacity and hindered the team’s ability to scale with the company.
After implementing dbt Cloud, easy access to information simplified the onboarding of new team members. Documentation became widespread and detailed, with systems in place for users to request any missing information from the team.
“We’ve already managed to release company-wide documentation across our data sources, which we never had before,” said Sam. This knowledge hub, accessible to all, allowed analysts of all skill levels to plug into the right models and sources without the extensive context that was previously required.
Enabling analysts to do more
The move to dbt Cloud has also allowed many existing team members to expand their analytics skills with Jinja templating. “A lot of people had reached a bit of a ceiling in terms of what they were learning in regards to SQL,” Sam explained. “It allows analysts to build macros and think a bit more programmatically in SQL, which has raised that ceiling.”
One of the most apparent improvements to Pepperstone’s workflow came from a 30% increase in speed to delivery. Beyond improving the team’s data modeling efficiency, “the increased speed gives analysts more time for insights and more time for discussion with stakeholders about the report that they’ve built, which should be the priority,” noted Sam. “We aren’t just building a data set; we should be talking about it.”
Building a trusted source of truth
With more people contributing to data modeling, and doing so faster, quality remained top of mind. The clear lineage and out-of-the-box tests allowed the team to quickly identify any potential errors or inconsistencies, making it much easier to track any issues back to their source.
“It makes our life easier if we’re able to identify root data quality issues in sources and raise that visibility to the engineering team who can fix the root cause,” said Sam.
The ability to test for quality and easily find and resolve issues allows Pepperstone’s analysts to be more confident in their data sets and deliver insights without worrying about unreliable data.
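dbt ships with generic tests such as `unique` and `not_null`. As a rough illustration of what checks of that kind look for, the Python sketch below applies the same two ideas to a small list of rows. It is a conceptual stand-in, not dbt’s implementation, and the column names and data are made up for the example.

```python
from collections import Counter


def not_null_failures(rows, column):
    """Return rows whose value in `column` is missing (None)."""
    return [row for row in rows if row.get(column) is None]


def unique_failures(rows, column):
    """Return non-null values that appear more than once in `column`."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical rows: one missing amount and one duplicated trade_id.
sample_rows = [
    {"trade_id": 1, "amount": 100.0},
    {"trade_id": 2, "amount": None},
    {"trade_id": 2, "amount": 250.0},
]

missing_amounts = not_null_failures(sample_rows, "amount")
duplicate_ids = unique_failures(sample_rows, "trade_id")
```

An empty result from both functions corresponds to a passing check; anything else points at the rows or values to investigate at the source.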
“Trust is so important because we are the experts in data analysis,” Sam explained. “And if you have a good level of trust, your insights are more likely to be robustly discussed.”
Pepperstone holds trading licenses across the globe. Its international market presence means the organization is held accountable to a multitude of regulatory environments.
Across obligations ranging from anti-money-laundering transaction monitoring programs to frequent standardized reports, failing to supply reliable data could cause regulators to impose financial penalties or withdraw the company’s licenses.
On top of strict requirements, earning and maintaining each license is further complicated by the fact that no two licenses have the same requirements. “So you have to be nimble, you have to be able to scale compliance, and you have to be able to handle many different requirements.
“Working with dbt helps us achieve that,” explained Sam.
What’s next for Pepperstone?
Looking ahead, the Pepperstone data team aims to improve its workflow and efficiency. As part of this, the team plans on performing meta-studies on its existing projects.
“We’re going to run analytics on our dbt analytics, getting across more of the metadata reporting,” said Sam. “I want to show all our tests to the rest of the business.”
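One way metadata reporting of this kind could start is by summarizing test outcomes from the `run_results.json` artifact that dbt writes after an invocation. The Python sketch below is an assumption about approach, not Pepperstone’s implementation. It works on an already parsed dictionary and relies only on the `results`, `unique_id`, and `status` fields; the sample entries are invented.

```python
from collections import Counter


def summarize_tests(run_results):
    """Count statuses for test nodes and list the tests that did not pass."""
    tests = [
        r for r in run_results.get("results", [])
        if r.get("unique_id", "").startswith("test.")
    ]
    counts = Counter(r["status"] for r in tests)
    not_passing = sorted(r["unique_id"] for r in tests if r["status"] != "pass")
    return dict(counts), not_passing


# Invented sample shaped like a parsed run_results.json (one model, three tests).
sample_run_results = {
    "results": [
        {"unique_id": "model.demo.revenue", "status": "success"},
        {"unique_id": "test.demo.not_null_revenue_trade_id", "status": "pass"},
        {"unique_id": "test.demo.unique_revenue_trade_id", "status": "fail"},
        {"unique_id": "test.demo.accepted_values_region", "status": "pass"},
    ]
}

status_counts, failing = summarize_tests(sample_run_results)
```

In practice the dictionary would come from loading the artifact file with `json.load`, and the counts could feed a simple dashboard that shows test coverage and failures to the rest of the business.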
Beyond looking to ensure that his team is the most trusted source of analytics, Sam also plans to enable other teams throughout Pepperstone to do their own digging and research using dbt.
“My job is to make it as easy as possible for the team to get their work done,” said Sam. “And dbt helps me do just that.”