How Monzo Bank increased analytics velocity with dbt
Monzo’s team has grown by nearly 4x in the past year, and the data team has expanded from 9 to more than 30. Pressure on the data pipeline was higher than ever and analytics velocity was beginning to slow. Here’s the story of how Monzo’s best-in-class data team built a scalable analytics engineering workflow with dbt.
Travel anywhere in London and you’ll see the flash of Monzo’s hot coral card. The largest digital bank in the UK, Monzo has over 3 million total users and is adding 55,000 new accounts every week.
For Dimitri Masin, VP of Data at Monzo Bank, a successful analytics team is a fast analytics team: “We want to build a world where any data scientist and analyst can have an idea on the way to work, and explore it end-to-end by midday.”
This kind of speed means that analysts can have a much bigger impact on their teams. They’re empowered to work like “mini-CEOs,” able to deeply understand a problem, make decisions, and measure the impact of changes on customer behavior. It means they can get more done. It means the data they deliver is trusted and reliable.
But what does it take to make this vision a reality?
Stephen Whitworth, Data Engineering Lead at Monzo, says Monzo’s vision for fast analytics requires improvement along two dimensions:
- ETL pipelines need to be reliable
- It needs to be easy for analysts to make changes to data models
These two dimensions are often thought to conflict with one another. Greater reliability is frequently achieved by hiring very technical talent—data engineers—which then injects another step in the process of getting a question answered. “In big companies with centralized BI teams, getting data ready to explore can take weeks or even months,” Dimitri says. “That’s not a world we want to live in. We want analysts to be able to explore data in a matter of hours.” They needed a solution that helped them improve along both dimensions at once.
To help them achieve their vision for fast, reliable analytics, they implemented dbt.
Dimitri and Stephen were drawn to dbt because it aligned with Monzo’s existing views on how data transformation and modeling work should be done. “We believe analysts should work like software engineers,” Stephen says. “Version control, continuous integration, testing – these are practices that make anyone working with code more efficient and productive.”
dbt enabled analysts to work like software engineers in a few ways that were of particular importance to Monzo:
1. Data transformations are written in-warehouse, in SQL.
“If we expect data scientists to own ETL end-to-end, this process needs to be so simple that any person with good SQL skills can get up to speed within one month after they join,” Dimitri says. Monzo uses BigQuery, a modern cloud warehouse powerful enough to handle transforms in-warehouse. With dbt, Monzo is able to do 99% of all data transformation work in BigQuery using SQL. This makes it easy for analysts to author data pipelines, make changes, and troubleshoot issues when they arise.
Importantly, ease of use doesn’t come at the cost of sacrificing reliability. “With dbt, we have a common way of writing all of our data transformations so the data engineers can now build tooling around those assumptions that abstracts away the complexity, allowing analysts to focus on just writing good SQL,” Stephen says. “And this makes analysts more productive.”
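As a rough illustration of what an in-warehouse SQL transformation looks like, the sketch below defines a derived model as a view on top of a raw table, so all of the transformation logic lives in SQL inside the database. It is not Monzo's code: Python's built-in sqlite3 module stands in for BigQuery, and the table, view, and column names (raw_transactions, customer_spend, amount_pence) are hypothetical.

```python
import sqlite3


def build_customer_spend(conn: sqlite3.Connection) -> None:
    """Create a derived model as a view; the transform runs inside the database."""
    conn.execute(
        """
        CREATE VIEW customer_spend AS
        SELECT
            customer_id,
            COUNT(*) AS transaction_count,
            SUM(amount_pence) AS total_spend_pence
        FROM raw_transactions
        WHERE status = 'settled'
        GROUP BY customer_id
        """
    )


# Hypothetical raw data, loaded untransformed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_transactions (customer_id TEXT, amount_pence INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw_transactions VALUES (?, ?, ?)",
    [
        ("c1", 450, "settled"),
        ("c1", 1200, "settled"),
        ("c1", 300, "declined"),  # excluded by the WHERE clause
        ("c2", 999, "settled"),
    ],
)

build_customer_spend(conn)
rows = conn.execute(
    "SELECT customer_id, transaction_count, total_spend_pence "
    "FROM customer_spend ORDER BY customer_id"
).fetchall()
print(rows)
```

Because the model is just a SELECT statement, anyone comfortable with SQL can read it, change it, and trace a wrong number back to the clause that produced it.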
2. Analytics work is collaborative.
Analysts are used to working in ad-hoc ways, and while this is fast in the short term, it makes it difficult for analysts to collaborate on a code base, which slows teams down in the long term. “We aim to work in a way that means every individual piece of analysis makes data at Monzo a little bit better,” Dimitri says. “We encourage our data analysts to spend maybe 30% longer answering a question, if it means they do it in a way that’s easily reproducible and benefits other people in the company.”
Dimitri’s advice to analysts on the team is to assume that there is a 100% chance that someone will ask them to update a piece of work. “In the worst case, that’d mean six months into the job you’d be spending 50% of your time updating your previous work instead of doing something new,” Dimitri says.
With dbt, all analysts work on the same central code base, which follows the software engineering best practices of version control and continuous integration. Working in this collaborative way means that a company’s analytics code base becomes more stable and valuable over time.
3. Data accuracy and speed require thoughtful trade-offs.
One of Monzo’s most important data models builds the Single Customer View file. The FSCS (similar to the FDIC in the United States) requires that all banks be able to produce this file within 24 hours of the regulator’s request. For this data model, accuracy is critically important. With dbt, the team has built rigorous testing on every aspect of this model. When a test fails, the team is alerted immediately via Slack. And just as importantly, using dbt docs, the logic of every column is thoroughly documented.
But not all data models require this same level of rigor. With dbt, it’s easy to attach “severity” ratings to tests, indicating if a test failure is critical. “If you treat all data the same and try to enforce very high accuracy everywhere, you’ll often find yourself sacrificing speed,” Dimitri says. “We differentiate between crucial and non-crucial data sets and enforce tougher change management procedures on anything that’s tagged as crucial.”
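The idea of severity tiers can be sketched in a few lines. This is a conceptual illustration in plain Python, not dbt's API or Monzo's setup: the Check class, the run_checks function, and the example columns are all hypothetical. A failure with severity "error" should block the run, while a failure with severity "warn" is only reported.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    name: str
    severity: str  # "error" blocks the run; "warn" is only reported
    find_failures: Callable[[list], list]  # returns the rows that violate the check


def run_checks(rows: list, checks: list) -> tuple:
    """Run every check and split the failing check names by severity."""
    errors, warnings = [], []
    for check in checks:
        if check.find_failures(rows):
            (errors if check.severity == "error" else warnings).append(check.name)
    return errors, warnings


# Hypothetical data: a missing customer_id is critical, a missing nickname is not.
rows = [
    {"customer_id": "c1", "nickname": "Al"},
    {"customer_id": "c2", "nickname": None},
]
checks = [
    Check("customer_id_not_null", "error",
          lambda rs: [r for r in rs if r["customer_id"] is None]),
    Check("nickname_not_null", "warn",
          lambda rs: [r for r in rs if r["nickname"] is None]),
]

errors, warnings = run_checks(rows, checks)
run_ok = not errors  # only error-severity failures stop the pipeline
print(errors, warnings, run_ok)
```

Here the missing nickname is surfaced as a warning but does not stop the run, which is the trade-off described above: strict enforcement where it matters, speed everywhere else.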
4. Data should be self-service.
Data is everywhere at Monzo. It’s in everything the company does. And there is no way that analysts alone can serve all of those needs. “Looker is a staple in our analytics stack that empowers non-technical people within the business to self-serve data questions in ~80% of cases,” Dimitri says. “In fact, more than 60% of people at Monzo are weekly active users on Looker!”
With dbt tests, the data team at Monzo can be confident that the data being delivered to end users is accurate. “In the old world, we’d often find errors within Looker and then we’d have to go far up the pipeline to find the source,” says Rob Knight, Operations Analyst at Monzo. “Sometimes it would be all the way at the source, which means we spent hours debugging an issue that was just poor data sent to us in the first place. dbt helps us enforce data integrity earlier in the pipeline which means we catch issues before they reach Looker.”
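One way to picture catching bad data early is to express each test as a query that selects the rows violating an expectation; the test passes when the query returns nothing. The sketch below runs such queries directly against a raw source table, before anything downstream is built. It assumes a hypothetical raw_accounts table and uses Python's built-in sqlite3 module as a stand-in for the warehouse; the test names are illustrative only.

```python
import sqlite3


def failing_rows(conn: sqlite3.Connection, test_sql: str) -> list:
    """A test passes when its query returns no rows."""
    return conn.execute(test_sql).fetchall()


# Hypothetical source table with one duplicated id and one missing date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_accounts (account_id TEXT, opened_on TEXT)")
conn.executemany(
    "INSERT INTO raw_accounts VALUES (?, ?)",
    [("a1", "2019-01-05"), ("a2", None), ("a2", "2019-02-11")],
)

# Each test selects the rows that break an expectation about the source data.
tests = {
    "account_id_not_null":
        "SELECT account_id FROM raw_accounts WHERE account_id IS NULL",
    "account_id_unique":
        "SELECT account_id FROM raw_accounts GROUP BY account_id HAVING COUNT(*) > 1",
    "opened_on_not_null":
        "SELECT account_id FROM raw_accounts WHERE opened_on IS NULL",
}

results = {name: failing_rows(conn, sql) for name, sql in tests.items()}
failed = sorted(name for name, rows in results.items() if rows)
print(failed)
```

Running checks like these at the source means a duplicated or incomplete record is flagged where it enters the pipeline, instead of surfacing later as a puzzling number in a dashboard.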
At the same time, dbt docs also makes it easy for users to understand the data sets more deeply. Rather than interrupting an analyst to ask what a given column represents or how it’s calculated, they can jump over to dbt docs and explore how all the data flows together and what each column means.