Mastery Logistics establishes an entire data function in five months
Mastery Logistics was founded by industry pros who had already sold one company, Coyote Logistics, to UPS. This time around, the founders had a new mission: to reduce waste across the entire freight logistics industry. Data is central to this mission.
As an early employee at Mastery Logistics, Jessie Daubner was responsible for building the data function from the ground up. “In the first couple months I was just trying to wrap my head around what the business needed data to deliver on,” said Jessie. “And in the back of my head I’m thinking, ‘It’s going to be a data warehouse. I hope it’s not a data warehouse, but of course it’s going to be a data warehouse’.”
Jessie’s background and interests are in data science and machine learning. “Enabling more complex data science and machine learning work always requires an initial upfront investment in data warehousing and ETL, and I’ve seen this work take years,” Jessie said. So before kicking off any massive infrastructure projects or going on a hiring spree, she wanted to make sure she had a concrete idea of what the data roadmap would be, or at least, a roadmap that was as concrete as any early-stage startup roadmap could be.
The most urgent need that emerged was reporting. Specifically, Mastery wanted to deliver pre-built reports to clients. This meant centralizing data in, you guessed it, a data warehouse.
Building a modern data stack
“Data is considered a competitive advantage because it’s something that competitors in the space don’t do well right now,” Jessie said. “We wanted to deliver basic reports to our partners, with the ability to make data a core part of our long-term vision.” There were two things that were important to her as she started building Mastery’s data infrastructure:
- Flexibility: Jessie has extensive experience working with managed services. This time she was the one building the data function and she wanted to do things differently. “I don’t want to be a database administrator. If I can avoid that, I’m going to,” Jessie said. In addition, “I wanted to build a modern data stack that was cloud-agnostic and gave us the flexibility to adopt the best tools now and in the future.”
- Speed: She didn’t have years to build data infrastructure; it had to happen fast. In December of 2019, she signed contracts with Fivetran, Snowflake, and Tableau, and implemented dbt.
Optimizing for speed and flexibility has proven to be a good choice. For example, the team believed the most value they could deliver to clients would come from a set of 10-20 canned reports delivered in the Transportation Management System via Tableau. As they got up and running with dbt, the data team realized that they could deliver the same value, and save themselves time, by delivering a small set of key Tableau dashboards along with access to the complete data set via Snowflake shares. “Since business logic is stored in dbt, building reports directly on top of Snowflake data is easier for our clients as we’ve already done the heavy lifting, in terms of transformations, for them,” Jessie said. “In addition, dbt makes Tableau reporting easier for us because it allows us to reuse data sources across reports. We don’t have to recreate logic in each subsequent Tableau dashboard and hope it matches the same logic/calculated field used in the prior report.”
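The workflow Jessie describes, centralizing transformation logic in dbt so that both Tableau dashboards and client-facing Snowflake shares inherit it, can be sketched with a minimal dbt model. The model and column names below are hypothetical, invented for illustration rather than taken from Mastery’s actual project:

```sql
-- models/fct_shipments.sql (hypothetical model and column names)
-- Business logic is defined once here; Tableau dashboards and
-- client-facing Snowflake shares both read the resulting table,
-- so the definitions cannot drift apart between reports.
with shipments as (
    select * from {{ ref('stg_shipments') }}
),

carriers as (
    select * from {{ ref('stg_carriers') }}
)

select
    shipments.shipment_id,
    carriers.carrier_name,
    datediff('hour', shipments.picked_up_at, shipments.delivered_at)
        as transit_hours,
    shipments.delivered_at is not null as is_delivered
from shipments
left join carriers on shipments.carrier_id = carriers.carrier_id
```

Because every report selects from this one model via `ref()`, a change to, say, the transit-time calculation propagates to all downstream dashboards at once instead of being re-implemented as a calculated field in each one.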
Hiring the right people
At the same time that Jessie kicked off work on building data infrastructure, she also began hiring her team, positioning the modern data stack and associated workflows as a benefit to joining her. “My first hires on the data team wanted to increase their technical knowledge and skill set,” Jessie said. “Just learning the software engineering process and applying that to analytics was a step in that direction for them.”
Jessie has found success hiring people who have experience with older ETL technologies: “GUI-based ETL tools limit the amount of creativity and expression you can have. The analytics engineering workflow is just more interesting work than dragging and dropping something in an interface. I look for people who are excited about learning that new way of working.”
She also wanted to be realistic about what the work would entail. “I didn’t want to hire a data scientist and make wild promises that it would all be data science work; they would need to do some reporting, at least to start. But I also didn’t want to hire data analysts whose sole job would be to just create Tableau dashboards forever.”
So she hired for the classic startup skill: “People are going to be able to wear a lot of different hats.” For her data team, she wanted people with some programming experience (Python, object-oriented programming, and a familiarity with writing DRY code or the concepts of modularity) as well as some exposure to reporting and BI. “I wanted them to understand that world a bit and what they were getting into,” Jessie said.
Creating a collaborative culture
As the people and technology began coming together, Jessie started to think seriously about how to create the right habits on her data team. “Data teams can lose trust with end users really quickly when there are bugs or mistakes,” Jessie said. She wanted to create a collaborative data team culture that valued high-quality, trusted code.
She points to three practices that are helping them build this culture on the Mastery data team:
- Require two people to review every pull request. Jessie wants to be intentional about how her team learns to write code together, and the best place to put this in practice is during code review. “We explicitly require that every pull request is reviewed by two people, and the expectation is that you will get quite a lot of comments and questions. This is a powerful way to get feedback, learn, improve, and knowledge share,” Jessie said. “To me, it’s better to take the hit on velocity today if it means the quality of what we’re producing is improving over time.”
- Define SQL conventions. “Very early on we looked at the SQL conventions from Fishtown Analytics, and then went through them and were like, ‘Well, this resonates with us, this doesn’t, or we would like to add this addition’.” This has helped reduce cognitive load for people collaborating on dbt models, but like many in the dbt community, she dreams of a SQL linter: “When someone is looking at the diffs of a code base, you want them to only be looking at real changes, not formatting changes that adjusted one space.”
- Lay the foundation of a testing program. Jessie sees the data team as being the first line of defense in catching data integrity issues. This is particularly challenging when providing product usage reports on a product that is, quite literally, being built while they are reporting on it. Today, they are running tests when building new models. In the future they will automate these tests, pipe test failures into Slack, and improve how new features and reporting on those features are rolled out. Today, they’re focused on getting the basics right. “We have the basic tests in place that cover things like uniqueness, missing values, and accepted values,” Jessie said. “But we haven’t gotten super complex yet.”
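The checks Jessie lists map directly onto dbt’s built-in schema tests. A minimal `schema.yml` covering uniqueness, missing values, and accepted values might look like this (the model and column names are hypothetical, used only for illustration):

```yaml
# models/schema.yml (hypothetical model and column names)
version: 2

models:
  - name: fct_shipments
    columns:
      - name: shipment_id
        tests:
          - unique      # uniqueness
          - not_null    # missing values
      - name: status
        tests:
          - accepted_values:
              values: ['in_transit', 'delivered', 'cancelled']
```

Running `dbt test` executes these checks against the warehouse; piping failures into Slack, as Jessie plans, then becomes a matter of alerting on the command’s exit status in the scheduler.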
To most software engineers, these practices will look familiar. “This isn’t anything new, it’s how every high-quality software project is run,” Jessie said. “You expect there to be tests. You expect there to be documentation. You expect the PR process to be collaborative. You’re building software together. We’re just applying this to analytics code as well.”
When Jessie first joined Mastery Logistics, her biggest concern was that her team would spend years mired in a data warehousing project. With Fivetran, Snowflake, and dbt, she was able to set up her data warehouse and supporting infrastructure in a matter of months. In addition, the adoption of modern analytics engineering workflows and tools proved useful in helping her attract the right data talent.
The team’s biggest win so far? Fast reporting. “The thing we’ve got the most support and praise for is being able to deliver reports quickly,” Jessie said. The company realized early on that it could deliver data to clients via Snowflake shares instead of Tableau dashboards, so the original, client-facing use case for reporting went away, but reporting has remained integral to Mastery’s success. “Training is a core part of our industry. Our sales people deliver training, and then they want to come back and see who is actually using what they just covered,” Jessie said. “Our internal reporting on product adoption has been really useful for them.”
Over the coming months, Jessie has a few things on her mind:
- Improving collaboration between the data and engineering teams: Specifically, she wants to create a process for better tracking of new features.
- Providing real-time insights to clients as data volumes grow: Mastery Logistics wants to provide clients with shipping statuses every 15 minutes. Hitting this goal will require the team to keep refining its current ETL capabilities.
- Maturing the testing program: Inspired by a dbt Meetup talk, Improving Data Reliability, Jessie has plans for a more robust testing program.
- Doing the fun work: With the infrastructure work largely taken care of, Jessie is excited to get down to “doing actual data science” like machine learning and optimization.