Maturing as an analyst alongside the ADLC
In a recent post about the Analytics Development Lifecycle (ADLC), we outlined a framework for evaluating the maturity of data workflows, breaking the process down into guiding principles across the eight stages of the ADLC.
This subject matter is vast, and we have only started to scratch the surface in aligning on what maturity means across these phases. This model is intended to serve as a springboard for debate, discussion, and deep reflection.
Today, we reflect on what it means for us analysts in particular. Per the ADLC, a mature analytics workflow must manifest features such as accessibility, velocity, and correctness. What are the table stakes in our analyses that ensure this? How do we keep our exploration nimble but our output mature? How do we ourselves evolve as professionals alongside our systems? As Tristan writes in the ADLC:
“There is no ‘right’ or ‘wrong’ way to conduct exploratory data analysis. Rather, it specifies a set of requirements that all users should have”.
Here we share some of the requirements we've found useful in our own work as analysts at dbt Labs.
Why the analysis phase is challenging
If you have ever worked as an analyst or with one, you have likely (and intimately) felt the friction of building toward the following characteristics.
Accessibility
Analysis can be extremely tough to communicate at the right level with the right context to the right people. Things get missed, glossed over, or misconstrued. On top of that, code complexity (or let’s be honest, plain bad code) can also outweigh any benefit of building further on the work.
Velocity
Long-term, dashboard overhead continues to bloat (and bloat!). Time is spent reactively, either fixing charts or answering the "this looks wrong" Slack message, instead of pushing our organizational knowledge forward. Given the interconnected nature of the ADLC, short-term velocity gains bought with tech debt upstream (e.g. a feature released without proper tracking) often cut into analyst velocity downstream (extra time spent on logic workarounds).
Correctness
Those maintenance and upstream data quality issues hurt not only velocity but also correctness. Much of an analyst's time is spent validating their work, which is often at odds with velocity. Do we move quickly or do we move accurately?
Here's what we do about it
To mature our analyses, our team continuously leans on the best practices checklist below. It isn't exhaustive in any way, but we believe that when conducting an analysis, data practitioners should be able to answer the seven questions below. This checklist helps us maintain focus on the right things, making our work easier to leverage (accessibility), faster to turn around (velocity), and less prone to error (correctness). And those are some of the hallmarks of a mature analyst.
1. Do we understand the question behind the question?
It feels natural to take a question at face value. We’ve learned the hard way that a question can be very subjective, often meaning something different to the asker than to the listener. Say a PM asks for a deep dive on feature X retention. Maybe this is tied to a very specific retention definition in a company OKR, or maybe they're more broadly trying to get at feature value or a customer segment. What is at the heart of the ask, and what do they want to achieve? Are we aligned on how it'll be used to make decisions? As our head of data recently wrote, always “start with the why”.
2. Is that question the right starting point?
The stakeholder is ready to go, and we know exactly what they want. But wait: are we positive that makes sense? We need to be confident in the landscape surrounding the ask. We have a different perspective than our stakeholders, and it never hurts to check their assumptions. This is a critical step; it may feel obvious to a seasoned analyst who does it implicitly, but making it an explicit part of your process ensures consistency across the team.
What if that feature X was only rolled out to new users so far? In that case, we may want to take a step back and examine the broader funnel first. Ignoring the behaviors of existing users could skew overall retention takeaways.
3. What small steps can we take to break down complexity?
Fight against bloat. If an ask sounds urgent, sounds complex, or adds to maintenance overhead, we need to make sure it is fully understood upfront. If not, rescope and reprioritize. Are we sure it can’t be addressed by existing work? What exactly needs to happen, and when?
Humans are wrong all the time, whether about priorities, premises, reasons, or solutions. Over decades of analytics work, we have learned it never hurts to pause, simplify, and then iterate. The more we can break big work into small steps, the less likely we are to spin cycles heading in the wrong direction.
A mature ADLC does not mean perfectionism. For example, maybe a quick correlation is a good sanity check before a full-fledged mixed effects model. Maybe the PM just needs the top metric they requested monitored before we build an entire dashboard. Our backlog is our friend. Don’t ignore all the asks and tangents, but prune and prioritize them ruthlessly.
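To make that concrete, here is a minimal sketch of the kind of cheap, directional sanity check we have in mind. The data, column names, and retention definition below are hypothetical, and this is not a substitute for the heavier model:

```python
import pandas as pd

# Hypothetical usage data: one row per account.
# Column names and values are illustrative, not an actual schema.
accounts = pd.DataFrame({
    "weekly_feature_x_events": [0, 3, 12, 1, 7, 0, 25, 4],
    "retained_90d":            [0, 1, 1, 0, 1, 0, 1, 1],
})

# A quick correlation between usage volume and retention: not causal evidence,
# just a cheap directional signal before investing in a full mixed effects model.
print(accounts["weekly_feature_x_events"].corr(accounts["retained_90d"]))
```

If the quick check points in a surprising direction, that is a prompt to pause and rescope before sinking time into the full analysis.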
4. Are we explaining our work enough?
Right now, our lovely new analysis is fresh in our heads. Excessive code comments may feel…excessive. But will they feel that way when we have to update this work a year from now? What if other people want to borrow our logic? We must be clear about our assumptions and our approach as we write our code, and spell out the why behind choices that may not be obvious to others. We can’t always assume a shared perspective. If the work isn't maintained on an ongoing basis, or if it’s just scratch work, it never hurts to say so explicitly. It's easy to stumble across work that looks like the logic we want, but hard to know whether we should trust it.
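As a small, hedged illustration of what this can look like in practice (the data, column names, and definitions below are all hypothetical), assumptions can live right next to the logic they shape:

```python
import pandas as pd

# Scratch analysis for the feature X retention question -- not scheduled, not maintained.
# Assumption: "retained" means at least one feature X event within 30 days of signup.
# Assumption: internal test accounts are excluded because the ask is about customers only.
events = pd.DataFrame({
    "account_id":        [1, 1, 2, 3, 3, 4],
    "is_internal":       [False, False, False, True, True, False],
    "days_since_signup": [2, 35, 10, 1, 40, 28],
})

customers = events[~events["is_internal"]]  # internal usage would inflate retention
all_accounts = customers["account_id"].nunique()
retained_accounts = customers.loc[customers["days_since_signup"] <= 30, "account_id"].nunique()

print(f"{retained_accounts / all_accounts:.0%} of customer accounts retained within 30 days")
```

The comments cost a few seconds now and save the next reader (often future us) from having to reverse-engineer why internal accounts are missing or what "retained" meant on the day the numbers were shared.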
5. Have we established sufficient confidence in our findings?
Do we feel confident that we can trust these results? Are there any survivorship biases we need to reflect on? Do shared averages break down over time or across key subgroups the way we’d expect? Do shared percentages have sufficient volume behind them to back them up as trends?
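One hedged example of that last check, with made-up numbers: a quick confidence interval on a shared percentage makes it obvious when the volume behind it is too thin to call a trend.

```python
import math

def proportion_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a shared percentage."""
    p = successes / total
    margin = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - margin), min(1.0, p + margin)

# "80% retained" sounds strong, but the volume behind it matters.
print(proportion_ci(8, 10))      # roughly (0.55, 1.00) -- too wide to call a trend
print(proportion_ci(800, 1000))  # roughly (0.78, 0.82) -- far more trustworthy
```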
If there are known limitations to our confidence, make sure that's conveyed. It can be very useful to regularly communicate whether this is a high, medium, or low confidence analysis; this is critical for building trust. Alongside that, flag if the work isn’t maintained so the company knows how long-lived that confidence may be.
6. Are we communicating the takeaways in an optimal way?
To share our work well, we need to make it visible and make it easy to consume. Some guidelines that we like to keep in mind here (if you want to go even deeper, check out our past work):
- We go where our stakeholders are. This often means not our BI tool but rather Notion, Slack, etc.
- Before we spin up yet another dashboard (YAD ™️), are we sure there isn’t an existing one that'd make more sense to add to? Fight against that sprawl.
- We try to communicate regularly, even if the status update is a boring “still working on it”. We also often boost the end results more than once (e.g. in Slack, plus in our meetings, plus in our data newsletter). A million things are happening each day, and people get busy. Ensure visibility.
- We get to the point. Rather than just sharing a link or a novel, we remind ourselves to frame everything with a highlighted “why it’s important, and here’s what to do with it” upfront. No one else is thinking about the nuances of our work as much as we are. That said, for those who want them, we keep details at the ready in links and dropdowns. This makes it more efficient for our varied audience to get the varying levels of detail they need.
7. Can we leave the campground cleaner than we found it?
Keeping scope creep in mind, what did we learn from our project? Is there anything worth converting into a win around scale or efficiency? Any small tweaks we should make in favor of our vision of accessibility, velocity, and correctness? For example, were we surprised there wasn’t documentation on something? Is it easy to add that documentation now that we’ve learned it? Did we use a cool approach or learn something insightful that’s worth sharing with our team?
Developing people and processes alongside systems
The ADLC framework outlines, in very broad strokes, a path towards data maturity. We’ve already started to unpack what that means in terms of pitfalls and focus areas. Here, we dug further into the analysis phase, focusing on a few key aspects of maturity (such as accessibility, velocity, and correctness) as they relate to analysis workflows and thought processes. More to come as we introspect further on evolving within the analysis phase of the ADLC, and you can always catch us in the dbt Community Slack to discuss. We’re all on this journey together.