Exploratory analysis

The Work

In many cases, we have questions, but no idea how to answer them. We must spelunk and explore.

This is often the nebulous beginning phase of a data project, where requirements are being hammered out by various stakeholders (analysts, analytics engineers, data engineers and business users).

In this phase, it’s useful to get to a shared understanding as a group as quickly as possible.

Without that shared understanding — of what raw data currently exists, what form it might take, and how it might be transformed — someone’s bound to be let down.

This is where exploratory analysis tools, like notebooks and spreadsheets, really shine.

Notebooks allow multiple people to collaborate around:

  1. The structure of raw data.
  2. The structure of the final output of transformed data.
  3. Sketches or proposals for the model structure that gets from the raw data (1) to the final output (2).

Notebooks let the group do together, interactively, what each stakeholder once did individually.

Owned By

All of us! Exploratory analysis lives in all of our tool belts, whether we’re on a data team or not.

If you’ve hacked up a spreadsheet before, you’ve done exploratory data analysis.

Exploratory analysis is the cafe table that we all gather around.

Downstream dependencies

Exploratory analysis lives in the in-between stages. We do it before data transformation work begins, to explore how we might model the data.

We also do it before building ML models or static reports, to explore the shape of the data and its features.
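As a rough illustration of what "exploring the shape of the data" can look like, here is a minimal sketch that profiles a table using Python's built-in sqlite3 module. The `raw_orders` table, its columns, its rows, and the `profile_table` helper are all hypothetical and invented for this example; in a real project the same queries would run against your warehouse from a notebook.

```python
import sqlite3

# Hypothetical raw table, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_orders "
    "(order_id INTEGER, customer_id INTEGER, status TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [
        (1, 10, "shipped", 25.0),
        (2, 11, "shipped", None),
        (3, 10, "returned", 40.0),
        (4, None, "shipped", 15.5),
    ],
)


def profile_table(conn, table):
    """Return (row_count, {column: {"nulls": n, "distinct": n}}) for a table.

    The table name is interpolated into the SQL, so only pass trusted names.
    """
    # PRAGMA table_info yields one row per column; the column name is at index 1.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    row_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    stats = {}
    for col in columns:
        # "col IS NULL" evaluates to 0 or 1, so SUM counts the NULLs;
        # COUNT(DISTINCT col) ignores NULLs.
        nulls, distinct = conn.execute(
            f"SELECT SUM({col} IS NULL), COUNT(DISTINCT {col}) FROM {table}"
        ).fetchone()
        stats[col] = {"nulls": nulls, "distinct": distinct}
    return row_count, stats


row_count, stats = profile_table(conn, "raw_orders")
print("rows:", row_count)
for col, col_stats in stats.items():
    print(col, col_stats)
```

Row counts, null counts, and distinct counts per column are usually enough to start a conversation about which columns can be trusted as keys and which need cleaning before modeling.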

Prerequisites

Primarily curiosity: a question or set of questions that you’re looking to explore.

In terms of data, at the very least, you’ll need raw data, and ideally staged datasets (initially transformed, deduplicated, and so on).
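To make "staged" a little more concrete, the sketch below shows one possible staging step: lightly cleaning a raw table and deduplicating it. It again uses Python's built-in sqlite3 module. The `raw_customers` and `stg_customers` tables, their columns, and the "keep the most recently loaded row" rule are assumptions made for illustration, not a prescribed approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw table in which customer_id 1 was loaded twice.
conn.execute(
    "CREATE TABLE raw_customers (customer_id INTEGER, email TEXT, loaded_at TEXT)"
)
conn.executemany(
    "INSERT INTO raw_customers VALUES (?, ?, ?)",
    [
        (1, "A@Example.com", "2021-01-01"),
        (1, "a@example.com", "2021-01-02"),
        (2, "b@example.com", "2021-01-01"),
    ],
)

# Staging step: lowercase the email and keep only the most recently
# loaded row per customer. ISO-formatted dates sort correctly as text.
# Note: rows that tie on loaded_at would all be kept by this rule.
conn.execute(
    """
    CREATE TABLE stg_customers AS
    SELECT customer_id, LOWER(email) AS email, loaded_at
    FROM raw_customers AS r
    WHERE loaded_at = (
        SELECT MAX(loaded_at)
        FROM raw_customers
        WHERE customer_id = r.customer_id
    )
    """
)

staged = conn.execute(
    "SELECT customer_id, email, loaded_at FROM stg_customers ORDER BY customer_id"
).fetchall()
print(staged)
```

Exploring against a staged table like this means the group spends its time on the questions themselves rather than rediscovering the same duplicates and formatting quirks in the raw data.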

Notebooks in practice

Notebook support for SQL as a first-class language is relatively new. Until recently, notebooks were generally Python-centric, and therefore less accessible to analytics engineers, analysts, or business users.

Recently, notebook platforms like Hex and Querybook (from Pinterest) have emerged, offering the promise of a SQL-based notebook experience.

On the Fishtown Analytics team, we’ve recently started using Hex for internal data projects and professional services.

This section will be updated once we have strong opinions on how we use it internally!