Business intelligence reporting

This piece is adapted from Jeremy Cohen's presentation at a private dbt Learn training in May of 2020.

The work #

Building static reporting within your team’s BI tool of choice, which will generally be used by business users who live outside the data team.

There are two streams of work involved:

  1. Technically constructing reports: creating charts, moving pixels around on the page
  2. Building trust in those reports: by communicating the work that’s been done at the transformation layer (data testing, source freshness reporting, documentation)

These two streams of work are equally important — if users don’t trust the underlying data, then reporting loses its utility.
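Much of that trust-building work is declarative configuration at the transformation layer. As a hedged sketch (the source, model, and column names here are hypothetical), a dbt `schema.yml` file like the following declares data tests, source freshness thresholds, and documentation that can then be surfaced alongside the reports:

```yaml
version: 2

sources:
  - name: app_db                  # hypothetical source name
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders

models:
  - name: fct_orders              # hypothetical model name
    description: "One row per completed order; feeds the revenue dashboard."
    columns:
      - name: order_id
        description: "Primary key."
        tests:
          - unique
          - not_null
      - name: order_total
        tests:
          - not_null
```

Running `dbt test` and `dbt source freshness` against a file like this gives the analyst something concrete to point to when a business user asks, "can I trust this number?"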

Owned by #

Reporting builds are generally owned by data analysts (known as BI analysts on some teams).

Analytics engineers contribute by scoping out which elements (data joins, metric calculations) should live at the transformation layer (in dbt in our case) or within the data visualization tool.

Analytics engineers don’t deliver the answers to an analyst’s questions; they make all relevant data available in the best format for querying, so the business analyst can answer those questions on their own.

Downstream dependencies #

What’s downstream of reporting? Your business users’ trust!

Most businesses run on reports. Whether you think that’s good or not (I can confess to having written a blog post titled “Delete your KPI Dashboard”), it is the way of the world.

Reporting therefore has the most eyeballs on it, and the largest potential to ruin someone’s day when things break. Low-reliability reporting can crush a team’s faith in data.

Prerequisites #

The analyst and analytics engineer must build a shared understanding of what belongs in transformation code, and what belongs in reporting.

The boundary can be somewhat blurry, so deciding which transformations should happen where is often case-dependent.

Generally speaking, transformations in dbt should be geared toward producing clean fact and dimension tables, with no risk of fan-out for your BI tool users.

The BI layer should remain thin and flexible, and be ready to answer business questions with minimal additional work (limited joins or renaming of fields). To summarize this split:

| dbt-appropriate processes | BI-appropriate processes |
| --- | --- |
| any critical business logic | on-the-fly aggregations |
| best agreed-on versions of a model | joins qualified by user input |
| complex SQL | ad hoc queries |
| multipurpose code | select-star SQL that feeds R, Python, JS (custom viz) |
| version-controlled items | |
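To make the split concrete, here is a hedged sketch of a dbt model (the model and column names are hypothetical) that keeps the business logic in version-controlled SQL: the join and the agreed-on revenue definition live in dbt, leaving only filtering and aggregation to the BI tool.

```sql
-- models/fct_orders.sql (hypothetical model)
-- The join and the revenue definition are agreed on once,
-- tested, and version controlled in dbt.

select
    orders.order_id,
    orders.ordered_at,
    customers.customer_segment,
    -- the "best agreed-on" revenue definition, in one place
    orders.amount - coalesce(orders.discount, 0) as net_revenue
from {{ ref('stg_orders') }} as orders
left join {{ ref('stg_customers') }} as customers
    on orders.customer_id = customers.customer_id
```

The BI tool then only needs to aggregate `net_revenue` by segment or date on the fly, which keeps that layer thin, flexible, and free of fan-out risk.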

The backstory: how this came to be #

If I were to ask you questions like, “what do you use your BI tools for” and “what should you use them for,” what would be your answer?

I can tell you what your answer should be — you should use BI tools for the things they’ve always been good at:

  • Presenting a flexible interface where users can point, click, filter, or drill down as needed
  • Making data accessible to less-technical users
  • Telling a well-scripted, beautiful data-driven story
  • Providing stunning visualizations right out of the box with little code work required
  • Offering bonus features such as dynamic user input, alerting, scheduling, and PDF/CSV generation

BI tools like Tableau, Metabase, Mode, Looker (and more!) offer a way to grab hold of data in different ways and illustrate its present impact.

Whether you’re looking at flight data, sales figures, consumer transactions, or biometric stats — with access to the correct sources and a well-trained user, BI tools make the data speak for itself.

But for all of their capabilities, there are many tasks we push them to do that they’re not so good at, which is where dbt comes in:

  • BI tools can’t do hard thinking about multilayered data transformation with complex order dependencies.
  • They cannot enforce single sources of truth (defining, documenting, centralizing business logic).
  • BI tools are unable to maintain separate data environments, such as those for testing or version control.
  • They’re poor at running multiple non-performant SQL scripts concurrently.
  • They don’t offer the ability to manage user access to sensitive data.

Harking back to the intro to the transformation section, dbt entered the modern data stack back in 2016 to specifically solve these problems.