What is analytics engineering?
Originally published on 2019-10-16
Analytics engineers provide clean data sets to end users, modeling data in a way that empowers end users to answer their own questions. Here are the market trends that gave rise to the newest role on modern data teams.
A year ago, I was preparing a presentation for an event and the title slide asked me to fill in my role. I had been hired as a “Data Analyst”, and when I started the role, I spent my time doing normal data analyst things. I pulled data for finance and marketing, analyzed trends and generated insights, and spent lots of time in Excel and Looker.
But my role had been changing dramatically. Finance and marketing were able to run their own reports, so a normal day for me involved preparing data for analysis by writing transformation and testing code, and writing really good documentation. My tools were no longer Excel and Looker; they were iTerm, GitHub, and Atom.
Was I still a data analyst?
I left the slide blank for the moment, and just before the event, I filled in: “Claire Carroll – Data Something.”
Since then, the industry has begun to adopt a title for what I was attempting to describe – analytics engineer.
What is an analytics engineer?
Analytics engineers provide clean data sets to end users, modeling data in a way that empowers end users to answer their own questions. While a data analyst spends their time analyzing data, an analytics engineer spends their time transforming, testing, deploying, and documenting data. Analytics engineers apply software engineering best practices like version control and continuous integration to the analytics code base.
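To make "transforming and testing" concrete, here is a minimal sketch in plain Python with the standard library's sqlite3 module. It is an illustration only, not dbt's API: the raw_orders table, the orders view, and the failing_rows helper are all hypothetical names. The idea is that a transformation is a SQL query that produces a clean table, and a test is a SQL query that should return no rows.

```python
import sqlite3

# Hypothetical raw data, standing in for a table loaded into a warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (
        id INTEGER, customer TEXT, amount_cents INTEGER, status TEXT
    );
    INSERT INTO raw_orders VALUES
        (1, 'acme', 1250, 'completed'),
        (2, 'acme', 4000, 'completed'),
        (3, 'globex', 999, 'cancelled');

    -- Transformation: a cleaned, analysis-ready model built on the raw table.
    CREATE VIEW orders AS
    SELECT id AS order_id, customer, amount_cents / 100.0 AS amount_usd
    FROM raw_orders
    WHERE status = 'completed';
""")


def failing_rows(sql):
    """Run a test query; any row it returns counts as a failure."""
    return conn.execute(sql).fetchall()


# Each test is a query that selects the rows violating an expectation.
tests = {
    "order_id is never null":
        "SELECT * FROM orders WHERE order_id IS NULL",
    "order_id is unique":
        "SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1",
}

results = {name: failing_rows(sql) for name, sql in tests.items()}
for name, rows in results.items():
    print("PASS" if not rows else "FAIL", "-", name)
```

Because both the transformation and the tests are just code, they can live in version control and run in continuous integration like any other code base.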
When did analytics engineering become a thing?
The traditional data team
If you were on a “traditional data team” before 2012, your first data hire was probably a data engineer. You needed this person to build your infrastructure: extract data from the Postgres database and SaaS tools that ran your business, transform that data, and then load it into your data warehouse.
You would then hire a data analyst to build dashboards and reports on top of this data. Analysts, like me, would maintain a mess of SQL files with names like monthly_revenue_final.sql, or maybe just bookmark their queries in a SQL web editor. Often we would need to supplement data in the warehouse with fancy Excel work.
The people consuming the data (CEOs, marketing VPs, CFOs) would receive monthly reports, request special analysis as needed, and send analysts a never-ending stream of requests to “segment by this” or “cut by that” or “hey, we’ve updated our definition of ‘account’”.
Being a data analyst was a hard and thankless job, and it didn’t have a ton of leverage. Because of this, it was often a junior role, one where you “did your time” and then moved on to something else.
What happened to the traditional data team?
Since 2012, there have been huge changes in the data tooling landscape:
- Cloud-based data warehouses (Redshift, followed by BigQuery and Snowflake) made data storage and processing affordable and fast.
- Data pipeline services (ex: Stitch, Fivetran) turned data extraction into work that took only a few clicks.
- Business intelligence (BI) tools (ex: Looker, Mode, Chartio) made it easier for stakeholders to self-serve.
By 2016, it had never been easier to get data into a warehouse in a raw form, and for stakeholders to build reports on top of the data.
As data tools changed, so did the people who used them. People who weren’t on data teams began developing data literacy. This was good: business users wanted to self-serve and be data-driven. The downside was that these people often knew just enough SQL to be dangerous. If you’ve ever been to a meeting where two executives have different numbers for the same metric, you’ve experienced the result of this.
The solution: transform the raw data into a shape that’s ready for analytics. At the time, there were only two widely-used options:
- Looker’s Persistent Derived Tables
- Get a data engineer involved
The first was easy enough for anyone with SQL skills and a Looker license to manage, but created a host of maintenance issues. The second meant waiting in a data engineering queue that could take…a long time.
This is when dbt entered the market.
The modern data team
dbt is the transformation layer built for modern data warehousing and ingestion tools. Built around SQL, dbt puts the transformation layer firmly within the domain of data analysts.
Today, if you’re a “modern data team” your first data hire will be someone who ends up owning the entire data stack. This person can set up Stitch or Fivetran to start ingesting data, keep a data warehouse tidy, write complex data transformations in SQL using dbt, and build reports on top of a clean data layer in Looker, Mode, Redash, etc.
This job is neither data engineering nor analysis. It’s somewhere in the middle, and it needed a new title. Starting in 2018, we and a few of our friends in the Locally Optimistic community started calling this role the analytics engineer.
Analytics engineers deliver well-defined, transformed, tested, documented, and code-reviewed data sets. Because of the high quality of this data and the associated documentation, business users are able to use BI tools to do their own analysis while getting reliable, consistent answers.
It turns out, your company can get pretty far with a single analytics engineer working as a data team of one supporting a whole business. But for those companies that need a larger data team, how does this team structure scale? Do you simply hire another analytics engineer? Or do you diversify?
In our experience, we see team members start to become more specialized, with roles that align more closely with those that we started with. Depending on your needs, your next hire may be a data engineer or a data analyst.
Here’s how I think about the different roles on modern data teams in larger organizations:
The lines between these roles are blurry – some analytics engineers might spend time doing analyst work like deep dives, while others might be comfortable writing production-level Python code but realize that doing so often isn’t the highest-leverage use of their time.
The term “analytics engineer” is pretty new, and a lot of people doing analytics engineering work don’t go by this title (I didn’t a year ago!). So how do you know if you’re an analytics engineer?
On the surface, you can often spot an analytics engineer by the set of technologies they are using (dbt, Snowflake/BigQuery/Redshift, Stitch/Fivetran). But deeper down, you’ll notice they are fascinated by solving a different class of problems than the other members of the data team. Analytics engineers care about problems like:
- Is it possible to build a single table that allows us to answer this entire set of business questions?
- What is the clearest possible naming convention for tables in our warehouse?
- What if I could be notified of a problem in the data before a business user finds a broken chart in Looker?
- What do analysts or other business users need to understand about this table to be able to quickly use it?
- How can I improve the quality of my data as it’s produced, rather than cleaning it downstream?
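As a rough illustration of the "notify me before a business user finds a broken chart" question, here is a small sketch in plain Python. Everything in it is an assumption made for the example: the find_problems function, the order_id and loaded_at column names, and the 24-hour freshness threshold are hypothetical, and a real team would more likely run checks like these from its transformation tool or scheduler.

```python
from datetime import datetime, timedelta


def find_problems(rows, now, max_age=timedelta(hours=24)):
    """Return a list of human-readable problems found in a table's rows.

    Each row is a dict with hypothetical 'order_id' and 'loaded_at' keys.
    """
    if not rows:
        return ["table is empty"]

    problems = []
    ids = [row["order_id"] for row in rows]
    if any(i is None for i in ids):
        problems.append("order_id contains nulls")
    if len(set(ids)) != len(ids):
        problems.append("order_id contains duplicates")

    # Freshness: the newest row should have been loaded recently.
    newest = max(row["loaded_at"] for row in rows)
    if now - newest > max_age:
        problems.append("data is stale")
    return problems


# Usage: run the checks after each load and alert before anyone opens a chart.
now = datetime(2019, 10, 16, 12, 0)
rows = [
    {"order_id": 1, "loaded_at": datetime(2019, 10, 14, 9, 0)},
    {"order_id": 1, "loaded_at": datetime(2019, 10, 14, 9, 5)},
]
for problem in find_problems(rows, now):
    print("ALERT:", problem)
```

The print call is a stand-in for whatever notification channel the team actually uses; the point is that the check runs on a schedule, ahead of the people reading the dashboard.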
Where is this headed?
At a recent NYC meetup where 100 data professionals gathered to talk about analytics engineering, one speaker compared the analytics engineer to a librarian: the person who curates an organization’s data and acts as a resource for those who want to make use of it. I like this metaphor because the analytics engineer is a steward of organizational knowledge, not a researcher answering a specific question. The analytics engineer curates the catalog so that the researchers can do their work more effectively.
The tooling, the practice, and the organizational role of the analytics engineer are evolving in real time. This title didn’t exist a year ago. Today, when we made this topic the subject of a meetup, over 100 attendees turned up, and we’re seeing more and more job postings for this title every month. So there’s a ton of traction in the industry for this idea and this role, but we’re all still figuring it out together.
While I may not have had the right words to describe my role a year ago, I knew dozens of other individuals within the dbt community whose roles aligned with mine, and who had incredibly intelligent opinions on the space. That’s why the dbt community is so valuable to me, personally, and to all of its members. All of us, together, are inventing a new thing.