Human in the loop data processing

Anna Bladey is an Applied Data Engineer at Civis Analytics, where they build data pipelines and database architectures for public sector clients. Previously they worked as an Applied Data Scientist and earned a Master of Data Analytics from the University of Chicago.

Originally presented on 2020-12-12

What do you do when data is too messy to be useful, but too large to clean manually? In this talk, Bladey will share their tips for implementing 'human in the loop' data processing: focusing manual effort on the messiest data. When their team implemented this approach, a data cleaning task that used to take two months was reduced to two weeks.
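
To make the idea concrete, here is a minimal sketch of one way "human in the loop" triage could work. It is not taken from the talk. It assumes an automated cleaning step that attaches a confidence score to each record; the `Record` fields, the `triage` function, the 0.8 threshold, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    record_id: int
    raw_value: str
    cleaned_value: str
    # Hypothetical score in [0, 1] produced by an automated cleaning step.
    confidence: float


def triage(records, threshold=0.8):
    """Split records into auto-accepted and needs-manual-review lists."""
    auto_accepted, needs_review = [], []
    for record in records:
        if record.confidence >= threshold:
            auto_accepted.append(record)
        else:
            needs_review.append(record)
    # Put the least confident (messiest) records at the front of the queue.
    needs_review.sort(key=lambda r: r.confidence)
    return auto_accepted, needs_review


# Illustrative data only; not from the talk.
sample = [
    Record(1, "chicago, il", "Chicago, IL", 0.97),
    Record(2, "chgo", "Chicago, IL", 0.55),
    Record(3, "??", "", 0.10),
]

auto_accepted, needs_review = triage(sample)
print([r.record_id for r in auto_accepted])  # [1]
print([r.record_id for r in needs_review])   # [3, 2]
```

In a setup like this, reviewers only see the records the automated step was unsure about, and the threshold controls how much work is sent to people versus accepted automatically.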

Browse this talk’s Slack archives

The day-of-talk conversation is archived here in dbt Community Slack.

Not a member of the dbt Community yet? You can join here to view the Coalesce chat archives.