Building a data stack for a start-up
The dbt Live: Expert Series features Solution Architects from dbt Labs stepping through technical demos and taking audience questions on how to use dbt. Catch up with recaps of past sessions, and sign up to join future events live!
Afzal Jasani, Solutions Architect at dbt Labs, dedicated his session to a short tour of the data stack he built for a business close to home.
Video replay and overview
Business requirements & scope
Afzal’s fiancee, the owner of an e-commerce company offering luxury Indian apparel, sought his help to set up a data stack to enable analyses on data sourced from:
- Shopify: customer and order information
- Google Analytics: website traffic
- Google Search Console: search performance
- Klaviyo: email marketing data
Given the scale of his partner’s business, and the (one-person) size of the team that would maintain its data infrastructure, Afzal chose tools that could meet his top requirements:
- Easy to deploy and use
- Flexible (with no vendor lock-in, so he could switch out pieces of the stack if business needs changed)
He favored open-source solutions, and tools with which he had some prior experience (and could deploy with relative ease and speed).
Data stack components
With priorities in line, Afzal selected:
- Hevo for data ingestion, as it offered a free tier that could accommodate the volume and cadence of data expected
- Postgres on AWS RDS for data warehousing, due to its month-to-month cost model and lack of vendor lock-in
- dbt for transformation, chosen for its Git/CI support and Afzal’s existing experience with it
- Metabase (Open Source) for business intelligence, preferred for having a self-hosted deployment option with robust documentation
Managing the stack
Post-deployment, Afzal turned to features and routines that would ease day-to-day stack management, automate as much as possible, and enable new team members to onboard to the data stack more easily down the line.
He used dbt packages and macros to expedite his dbt model development process, leaning on:
- the codegen package, to autogenerate base model code and YAML files
- dbt_utils, to join a date column to daily data aggregations from sources like Google Analytics
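The recap does not say which dbt_utils macro Afzal used for the date join. One common pattern is a date spine: generate one row per calendar day, then left-join the daily aggregates onto it so that days with no data still appear. The sketch below illustrates that idea in plain Python rather than dbt SQL. The function names and the session counts are hypothetical, and the choice to exclude the end date is an assumption made here to mirror how date spines are typically defined.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def date_spine(start: date, end: date) -> List[date]:
    """Return every calendar day from start (inclusive) to end (exclusive)."""
    return [start + timedelta(days=i) for i in range((end - start).days)]


def join_to_spine(
    spine: List[date], daily_values: Dict[date, int]
) -> List[Tuple[date, int]]:
    """Left-join daily aggregates onto the spine; days with no row get 0."""
    return [(day, daily_values.get(day, 0)) for day in spine]


# Hypothetical daily session counts; note there is no row for 2023-01-02.
sessions = {date(2023, 1, 1): 120, date(2023, 1, 3): 95}

spine = date_spine(date(2023, 1, 1), date(2023, 1, 4))
filled = join_to_spine(spine, sessions)

for day, count in filled:
    print(day.isoformat(), count)
```

Joining from the spine outward, rather than from the source data, is what keeps gaps visible: a day with no website traffic shows up as a zero instead of silently disappearing from a chart.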
In his dbt project, he added:
- snapshots to help him monitor data changes over time
- exposures to represent the Metabase dashboards that rely on data transformed in the project
- documentation to enable others to look up a dashboard by name and understand what data is feeding it under the hood
He enabled alerts for each tool, to send notifications via Slack or email if certain error conditions (too many database connections, or a failed job) were triggered in any layer of the stack.
He set up role-based access controls in each tool, with an eye towards a future in which others need to be onboarded to Kynah’s data architecture.
Last modified on: Nov 29, 2023