Databricks + dbt

Close the gap between data analysts and data scientists with the dbt Cloud + Databricks Spark integration. This integration enables data analysts to build, test, and deploy data transformations for structured and unstructured datasets within a single unified analytics platform.

Why Databricks + dbt

  1. Unify Teams & Tech: Maintaining separate data workflows multiplies infrastructure and divides teams. Databricks + dbt enables analysts to model complete datasets from the same platform trusted by their data science counterparts.

  2. Go Big: Machine learning and AI datasets can be extremely large, which makes them difficult to clean and query consistently. By combining Databricks and dbt Cloud, organizations can apply analytics best practices like version control, testing, scheduling, and documentation without sacrificing speed or reliability.

  3. A Growing Ecosystem of dbt Packages: Fivetran’s dbt packages provide an additional layer of transformation, including data standardization, metric aggregation, and key restructuring. This removes one more step on the path from data extraction to data analysis.

“dbt and Databricks are a great couple. Spark has the power I need to process ridiculous volumes of data while dbt helps structure pipelines using software development best practices. Together they improve data quality and confidence in a way that’s much more accessible for data analysts everywhere.”

Fokko Driesprong
Principal Code Connoisseur

Case Studies