dbt and Databricks

Data Transformation for the Databricks Lakehouse

Enable your analytics team to collaboratively build, test, document, and deploy data transformation pipelines in the Databricks Lakehouse with dbt and Databricks SQL.

“[With dbt and Databricks,] everything lives in one place and it’s all access controlled. Having everyone in the same environment and accessing the same version of the same data, every time, is huge.”

Felippe Caso, Business Analytics Manager, Loft

Why dbt and Databricks

dbt works on top of your Lakehouse to give analytics teams a central environment for collaborative data transformation. Anyone on your data team who knows SQL can collaborate on end-to-end transformation workflows in Databricks.

Simplify architecture

Maintaining separate architectures for data analytics, data engineering, and ML workflows multiplies complexity and compounds cost. With dbt, Delta Lake, and Databricks SQL, your entire data organization can work from the same platform, eliminating the need for redundant infrastructure.

A familiar analytics experience

dbt and Databricks SQL provide a SQL-based development experience familiar to your analytics team, so they can bring their workflow to Databricks without missing a step. Data engineers and data scientists can continue using their preferred tooling, powered by Delta Lake, in the same Lakehouse.

Reduce data bottlenecks

dbt empowers analytics teams to build and troubleshoot production-grade transformation pipelines on their own, within clear guardrails. This frees data engineers to work on higher-leverage projects, confident that the architecture is secure.

Automate documentation and testing

dbt uses Apache Spark SQL commands to automatically populate documentation and depict data lineage, both of which are hosted in dbt Cloud and accessible to anyone in the organization. Pre-configured and custom tests within dbt help verify data assumptions and keep broken code out of production.
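To make the testing idea concrete: a dbt test is, at its core, a SQL query that selects the rows violating an assumption, and the test passes when that query returns no rows. The sketch below imitates that pattern in plain Python. It is an illustration only, not dbt itself: an in-memory SQLite database stands in for a Databricks SQL warehouse, and the `orders` table, its columns, and the `run_tests` helper are hypothetical names chosen for the example. The two checks mirror the spirit of dbt's built-in `unique` and `not_null` tests.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for a Databricks SQL warehouse.
conn = sqlite3.connect(":memory:")

# Hypothetical table with one duplicated order_id and one missing customer_id.
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10), (2, 11), (2, 12), (3, None)],
)

# Each test is a query that selects the rows violating an assumption.
# Zero rows returned means the assumption holds.
TESTS = {
    "unique_order_id": """
        SELECT order_id
        FROM orders
        GROUP BY order_id
        HAVING COUNT(*) > 1
    """,
    "not_null_customer_id": """
        SELECT order_id
        FROM orders
        WHERE customer_id IS NULL
    """,
}


def run_tests(connection, tests):
    """Run every test query and return a dict of test name -> failing rows."""
    return {name: connection.execute(sql).fetchall() for name, sql in tests.items()}


results = run_tests(conn, TESTS)
for name, failures in results.items():
    status = "PASS" if not failures else f"FAIL ({len(failures)} failing rows)"
    print(f"{name}: {status}")
```

Both checks fail on this sample data by design, which is the point: a failing test surfaces the offending rows before the model reaches production. In a real dbt project these checks would typically be declared against a model rather than hand-written in Python.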

All the benefits of open source

dbt, Delta Lake, and Apache Spark are all open-source projects with broad community adoption and support. This encourages ongoing innovation, reduces the risk of lock-in, and provides a valuable source of learning opportunities for data practitioners.

The Analytics Engineering Workflow

With dbt, analytics teams work directly on the Lakehouse to produce trusted datasets for Business Intelligence use cases.

Get Started Today

Get started

Start a dbt Cloud free trial

Discuss

Join the conversation on dbt Slack

Explore the Databricks-Spark channel