Analytics Engineering for Everyone: Databricks in dbt Cloud
Databricks is now natively supported in dbt Cloud.
Last November, Fishtown Analytics announced that we were providing support for the dbt-spark adapter in dbt Cloud. After months of testing and refinement (much of it directly in collaboration with the Databricks team), I’m happy to share that support for Databricks and Apache Spark is now available to everyone using dbt Cloud.
Since we launched dbt Cloud just over two years ago, we haven’t added a new adapter to our original four. We plan on expanding that list significantly to provide our users more options, and this is just the start of that process. But why was Databricks first on our list of adapters to support in dbt Cloud?
Why start with Databricks?
We first considered building this connector when Databricks approached us about a really exciting development on the Databricks Lakehouse Platform: Databricks SQL. Databricks has long been a favored tool for data scientists, and we were excited that it was providing a new interface and a highly performant connection method to meet the needs of the SQL-native analytics engineers who make up much of the dbt user community. At the same time, we also see dbt on Databricks as a natural way to extend the analytics engineering perspective to the data engineers and data scientists who have been longtime Databricks users. As these two groups of users begin working with the same toolset, we see the opportunity for a powerful new way of working. Jeremy Cohen of our product team put it this way in a previous post:
Our belief is that some of the most important work happens between the traditional silos of data engineers and data analysts—the connective tissues of defining, testing, and documenting foundational data models. Databricks now offers a compelling and accessible interface for each of those two traditional personas. I believe that the real Databricks power users will be those who can make the most of both—and they’ll do it with dbt.
Databricks is a powerhouse of flexibility that enables data engineers and data scientists to use almost any tool or approach they want to process enormous lakes of data (structured, semi-structured, images/videos/PDFs/anything!) for analytics, data engineering, and data science. With the launch of Databricks SQL and the Lakehouse concept, data analysts also have the tools they need for BI workflows.
This SQL-first integration with Databricks means that analysts can build fully automated data pipelines with dbt in the same integrated workspace where data engineers and data scientists work in their preferred frameworks, such as SparkML, scikit-learn, and even fully custom AI libraries.
When each group can work autonomously and collaboratively in the same space, without losing speed or efficiency, everyone wins.
The Databricks and Apache Spark™ connectors are now fully available to all dbt Cloud users! Learn more about using dbt on Databricks in the dbt product docs, or check out the Databricks blog for more information.
Last modified on: Nov 29, 2023