The dbt-databricks adapter delivered
December 6th, 2021 was a great day for users of dbt and Databricks. Not only was it Day 1 of Coalesce 2021, our industry conference, but it was also the day that Databricks released their dedicated dbt-databricks adapter for use with dbt Core. The Databricks team, in collaboration with dbt Labs, built on the foundation provided by dbt Labs’ dbt-spark adapter and added some critical improvements. The new adapter:
- is easier to set up,
- has better defaults for developing on Databricks, and most importantly,
- supports Unity Catalog (with dbt version 1.1 and later).
With a ceremonious commit by Databricks co-founder and CEO Ali Ghodsi, the new adapter was ready for the dbt community to use.
For those unfamiliar, adapters are purpose-built to allow dbt to connect to and run against a specific warehouse, data lake, or query engine, and they’re a critical component of the overall workflow.
Announcing the dbt-databricks adapter availability in dbt Cloud
We’ve been excited about the dbt-databricks adapter for a while, and we’re even more excited to announce it’s now available in dbt Cloud!
dbt Cloud is the easiest and most reliable way to develop and deploy a dbt project. It helps remove complexity all while delivering more power, so we’re thrilled to now offer a simpler Databricks connection experience with support for Databricks’ Unity Catalog and better modeling defaults.
This is great news for the hundreds of Databricks customers already using dbt Cloud with the dbt-spark adapter, as they can now migrate their connection to the dbt-databricks adapter to unlock the benefits. The Databricks team is committed to maintaining and improving the adapter over time, so you can be sure the integrated experience will provide the best of dbt and the best of Databricks.
dbt-databricks is compatible with the following versions of dbt Core in dbt Cloud with varying degrees of functionality.
|dbt Version|Available features|
|---|---|
|1.3 (all)|dbt-databricks with easier set up, better defaults, Unity Catalog, and support for Python models|
|1.2 (all)|dbt-databricks with easier set up, better defaults, and Unity Catalog|
|1.1 (all)|dbt-databricks with easier set up, better defaults, and Unity Catalog|
|1.0 (all)|dbt-databricks with easier set up and better defaults|
To ease the migration process, dbt Cloud will, for the time being, be able to connect to Databricks using both dbt-spark and dbt-databricks. We will eventually retire connecting to Databricks via dbt-spark, so we encourage you to create new projects on dbt-databricks and to migrate existing ones to it. Connecting to Spark with dbt-spark in dbt Cloud will not be deprecated.
The dbt-databricks advantage
So why are we so excited about dbt-databricks? And how does it compare to using the dbt-spark adapter that is already available in dbt Cloud? The benefits include:
1 - Easier set up
Connecting to Databricks has never been simpler. You only need to enter:
- the server hostname of the Databricks workspace
- the HTTP path of the Databricks SQL warehouse or cluster
- an appropriate credential
This is significantly streamlined compared to using the dbt-spark adapter to connect to Databricks.
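For dbt Core users, the same three inputs map onto a profile target. The sketch below is a minimal illustration in Python, not an official tool: the function name `databricks_target` and every example value are hypothetical placeholders, and `schema` is an extra field we assume a target needs beyond the three inputs listed above. The keys `type`, `host`, `http_path`, `token`, and `catalog` follow the dbt-databricks profile fields as we understand them.

```python
def databricks_target(host, http_path, token, schema, catalog=None):
    """Build a dict shaped like a dbt-databricks profile target (illustrative only)."""
    # The three connection inputs named above are all required.
    for name, value in (("host", host), ("http_path", http_path), ("token", token)):
        if not value:
            raise ValueError(f"{name} is required")

    target = {
        "type": "databricks",    # selects the dbt-databricks adapter
        "host": host,            # server hostname of the Databricks workspace
        "http_path": http_path,  # HTTP path of the SQL warehouse or cluster
        "token": token,          # credential (a personal access token here)
        "schema": schema,        # assumption: schema where models are built
    }
    # catalog is optional and only relevant when Unity Catalog is in use.
    if catalog is not None:
        target["catalog"] = catalog
    return target


# Placeholder values, not real endpoints or credentials.
example = databricks_target(
    host="dbc-example.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/abc123",
    token="dapi-PLACEHOLDER",
    schema="analytics",
)
```

In dbt Cloud you enter these values in the connection settings instead of building them by hand; the sketch only shows how few inputs are involved.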
2 - Better defaults
The dbt-databricks adapter has better defaults and is more opinionated, guiding users to an improved experience with less effort. Design choices of the dbt-databricks adapter include:
- defaulting to Delta format
- using merge for incremental models
- running expensive queries, like unique key generation, with Photon
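To make the merge default concrete, here is a conceptual sketch of what a merge (upsert) on a unique key does to a table. It is not the adapter's implementation, which runs SQL against Delta tables; the function name `merge_incremental` and the sample rows are hypothetical, and plain Python lists stand in for tables.

```python
def merge_incremental(existing, new_rows, unique_key):
    """Upsert new_rows into existing, matching rows on unique_key."""
    # Index the current table by its unique key.
    merged = {row[unique_key]: row for row in existing}
    for row in new_rows:
        # Matched keys are overwritten (update); unmatched keys are added (insert).
        merged[row[unique_key]] = row
    return list(merged.values())


table = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
batch = [{"id": 2, "status": "shipped"}, {"id": 3, "status": "new"}]
result = merge_incremental(table, batch, "id")
# result now holds ids 1, 2, 3, with id 2 updated to "shipped"
```

The point of the default is that rerunning an incremental model updates existing rows rather than appending duplicates.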
3 - Support for Unity Catalog
Unity Catalog allows Databricks users to centrally manage all data assets, simplifying access management and improving search and query performance. Databricks users can now get three-part data hierarchies – catalog, schema, model name – which solves a longstanding friction point in data organization and governance.
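As a small illustration of the three-part hierarchy, the sketch below builds a fully qualified name from a catalog, schema, and model name, and falls back to two parts when no catalog is given. The helper `relation_name` and the example names are hypothetical, and the backtick quoting is an assumption about how you might want identifiers rendered.

```python
def relation_name(schema, identifier, catalog=None):
    """Return a backtick-quoted, dot-separated name for a model."""
    # With Unity Catalog the catalog becomes the first of three parts.
    parts = [catalog, schema, identifier] if catalog else [schema, identifier]
    return ".".join(f"`{part}`" for part in parts)


two_part = relation_name("analytics", "orders")
three_part = relation_name("analytics", "orders", catalog="main")
```

Here `two_part` is `` `analytics`.`orders` `` and `three_part` is `` `main`.`analytics`.`orders` ``.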
A trusted adapter experience in dbt Cloud
There are dozens of open source dbt adapters that enable dbt Core users to connect to different warehouses. Some of the adapters are maintained by the dbt Labs team and are available in dbt Cloud; others are maintained by vendor partners or by Good Samaritans in the community interested in sharing their work with others. As you can imagine, the quality of open source dbt adapters varies, and unfortunately, we haven’t been able to validate whether they all deliver a reliable experience.
Until now, all adapters available in dbt Cloud were built by the team at dbt Labs. Today, we’re taking the first step toward opening up dbt Cloud to partner-maintained adapters that have been verified by dbt Labs, starting with this one from the Databricks team. We’ve spent time ensuring this new adapter meets our bar as a trusted, first-class experience on dbt and Databricks, and we expect to work closely with partners so that more adapters meet a similar bar in the future. This is a quiet but significant step forward for what customers can expect from dbt Cloud. There’s tremendous interest from data platforms and dbt Cloud customers in expanding the menu of adapters in dbt Cloud, and in the coming months, we will deliver more choice while providing the same assurances you have today.
Last modified on: Nov 17, 2022