At Fishtown Analytics, we believe that analysts should function more like software engineers. Building and maintaining code in packages that can be used across dbt projects is one way we enable this.
Source data from third parties typically comes through an ETL tool like Fivetran or Stitch in the same structure for all companies. This means that once one analyst has done the work to model the data for that source, that code can (and should!) be shared with other analysts.
TL;DR: want to go from syncing new data to having models built in minutes? Packages are going to be your friend.
A Walkthrough: Installing the Zendesk Package
The steps below walk through installing the Zendesk package into an existing dbt project. (If you don’t have a dbt project up and running yet, check out the docs.)
Once you have connected Zendesk in your ETL tool of choice, allow some time for the data to replicate (possibly a day if there is a lot of data). When you see the data in your data warehouse, you can implement the package.
Configuring your dbt_project.yml file
- Open your dbt project in your text editor and go to the dbt_project.yml file.
- In the GitHub repository for the package you are looking to install, click on dbt_project.yml (here’s the one for Zendesk).
- Copy the “Zendesk” model information from the bottom section. If you already have models specified in your project, you do not need to include it again; simply add the Zendesk section and the rows beneath it. Note: the spacing has to match the existing fields in your dbt_project.yml file. The YAML file format is very sensitive to indentation, so if you get any errors after implementing the step below, check your spacing!
- The “vars” (variables) tell the package where to look in your data warehouse for the relevant data. Remove the # schema.table placeholder and fill in your own tables (e.g. zendesk.organizations).
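As a rough sketch of what the finished section of dbt_project.yml might look like, assuming the layout described above (a Zendesk section nested under models, with a vars block beneath it): the variable name organizations_table is hypothetical and used only for illustration. The real variable names are whatever the package's own dbt_project.yml lists.

```yaml
# dbt_project.yml (excerpt)
models:
  zendesk:                # section copied from the package's dbt_project.yml
    vars:
      # Hypothetical variable name; copy the real names from the package.
      # The "# schema.table" placeholder is replaced with your own table.
      organizations_table: zendesk.organizations
```

Note that zendesk is indented under models, and vars under zendesk, using spaces rather than tabs. A mismatch here is the most likely cause of a parsing error.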
Pulling from the package repository
- Now that you have told the package where to get the data, you need to tell your dbt project where to pull the models from to build the package.
- For dbt versions 0.10.0 and after: Add a packages.yml file to your project. In it, paste the https URL from the package repository, and list the release or branch you want as the revision.
- For dbt versions prior to 0.10.0: Add a “repositories” section at the bottom of your dbt_project.yml. Beneath it, copy and paste the https URL from the package repository. For our example, that’s https://github.com/fishtown-analytics/zendesk.git.
- Both formats can pin a version: either revision: 0.1.0 in packages.yml, or “@0.1.0” appended to the URL we copied from the package. This references a specific release of the package. As package authors make changes and improvements to a package, it’s generally better for you to manually upgrade to these new versions instead of upgrading automatically. New versions could introduce incompatibilities with the rest of your project, so testing these upgrades by hand is recommended.
- To figure out what the latest version is when you install, click on “release” in the package repository.
- This will bring you to a list of all releases. In Zendesk’s case, there is currently only one release, but if there were multiple, the latest release would be listed at the top of the page.
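The two formats described above can be sketched as follows, using the Zendesk repository URL and the 0.1.0 release from this walkthrough. For dbt 0.10.0 and later, a minimal packages.yml would look something like this:

```yaml
# packages.yml (dbt 0.10.0 and later)
packages:
  - git: "https://github.com/fishtown-analytics/zendesk.git"
    revision: 0.1.0   # a release tag or branch name
```

For versions before 0.10.0, the equivalent entry at the bottom of dbt_project.yml appends the version to the URL instead:

```yaml
# dbt_project.yml (dbt versions prior to 0.10.0)
repositories:
  - "https://github.com/fishtown-analytics/zendesk.git@0.1.0"
```

Use one format or the other, depending on your dbt version.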
dbt deps
OK, so now your project knows where to look for the models, and the models know which tables to build from. Now you need to actually download the code so that dbt can run it.
- From the command line, run dbt deps. This will fetch the code from the URL you specified.
- Then do a dbt run to build the models.
FAQs
Can I have multiple packages in my project?
Yes! Just follow the above steps and add additional URLs.
Wait — I’m not seeing the models in my project where they normally would be…
All downloaded packages show up in a gitignored folder called dbt_modules.
Can I make changes to the models for customizations?
You can’t directly customize the code within your dbt_modules folder, because the next time you run dbt deps all of your changes will be overwritten. But it’s very common to want to modify or build on top of code in a package! Here are some options for how to do that:
- Build on top of the models in the package. Everything from the package you referenced is available in your graph, so just call ref() and start building like you would on top of any other model!
- Override a model in the package with one that you build yourself. Create a model in your local project with the same name as one from the package, then set enabled: false for the model in the package.
- Best of all: submit a PR to the repository to get your change added in to the package!
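As an illustration of the override option, disabling a packaged model might look roughly like the sketch below. The model name zendesk_tickets is hypothetical. The nesting is also an assumption: substitute the name of the model you are replacing, and mirror any folder structure the package uses under its models directory.

```yaml
# dbt_project.yml (excerpt)
models:
  zendesk:
    zendesk_tickets:    # hypothetical model name from the package
      enabled: false    # dbt skips the packaged model; your local model of the same name is built instead
```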
How do I know when updates have come out to a package I use and how do I upgrade?
You should “watch” any package repository that you add so that GitHub notifies you when new releases come out. When a new release comes out, read the release notes to understand what has changed (maybe it isn’t something you want!), then update the “@0.1.0” pin (or the revision in packages.yml) to the newest version.
Last modified on: Oct 11, 2022