Data warehouse design with dbt 0.10.2
We’re thrilled to announce a new release of dbt, version 0.10.2. This release is packed with bug fixes, improved support for BigQuery, and more. Check out the changelog for a full list of changes in this release. Besides these incremental updates, version 0.10.2 introduces a powerful mechanism for designing data warehouses: model aliasing.
Designing a Data Warehouse
dbt users are tasked with building out data models for the data consumers at their organizations. Crucially, these models are only useful if the downstream consumers know where to find them and how to use them. While comprehensive documentation is a big part of this equation (more to come on that soon!), effective naming practices can make the process of exploration in a warehouse both approachable and intuitive.
Because of the rigid structure of modern warehouses, dbt users really only have control over two labels for each of their models:
- the schemas they’re built into
- The names of the tables themselves
While dbt has supported custom schemas since version 0.9.0, it was previously not possible to specify the database identifier for a model independently from the model’s filename. While a model filename like snowplow_sessions_with_user_stitching_xf.sql might be totally appropriate and precise for the dbt users on your team, it’s probably not what downstream consumers expect (or want!) to see. To this end, dbt v0.10.2 introduces model aliasing.
To alias a model, simply set a value for the alias configuration option for the model. That’s it! Let’s take a look at an example:
Here, we alias a long and unwieldy model filename to a simpler and more accessible database identifier: sessions. With this config change, the model above will be built into a table called “clickstream”.”sessions”. Further, other models like page_views and link_clicks can be put into the same clickstream schema. These clean model aliases are ideal for consumers that don’t (and shouldn’t need to) know about all of the intricacies of web event sessionization, user stitching, and so on.
The alias configuration changes the name of the identifier in the database, but it does not change the way that the model is referenced inside of the dbt project. In a separate dbt model, we can still reference the model above using its fully-qualified filename. That might look like:
In the abandoned_carts model, it’s obvious that one of the inputs is a user-stitched Snowplow sessions model. While a downstream data consumer might prefer to see clickstream.sessions, a dbt user needs to know that this model uses the user-stitched Snowplow sessions model at a glance. With alias, both dbt users and downstream consumers of the data can operate at the level of abstraction that’s right for them.
Interested in using the new alias functionality in dbt? Check out v0.10.2! You can find release notes here and documentation for alias here. Finally, thanks to abelsonlive for proposing and initially implementing the alias feature, along with jon-rtr for his help with getting this code into dbt! Nice work!
Last modified on: Sep 25, 2023