Answering community questions

We welcome in-depth discussion in our Slack workspace.

As our community grows, we appreciate your help in answering questions! That being said, we want to make sure we continue to build goodwill in the community, and want to encourage a certain level of quality.

In general, if a user has put effort into asking their question, pay it forward by putting effort into providing a good answer. In our opinion, when writing an answer you should:

  • Demonstrate empathy for a wide range of experiences
  • Solve the actual problem, which may not be the same as the problem a user is asking about
  • Explain the why
  • Take the opportunity to teach

Demonstrate empathy #

Many new dbt users come from a background where software engineering practices may be new to them. Some of our users come from a data engineering background, and may be more familiar with an ETL paradigm (as opposed to ELT).

When we answer questions on Slack, we should consider the experience of the user, and adjust our answers accordingly (see: curse of knowledge).

Help solve the actual problem #

Sometimes users ask for help because they are trying to work against dbt, and while there’s normally a reasonable workaround, that may just be a bandaid for a bigger problem. If something feels off with the way someone is using dbt, you should ask what they are trying to achieve.

Along these same lines, if you’re not sure of the answer, it might be best to let someone else provide an answer rather than leading someone down the wrong path! Of course, some problems end up being thornier than they appear on the surface, so it’s ok if you find yourself in too deep. Just let the user know that you’re not the best person to answer their question.

Explain the why #

Sometimes users ask how to do something with dbt that is simply unsupported. If that’s the case, try to add context to your answer rather than just saying “you can’t do that”.

If we’ve chosen not to design dbt to do that, explain why this design decision was made. If it’s something we want to change in the future, share a link to a GitHub issue or if one doesn’t exist, ask the user to create a new GitHub issue. Alternatively, provide a workaround if one exists!

Take the opportunity to teach #

If someone is asking a simple troubleshooting question, still take the time to explain why that’s the solution along with the snippet of code — hopefully in the future they will be able to avoid similar problems, or someone else experiencing a similar, but different, problem will be able to solve their problem too!

Where possible, link to documentation so that users can understand where to find answers to these questions independently in the future.

Examples of great answers #

How do I pass a string to a macro? #

The question

I’m using a macro like so:

{% macro cents_to_dollars(column_name, precision=2) %}
    ({{ column_name }} / 100)::numeric(16, {{ precision }})
{% endmacro %}

But when I call it in my model:

select
  id as payment_id,
  {{ cents_to_dollars(amount) }} as amount_usd,
  ...
from app_data.payments

I get back this error:

Completed with 1 error and 0 warnings:

Database Error in model stg_payments (models/stg_payments.sql)
  001003 (42000): SQL compilation error:
  syntax error line 10 at position 6 unexpected '/'.
  syntax error line 10 at position 12 unexpected '::'.
  compiled SQL at target/run/jaffle_shop/models/stg_payments.sql

Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

An unhelpful answer

For an experienced dbt user, it’s tempting answer:

Add quotes! {{ cents_to_dollars('amount') }}

And leave it at that.

A good answer This answer answer instead:

  • Provides the solution
  • Explains why they are getting back a Database Error, and how to dig deeper into it (empowering the user to debug future problems)
  • Explains why you need to quote the string

To fix this, you need to quote your argument, like so:

{{ cents_to_dollars('amount') }} as amount_usd,

The quotes tell the Jinja parser to use amount as a string in the macro. Without the quotes, the Jinja parser is looking for a variable named amount to use instead. Since no variable named amount exists, your macro is getting compiled to the following:

select
  id as payment_id,
  ( / 100)::numeric(16, 2) as amount_usd,

You’ll be able to see this SQL if you open the file listed in the error message (target/run/jaffle_shop/models/stg_payments.sql).

Since the above is invalid SQL, when dbt tried to run the query, Snowflake gave back an error that dbt passes on. This is why we see a Database Error (docs) even though it’s a Jinja problem.

How do I create temporary tables with dbt? #

The question

I can’t find a way to create temporary tables in dbt. Using an ephemeral model that gets injected as a CTE slows down my run significantly compared to using a temporary table, so I worked around this by creating a temporary table using a pre-hook, but then I don’t see it in the lineage graph.

A good answer

The following answer:

  • Demonstrates empathy for a wide range of experience by taking the time to explain what temporary tables are
  • Explains why temporary tables aren’t supported in dbt
  • Attempts to understand the actual problem, i.e. that a table is more performant that using a CTE
  • Empowers the user to make a decision for their use case by listing the pros and cons, and links the docs so that they can answer any additional questions.

You’re right in thinking that dbt does not support temporary tables. That’s because temporary tables only persist in a single session, and dbt opens one connection/session per thread. Therefore any temporary tables created on one thread would not be visible to a model running on a different thread.

It sounds like CTEs are a performance drag for you though — out of interest, which warehouse are you using?

You’ve identified two workarounds, and there’s another one worth discussing:

Option 1: Materialize your model as CTEs using the ephemeral materialization (docs)

Pros:

  • The models show up in the lineage graph
  • You can re-use these transformations in multiple downstream models by ref-ing them
  • You can test and document these models

Cons:

  • At some point there is a performance degradation with too many stacked CTEs (especially on older versions of postgres, where CTEs are an optimization fence)
  • Compiled SQL can be harder to debug

Option 2: Use pre-hooks to create temp tables

I would generally recommend against this — you can’t test or document your models, and they won’t be in the lineage graph (as you’ve noted).

Option 3: Materialize these models as tables in a separate schema, and drop the schema at the end of a run

  • Use the schema config to materialize a model in a separate schema
{{ config(
  materialized='table',
  schema='my_temporary_schema'
) }}
  • Then, at the end of a run, use an on-run-end hook (docs) to drop that schema — in your dbt_project.yml:
on-run-end: "drop schema my_temporary_schema cascade"

Pros:

  • All the benefits of Option 1 (lineage, testing, reusable)
  • Sounds like it might be more performant than using CTEs

Cons:

  • Make sure you don’t have any dependent views that select from that schema! They might get dropped when you run a drop cascade command! This introduces fragility into your project!