In a formal sense, a Common Table Expression (CTE), is a temporary result set that can be used in a SQL query. You can use CTEs to break up complex queries into simpler blocks of code that can connect and build on each other. In a less formal, more human-sense, you can think of a CTE as a separate, smaller query within the larger query you’re building up. Creating a CTE is essentially like making a temporary view that you can access throughout the rest of the query you are writing.
There are two-types of CTEs: recursive and non-recursive. This glossary focuses on non-recursive CTEs.
Why you should care about CTEs
Have you ever read through a query and thought:
- “What does this part of the query do?”
- “What are all the sources referenced in this query? Why did I reference this dependency?”
- “My query is not producing the results I expect and I’m not sure which part of the query is causing that.”
These thoughts often arise when we’ve written SQL queries and models that utilize complex business logic, references and joins multiple upstream dependencies, and are not outputting expected results. In a nutshell, these thoughts can occur often when you’re trying to write data models!
How can you make these complexities in your code more digestible and usable? CTEs to the rescue!
CTE Syntax: How it works
To use CTEs, you begin by defining your first CTE using the WITH
statement followed by a SELECT
statement.
Let’s break down this example involving a rename_columns
CTE below:
with rename_columns as (
select
id as customer_id,
lower(first_name) as customer_first_name,
lower(last_name) as customer_last_initial
from {{ ref('raw_customers') }}
)
select * from rename_columns
In this query above, you first create a CTE called rename_columns
where you conduct a simple SELECT
statement that renames and lower cases some columns from a raw_customers
table/model. The final select * from rename_columns
selects all results from the rename_columns
CTE.
While you shouldn't always think of CTEs as having classical arguments like SQL functions, you’ve got to call the necessary inputs for CTEs something.
- CTE_EXPRESSION_NAME: This is the name of the CTE you can reference in other CTEs or SELECT statements. In our example,
rename_columns
is the CTE_EXPRESSION_NAME. If you are using multiple CTEs in one query, it’s important to note that each CTE_EXPRESSION_NAME must be unique. - CTE_QUERY: This is the
SELECT
statement whose result set is produced by the CTE. In our example above, theselect … from {{ ref('raw_customers') }}
is the CTE_QUERY. The CTE_QUERY is framed by parenthesis.
When to use CTEs
The primary motivation to implement CTEs in your code is to simplify the complexity of your queries and increase your code’s readability. There are other great benefits to using CTEs in your queries which we’ll outline below.
Simplification
When people talk about how CTEs can simplify your queries, they specifically mean how CTEs can help simplify the structure, readability, and debugging process of your code.
Establish Structure
In leveraging CTEs, you can break complex code into smaller segments, ultimately helping provide structure to your code. At dbt Labs, we often like to use the import, logical, and final structure for CTEs which creates a predictable and organized structure to your dbt models.
Easily identify dependencies
When you import all of your dependencies as CTEs in the beginning of your query/model, you can automatically see which models, tables, or views your model relies on.
Clearly label code blocks
Utilizing the CTE_EXPRESSION_NAME, you can title what your CTE is accomplishing. This provides greater insight into what each block of code is performing and can help contextualize why that code is needed. This is incredibly helpful for both the developer who writes the query and the future developer who may inherit it.
Test and debug more easily
When queries are long, involve multiple joins, and/or complex business logic, it can be hard to understand why your query is not outputting the result you expect. By breaking your query into CTEs, you can separately test that each CTE is working properly. Using the process of elimination of your CTEs, you can more easily identify the root cause.
Substitution for a view
Oftentimes you want to reference data in a query that could, or may have existed at one point, as a view. Instead of worrying about the view actually existing, you can leverage CTEs to create the temporary result you would want from the view.
Support reusability
Using CTEs, you can reference the same resulting set multiple times in one query without having to duplicate your work by referencing the CTE_EXPRESSION_NAME in your from statement.
CTE example
Time to dive into an example using CTEs! For this example, you'll be using the data from our jaffle_shop demo dbt project. In the jaffle_shop
, you have three tables: one for customers, orders, and payments.
In this query, you're creating three CTEs to ultimately allow you to segment buyers by how many times they’ve purchased.
with import_orders as (
select * from {{ ref('orders') }}
),
aggregate_orders as (
select
customer_id,
count(order_id) as count_orders
from import_orders
where status not in ('returned', 'return pending')
group by 1
),
segment_users as (
select
*,
case
when count_orders >= 3 then 'super_buyer'
when count_orders <3 and count_orders >= 2 then
'regular_buyer'
else 'single_buyer'
end as buyer_type
from aggregate_orders
)
select * from segment_users
Let’s break this query down a bit:
- In the first
import_orders
CTE, you are simply importing theorders
table which holds the data I’m interested in creating the customer segment off of. Note that this first CTE starts with aWITH
statement and no following CTEs begin with aWITH
statement. - The second
aggregate_orders
CTE utilizes theimport_orders
CTE to get a count of orders per user with a filter applied. - The last
segment_users
CTE builds off of theaggregate_orders
by selecting thecustomer_id
,count_orders
, and creating yourbuyer_type
segment. Note that the finalsegment_users
CTE does not have a comma after its closing parenthesis. - The final
select * from segment_users
statement simply selects all results from thesegment_users
CTE.
Your results from running this query look a little like this:
CTE vs Subquery
A subquery is a nested query that can oftentimes be used in place of a CTE. Subqueries have different syntax than CTEs, but often have similar use cases. This content won’t go too deep into subqueries here, but it'll highlight some of the main differences between CTEs and subqueries below.
Data warehouse support for CTEs
CTEs are likely to be supported across most, if not all, modern data warehouses. Please use this table to see more information about using CTEs in your specific data warehouse.
Conclusion
CTEs are essentially temporary views that can be used throughout a query. They are a great way to give your SQL more structure and readability, and offer simplified ways to debug your code. You can leverage appropriately named CTEs to easily identify upstream dependencies and code functionality. CTEs also support recursiveness and reusability in the same query. Overall, CTEs can be an effective way to level-up your SQL to be more organized and understandable.
Further Reading
If you’re interested in reading more about CTE best practices, check out some of our favorite content around model refactoring and style:
Want to know why dbt Labs loves CTEs? Check out the following pieces:
Last modified on: Nov 27, 2024
Set your organization up for success. Read the business case guide to accelerate time to value with dbt Cloud.