Our favorite SQL functions

We wouldn’t be a group of analytics engineers and analysts without our own little rolodex of SQL functions.

The SQL tooth fairies #

As far as we know, there’s only really one way to have clean teeth: using a toothbrush and some toothpaste. Just as you (hopefully) brush your teeth everyday, we also have our own toothbrush and toothpaste we use almost every time we’re data modeling to help us clean up our data.

  • COALESCE: This handy gem will help you fill up unwanted nulls and blanks in your data by replacing them with fields or values you explicitly specify. Just make sure you’re not overwriting fields that actually matter…
  • LOWER: One of the more chaotic things in the data world is the fact that there is no official standard for the casing of string values. Luckily, the lower function allows you to lowercase string values in your data. Use it generously to create a standardized and expected format for string values in your data warehouse.

The most loyal SQL date functions #

Can you even call yourself a data person if you’ve never Googled “date diff in big query”? Date functions where you truncate, add, and manipulate date values are unsurprisingly super important and we too can’t live without them.

  • DATEADD: Let’s just say it: you’re going to find yourself needing to subtract and add time intervals to dates and timestamps in SQL. Your overall use cases might differ, but we can guarantee there’s a dateadd somewhere in your dbt project. Like a few of the SQL date functions, the syntax for dateadd is somewhat data warehouse-specific. Snowflake supports more granular time intervals, Google BigQuery doesn’t, and Postgres doesn’t support a dateadd function at all! With dbt v1.2, dbt now supports out-of-the-box support for the dateadd macro which will allow you to to use the dateadd function without having to worry about sticky syntax.
  • DATEDIFF: We thought there was no better way to get the difference between two dates or timestamps in SQL than using the datediff function. We thought wrong. Just as dateadd got the dbt v1.2 treatment, you can use a handy dbt datediff macro to help you avoid the syntax differences for the datediff function across different data warehouses.
  • DATE_TRUNC: A classic date function that helps us look at data at an aggregated-level, the date_trunc function will truncate a date or time to the first instance for a given date part. Similar to dateadd and datediff, date_trunc can have varied syntax across different data warehouses. dbt v1.2 also now provides a date_trunc macro to help you use the date_trunc function without having to worry about minute syntax differences.
  • EXTRACT: Ah, the fraternal twin of date_trunc. Unlike the date_trunc function whose output will retain a date format, the extract function will literally extract out the specific date part from a date/time. This function is useful for when we need to get the numeric values of dates, such as the specific week of the year, a date took place.

The dbt wild cards #

We guess we’re cheating here a bit, but there’s a handful of incredibly useful macros and generators that fall outside our typical understanding of SQL functions that help us write more DRY and idempotent code. For us, dbt utilsstar, union_relations, and surrogate_key macros are some of the most helpful ones.

  • STAR: A true MVP of dbt utils in many ways. The star macro generates a comma-separated list of all fields that exist in a from relation and will exclude any fields that you explicitly specify. When you need to select more columns than you would like to type out, skip the carpal tunnel and level up your data models with the star macro.
  • UNION_RELATIONS: We can’t live without a UNION ALL. We additionally can’t escape the inevitable pain of unioning tables that have mismatched or missing columns. Thankfully, dbt utils supports a union_relations macro that helps us union tables together without having to worry about propagating null or 0 values to mismatched columns. We beg you to stop writing endless null as column_name rows of code and replace them with an incredibly useful dbt utils macro.
  • SURROGATE_KEY: We’re really not sure if there’s anything more idempotent in analytics engineering than surrogate keys, a type of primary key that is derived from hashing fields that create a uniqueness constraint on the data. The surrogate_key macro allows you to quickly create surrogate keys without having to worry about null values or finicky concatenation syntax. We use the surrogate_key macro almost countless times throughout our internal dbt project, and we’re honestly not sure how we would get on without it.

There you have it—some of our favorite SQL functions, generators, and macros to help us write DRYer and smarter dbt models. Want to feel like a SQL god? Check out dbt Learn and discover dbt’s power!

Last modified on: Jul 27, 2022

Join us for Coalesce 2022

The annual conference dedicated to advancing analytics engineering