
Why compilers matter

This post first appeared in The Analytics Engineering Roundup.

Tristan Handy dives deep into the world of compilers in this episode of The Analytics Engineering Podcast with Lukas Schulte, cofounder of SDF Labs (not to be confused with last episode’s guest—Lukas’ dad and fellow SDF cofounder Wolfram Schulte). Tristan and Lukas discuss what compilers are, how they work, and what they mean for the data ecosystem. SDF, which was recently acquired by dbt Labs, builds a world-class SQL compiler aimed at abstracting away the complexity of warehouse-specific SQL.

The conversation covers the evolution of compiler technology, what software engineering has gotten right over the past several decades, and why the data ecosystem is poised for a similar transformation. Lukas and Tristan explore why SQL has lagged behind other programming ecosystems, and how new compiler infrastructure could lead to package management, interoperability, and greater innovation across data platforms. It’s a fascinating (and timely) episode: Get ready for the new dbt engine.

Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.


Chapters

  • 02:40 The vision behind SDF Labs
  • 04:00 What is a compiler?
  • 05:00 Components of a compiler: frontend, IR, backend
  • 08:00 Syntax vs. semantics and the role of parsing
  • 10:00 Logical vs. physical plans in SQL compilers
  • 13:00 Historical context: mainframes to LLVM
  • 16:00 Cross-architecture portability in Rust & other compilers
  • 18:00 What is LLVM and why it matters
  • 20:00 Bootstrapping and the self-recursive nature of compilers
  • 21:00 Compilers in Java, TypeScript, and dbt
  • 23:00 Why compilers are foundational to software ecosystems
  • 26:00 The SQL dialect problem in data warehouses
  • 29:00 Can SQL get its own LLVM?
  • 31:00 How Substrait and DataFusion aim to standardize SQL
  • 35:00 Package management and the path toward SQL abstractions
  • 38:00 The future of the data ecosystem with a common SQL compiler

Key takeaways from this episode

What is a compiler?

Tristan Handy: What is a compiler?

Lukas Schulte: It's something that takes higher-level human-readable code and translates, compiles, rewrites it into lower-level machine code that is much harder for humans to understand and much easier for machines to understand.

Compilers typically have phases. They have a frontend that deals with the language you're working with, a middle component—usually called an IR or intermediate representation—and a backend that takes that IR and compiles it into machine code.

Compiler phases: frontend, IR, backend

Tristan Handy: How does it all come together?

Lukas Schulte: There’s a preprocessor that handles macros, removes comments, and prepares the text. Then a lexer converts it into tokens. These tokens get assembled into a tree that the compiler can understand. That’s where syntax validation and semantic analysis happen.
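The lexing stage Lukas describes can be sketched in a few lines. This is a minimal, illustrative tokenizer (not SDF's, and the token names are my own): it scans the source text with a regular expression and emits a flat list of (kind, value) tokens, discarding whitespace the way a preprocessor strips what the parser doesn't need.

```python
import re

# Token kinds for a toy expression language. Order matters: NUMBER is tried
# before IDENT so digits are never swallowed by the identifier rule.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{k}>{p})" for k, p in TOKEN_SPEC))

def lex(source: str):
    """Turn source text into (kind, value) tokens, dropping whitespace."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
    return tokens

print(lex("x = x + 1"))
```

A real frontend would then assemble these tokens into a tree and run syntax validation and semantic analysis over it, as Lukas notes.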

From there, we build a logical representation of the operations we want to perform. That transitions to a physical plan, which starts considering the hardware: how many cores, how much memory, which files we’re accessing. After that, optimizations are applied and it compiles to actual machine code using a toolchain like LLVM.
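To make the logical/physical split concrete, here is a toy logical plan for a simple query. The node names are my own invention, not SDF's: the point is that the plan records only *what* to compute, with no mention of cores, memory, or files, which belong to the physical plan derived later.

```python
from dataclasses import dataclass

# Minimal logical-plan nodes: each describes an operation, not how to run it.
@dataclass
class Scan:
    table: str

@dataclass
class Filter:
    child: object
    predicate: str

@dataclass
class Project:
    child: object
    columns: list

def explain(node, depth=0):
    """Render the plan tree, roughly the way an EXPLAIN statement might."""
    pad = "  " * depth
    if isinstance(node, Scan):
        return f"{pad}Scan({node.table})"
    if isinstance(node, Filter):
        return f"{pad}Filter({node.predicate})\n" + explain(node.child, depth + 1)
    return f"{pad}Project({', '.join(node.columns)})\n" + explain(node.child, depth + 1)

# Logical plan for: SELECT id FROM orders WHERE amount > 10
plan = Project(Filter(Scan("orders"), "amount > 10"), ["id"])
print(explain(plan))
```

An optimizer would rewrite this tree (pushing filters down, pruning columns) before a physical planner assigns it to actual hardware.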

Syntax vs. semantics

Lukas Schulte: Let’s break down syntax vs. semantics.

Imagine the code x = x + 1. That has valid syntax. Its meaning—its semantics—is that we’re incrementing x by 1.

Now, you could also write x += 1. Different syntax, same semantics. So syntax defines structure, and semantics define meaning. That distinction is important when you’re analyzing or transforming code.
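Python's own frontend makes this distinction visible. The two statements Lukas mentions parse to different syntax trees (an `Assign` versus an `AugAssign` node), yet starting from the same state they produce the same result, which is exactly the "different syntax, same semantics" point.

```python
import ast

# Different syntax: the two forms parse to different AST node types.
tree_a = ast.parse("x = x + 1").body[0]
tree_b = ast.parse("x += 1").body[0]
print(type(tree_a).__name__, type(tree_b).__name__)  # Assign AugAssign

# Same semantics: from the same starting state, both increment x by 1.
state_a, state_b = {"x": 41}, {"x": 41}
exec("x = x + 1", state_a)
exec("x += 1", state_b)
print(state_a["x"], state_b["x"])  # 42 42
```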

LLVM and portability

Tristan Handy: Have we been building abstraction layers like this for decades?

Lukas Schulte: Absolutely. That’s what LLVM does. It provides a consistent intermediate representation that compilers can use to target multiple backends—Intel, ARM, different OSes. Apple invested early in LLVM to support custom chips.

With Rust, for example, LLVM is what lets us build binaries that behave the same on macOS, Windows, and Linux with relatively little effort.

Bootstrapping compilers

Tristan Handy: So there’s this recursive loop—compilers being built with other compilers?

Lukas Schulte: Exactly. Rust wasn’t always written in Rust—it started in C++. Eventually, the compiler was rewritten in Rust itself. Now, Rust compiles Rust. It’s fully self-hosted. That’s common with mature languages—it shows the compiler ecosystem is stable and powerful enough to sustain itself.

Why compilers matter

Tristan Handy: You said once that compilers are the foundation of every software ecosystem. What did you mean?

Lukas Schulte: There are two big drivers in software: abstractions and standards. You want one way to interface with a USB device—not ten. Same for software. You want one standard way to express a Python program, a JavaScript app, etc.

Compilers enforce those standards and make sure the same code works across platforms. That consistency powers things like package managers, shared libraries, and open ecosystems.

SQL dialects and fragmentation

Tristan Handy: Are there ecosystems that are doing worse than others?

Lukas Schulte: SQL does a particularly bad job. Anyone who's used more than one data warehouse knows you can't take the same SQL statement and expect it to work the same way. Casting, case sensitivity, functions—every engine handles these things differently.
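One concrete example of this fragmentation: even "cast a column to a string" is spelled differently across engines. The templates below are illustrative of the kind of divergence Lukas is describing (check your warehouse's documentation before relying on any of them):

```python
# One logical operation -- cast a column to a string -- spelled three ways.
# Dialect templates are illustrative examples, not an exhaustive reference.
CAST_TO_STRING = {
    "bigquery":  "CAST({col} AS STRING)",
    "snowflake": "CAST({col} AS VARCHAR)",
    "postgres":  "{col}::text",
}

def cast_to_string(col: str, dialect: str) -> str:
    """Render the cast for the given dialect."""
    return CAST_TO_STRING[dialect].format(col=col)

print(cast_to_string("customer_id", "postgres"))
```

Multiply this by every function, cast rule, and case-sensitivity quirk, and moving a query between warehouses becomes a rewrite rather than a copy-paste.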

Toward a universal SQL compiler

Tristan Handy: Can you convince me this problem is solvable?

Lukas Schulte: Yes. That's what we're working on with SDF—creating a shared intermediate representation for SQL. If we can express SQL logic in a unified form, we can compile it to any dialect—BigQuery, Snowflake, Redshift, and so on.

That allows developers to build reusable libraries, just like in other languages. It also makes governance, validation, and testing easier.
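The shape of that idea can be sketched as follows: express one operation as a dialect-neutral IR node, then compile it to each engine's spelling. The node and dialect names here are my own, purely to illustrate the concept; SDF's actual intermediate representation is far richer.

```python
from dataclasses import dataclass

# Toy IR node: "first non-null of two expressions", independent of any dialect.
@dataclass
class FirstNonNull:
    a: str
    b: str

def compile_expr(node: FirstNonNull, dialect: str) -> str:
    """Compile the IR node to a dialect-specific SQL expression."""
    if dialect == "oracle":
        return f"NVL({node.a}, {node.b})"      # Oracle spelling
    return f"COALESCE({node.a}, {node.b})"     # ANSI spelling (BigQuery, Snowflake, ...)

expr = FirstNonNull("discount", "0")
print(compile_expr(expr, "oracle"))
print(compile_expr(expr, "snowflake"))
```

Libraries written against the IR, rather than against one dialect's syntax, are what make cross-warehouse reuse possible.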

Future of data ecosystems

Tristan Handy: What would that future look like for practitioners?

Lukas Schulte: One major change would be the emergence of robust SQL libraries. Today, there’s no import system for SQL. Everyone writes similar logic over and over.

A shared compiler abstraction would let us reuse components, collaborate across companies, and build an ecosystem of packages for transformations, metrics, and validations—similar to how we use NPM or PyPI.

Last modified on: May 12, 2025
