
Why compilers matter

This post first appeared in The Analytics Engineering Roundup.

Tristan Handy dives deep into the world of compilers in this episode of The Analytics Engineering Podcast with Lukas Schulte, cofounder of SDF Labs (not to be confused with last episode’s guest—Lukas’ dad and fellow SDF cofounder Wolfram Schulte). Tristan and Lukas discuss what compilers are, how they work, and what they mean for the data ecosystem. SDF, which was recently acquired by dbt Labs, builds a world-class SQL compiler aimed at abstracting away the complexity of warehouse-specific SQL.

The conversation covers the evolution of compiler technology, what software engineering has gotten right over the past several decades, and why the data ecosystem is poised for a similar transformation. Lukas and Tristan explore why SQL has lagged behind other programming ecosystems, and how new compiler infrastructure could lead to package management, interoperability, and greater innovation across data platforms. It’s a fascinating (and timely) episode: Get ready for the new dbt engine.

Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.


Chapters

  • 02:40 The vision behind SDF Labs
  • 04:00 What is a compiler?
  • 05:00 Components of a compiler: frontend, IR, backend
  • 08:00 Syntax vs. semantics and the role of parsing
  • 10:00 Logical vs. physical plans in SQL compilers
  • 13:00 Historical context: mainframes to LLVM
  • 16:00 Cross-architecture portability in Rust & other compilers
  • 18:00 What is LLVM and why it matters
  • 20:00 Bootstrapping and the self-recursive nature of compilers
  • 21:00 Compilers in Java, TypeScript, and dbt
  • 23:00 Why compilers are foundational to software ecosystems
  • 26:00 The SQL dialect problem in data warehouses
  • 29:00 Can SQL get its own LLVM?
  • 31:00 How Substrait and DataFusion aim to standardize SQL
  • 35:00 Package management and the path toward SQL abstractions
  • 38:00 The future of the data ecosystem with a common SQL compiler

Key takeaways from this episode

What is a compiler?

Tristan Handy: What is a compiler?

Lukas Schulte: It's something that takes higher-level human-readable code and translates, compiles, rewrites it into lower-level machine code that is much harder for humans to understand and much easier for machines to understand.

Compilers typically have phases. They have a frontend that deals with the language you're working with, a middle component—usually called an IR or intermediate representation—and a backend that takes that IR and compiles it into machine code.

Compiler phases: frontend, IR, backend

Tristan Handy: How does it all come together?

Lukas Schulte: There’s a preprocessor that handles macros, removes comments, and prepares the text. Then a lexer converts it into tokens. These tokens get assembled into a tree that the compiler can understand. That’s where syntax validation and semantic analysis happen.
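The lexing stage Lukas describes can be sketched in a few lines. This is a minimal, illustrative tokenizer (not SDF's, and the token names are my own): it scans the source text with a regular expression and emits a flat list of (kind, value) tokens, discarding whitespace the way a preprocessor strips what the parser doesn't need.

```python
import re

# Token kinds for a toy expression language. Order matters: NUMBER is tried
# before IDENT so digits are never swallowed by the identifier rule.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{k}>{p})" for k, p in TOKEN_SPEC))

def lex(source: str):
    """Turn source text into (kind, value) tokens, dropping whitespace."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
    return tokens

print(lex("x = x + 1"))
```

A real frontend would then assemble these tokens into a tree and run syntax validation and semantic analysis over it, as Lukas notes.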

From there, we build a logical representation of the operations we want to perform. That transitions to a physical plan, which starts considering the hardware: how many cores, how much memory, which files we’re accessing. After that, optimizations are applied and it compiles to actual machine code using a toolchain like LLVM.
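To make the logical/physical split concrete, here is a toy logical plan for a simple query. The node names are my own invention, not SDF's: the point is that the plan records only *what* to compute, with no mention of cores, memory, or files, which belong to the physical plan derived later.

```python
from dataclasses import dataclass

# Minimal logical-plan nodes: each describes an operation, not how to run it.
@dataclass
class Scan:
    table: str

@dataclass
class Filter:
    child: object
    predicate: str

@dataclass
class Project:
    child: object
    columns: list

def explain(node, depth=0):
    """Render the plan tree, roughly the way an EXPLAIN statement might."""
    pad = "  " * depth
    if isinstance(node, Scan):
        return f"{pad}Scan({node.table})"
    if isinstance(node, Filter):
        return f"{pad}Filter({node.predicate})\n" + explain(node.child, depth + 1)
    return f"{pad}Project({', '.join(node.columns)})\n" + explain(node.child, depth + 1)

# Logical plan for: SELECT id FROM orders WHERE amount > 10
plan = Project(Filter(Scan("orders"), "amount > 10"), ["id"])
print(explain(plan))
```

An optimizer would rewrite this tree (pushing filters down, pruning columns) before a physical planner assigns it to actual hardware.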

Syntax vs. semantics

Lukas Schulte: Let’s break down syntax vs. semantics.

Imagine the code x = x + 1. That has valid syntax. Its meaning—its semantics—is that we’re incrementing x by 1.

Now, you could also write x += 1. Different syntax, same semantics. So syntax defines structure, and semantics define meaning. That distinction is important when you’re analyzing or transforming code.
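Python's own frontend makes this distinction visible. The two statements Lukas mentions parse to different syntax trees (an `Assign` versus an `AugAssign` node), yet starting from the same state they produce the same result, which is exactly the "different syntax, same semantics" point.

```python
import ast

# Different syntax: the two forms parse to different AST node types.
tree_a = ast.parse("x = x + 1").body[0]
tree_b = ast.parse("x += 1").body[0]
print(type(tree_a).__name__, type(tree_b).__name__)  # Assign AugAssign

# Same semantics: from the same starting state, both increment x by 1.
state_a, state_b = {"x": 41}, {"x": 41}
exec("x = x + 1", state_a)
exec("x += 1", state_b)
print(state_a["x"], state_b["x"])  # 42 42
```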

LLVM and portability

Tristan Handy: Have we been building abstraction layers like this for decades?

Lukas Schulte: Absolutely. That’s what LLVM does. It provides a consistent intermediate representation that compilers can use to target multiple backends—Intel, ARM, different OSes. Apple invested early in LLVM to support custom chips.

With Rust, for example, LLVM is what lets us build binaries that behave the same on macOS, Windows, and Linux with relatively little effort.

Bootstrapping compilers

Tristan Handy: So there’s this recursive loop—compilers being built with other compilers?

Lukas Schulte: Exactly. Rust wasn’t always written in Rust—it started in C++. Eventually, the compiler was rewritten in Rust itself. Now, Rust compiles Rust. It’s fully self-hosted. That’s common with mature languages—it shows the compiler ecosystem is stable and powerful enough to sustain itself.

Why compilers matter

Tristan Handy: You said once that compilers are the foundation of every software ecosystem. What did you mean?

Lukas Schulte: There are two big drivers in software: abstractions and standards. You want one way to interface with a USB device—not ten. Same for software. You want one standard way to express a Python program, a JavaScript app, etc.

Compilers enforce those standards and make sure the same code works across platforms. That consistency powers things like package managers, shared libraries, and open ecosystems.

SQL dialects and fragmentation

Tristan Handy: Are there ecosystems that are doing worse than others?

Lukas Schulte: SQL does a particularly bad job. Anyone who's used more than one data warehouse knows you can't take the same SQL statement and expect it to work the same way. Casting, case sensitivity, functions—every engine handles these things differently.
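One concrete example of this fragmentation: even "cast a column to a string" is spelled differently across engines. The templates below are illustrative of the kind of divergence Lukas is describing (check your warehouse's documentation before relying on any of them):

```python
# One logical operation -- cast a column to a string -- spelled three ways.
# Dialect templates are illustrative examples, not an exhaustive reference.
CAST_TO_STRING = {
    "bigquery":  "CAST({col} AS STRING)",
    "snowflake": "CAST({col} AS VARCHAR)",
    "postgres":  "{col}::text",
}

def cast_to_string(col: str, dialect: str) -> str:
    """Render the cast for the given dialect."""
    return CAST_TO_STRING[dialect].format(col=col)

print(cast_to_string("customer_id", "postgres"))
```

Multiply this by every function, cast rule, and case-sensitivity quirk, and moving a query between warehouses becomes a rewrite rather than a copy-paste.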

Toward a universal SQL compiler

Tristan Handy: Can you convince me this problem is solvable?

Lukas Schulte: Yes. That's what we're working on with SDF—creating a shared intermediate representation for SQL. If we can express SQL logic in a unified form, we can compile it to any dialect—BigQuery, Snowflake, Redshift, and so on.

That allows developers to build reusable libraries, just like in other languages. It also makes governance, validation, and testing easier.
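The shape of that idea can be sketched as follows: express one operation as a dialect-neutral IR node, then compile it to each engine's spelling. The node and dialect names here are my own, purely to illustrate the concept; SDF's actual intermediate representation is far richer.

```python
from dataclasses import dataclass

# Toy IR node: "first non-null of two expressions", independent of any dialect.
@dataclass
class FirstNonNull:
    a: str
    b: str

def compile_expr(node: FirstNonNull, dialect: str) -> str:
    """Compile the IR node to a dialect-specific SQL expression."""
    if dialect == "oracle":
        return f"NVL({node.a}, {node.b})"      # Oracle spelling
    return f"COALESCE({node.a}, {node.b})"     # ANSI spelling (BigQuery, Snowflake, ...)

expr = FirstNonNull("discount", "0")
print(compile_expr(expr, "oracle"))
print(compile_expr(expr, "snowflake"))
```

Libraries written against the IR, rather than against one dialect's syntax, are what make cross-warehouse reuse possible.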

Future of data ecosystems

Tristan Handy: What would that future look like for practitioners?

Lukas Schulte: One major change would be the emergence of robust SQL libraries. Today, there’s no import system for SQL. Everyone writes similar logic over and over.

A shared compiler abstraction would let us reuse components, collaborate across companies, and build an ecosystem of packages for transformations, metrics, and validations—similar to how we use NPM or PyPI.

Last modified on: May 12, 2025
