
The history and future of the data ecosystem

Daniel Poppy

on Jun 08, 2025

In this decades-spanning episode, Tristan talks with Lonne Jaffe, Managing Director at Insight Partners and former CEO of Syncsort (now Precisely), to trace the history of the data ecosystem—from its mainframe origins to its AI-infused future.

Lonne reflects on the evolution of ETL, the unexpected staying power of legacy tech, and why AI may finally erode the switching costs that have long protected incumbents. Both are optimistic about where the coming era of AI and standards will take the data ecosystem.

Please reach out at podcast@dbtlabs.com for questions, comments, and guest suggestions.


Episode chapters

00:46 – Meet Lonne Jaffe: background & career journey

Lonne shares his career highlights from Insight Partners, Syncsort/Precisely, and IBM, including major acquisitions and tech focus areas.

04:20 – The origins of Syncsort & sorting in mainframes

Discussion on why sorting was a critical early problem in hierarchical databases and how early systems like IMS worked.

07:00 – M&A as innovation strategy

How Syncsort used inorganic growth to modernize its platform, including an early example of migrating data from IMS to DB2 without rewriting apps.

09:35 – Technical vs. strategic experience

Tristan probes Lonne’s technical depth despite his business titles; Lonne shares his background in programming and a fun fact about juggling.

11:55 – Why this history matters

Tristan sets up the key question: what lessons from 1970s-2000s ETL tooling still shape the modern data stack?

13:00 – Proto-ETL: The real OGs

Lonne traces the origins of ETL to 1970s CDC, JCL, and early IBM tools. Prism Solutions in 1988 gets credit as the first real ETL startup.

15:40 – Rise of the ETL market (1990s)

From Prism to Informatica and DataStage—early 90s vendors brought visual development to what was once COBOL-heavy backend work.

18:00 – Why people offloaded Teradata to Hadoop

Exploring how cost, contention, and capacity drove ETL out of the warehouse and into Hadoop in the 2000s.

20:00 – Performance vs. price: Jevons Paradox in ETL

Why lower compute and storage costs led to more ETL, not less—and how parallelization changed the game.

22:30 – Evolution of data management suites

How ETL expanded into app-to-app integration, catalogs, metadata management, and why these bundles got bloated.

25:00 – Rise of data prep & self-service analytics

Tools like Kettle, Pentaho, and Tableau mirrored ETL for business users—spawning a whole “data prep” category.

27:30 – Clickstream, logs & big data chaos

How clickstream and log data changed the ETL landscape, and the hope (and letdown) of zero-copy analytics.

29:10 – Why is old software so sticky?

Tristan and Lonne explore the economics of switching costs, the illusion of freedom, and whether GenAI could break the lock-in.

33:30 – Are old tools actually… good?

Defending mainframes and 30-year-old databases like InterSystems Caché. Sometimes the mature option is better—just not sexy.

36:00 – The new vs. the durable

Modern tools must prove themselves against decades of reliability and robustness in finance, healthcare, and compliance.

38:20 – GenAI in data: The early movers

Lonne highlights why companies like Atlan and dbt Labs are in the best position to win—distribution, trust, and product maturity.

41:00 – TAM and the Jevons Paradox, again

Revisiting how price drops expand TAM. Some categories vanish, others explode—depending on elasticity of demand.

43:15 – Unlocking new personas with LLMs

Structured data access for non-technical users is finally viable, but “it has to be right”—trust and quality remain the barrier.

46:00 – Real-world examples: dbt’s MCP server win

Tristan shares how dbt’s Metadata API became a catalog replacement for a traditional financial institution—an unplanned AI GTM success.

48:30 – Agents, not interfaces

New pattern: LLMs as agents interacting directly with infrastructure via APIs. Tool use is becoming table stakes for AI integration.

50:30 – Are LLMs birthright tools yet?

Discussion around adoption of ChatGPT Enterprise, Claude, etc. Lonne suggests adoption is accelerating fast—and the usage model matters.

52:00 – Looking ahead

The conversation ends with a reflection on GenAI’s near future in data workflows, TAM expansion, and what the next episode might tackle.

Key takeaways from this episode

Tristan Handy: You've had a long career in tech. Maybe start by giving us the 30,000-foot view of what you've been up to over the last couple decades?

Lonne Jaffe: I’ve been at Insight Partners for about eight years now, working mostly on deep tech investments—AI infrastructure companies like Run AI and deci.ai, both acquired by Nvidia. I’ve also done work with data infrastructure companies like SingleStore. Before Insight, I was CEO of a portfolio company called Syncsort, now Precisely. It was founded in 1968.

Prior to that, I was at IBM for 13 years, working in middleware and mainframe technologies. Products like WebSphere, CICS, and TPF—foundational systems for enterprise computing.

Tristan Handy: And Syncsort's origin was in sorting, right? Literally sorting files?

Lonne Jaffe: Exactly. In the early days of computing, sorting was a huge part of what you did. Much of the data was hierarchical—stored in IMS—and had to be flattened into files to process. The algorithms were optimized to run in extremely resource-constrained environments.

Tristan Handy: Fascinating. And I assume as compute and storage improved, the data integration landscape evolved?

Lonne Jaffe: Yes. We saw a move from hierarchical to relational databases, then toward ETL tools in the 80s and 90s. The first real ETL startup was probably Prism Solutions in 1988. Informatica and DataStage showed up in the early 90s, followed by Talend and others.

Tristan Handy: It seems like we got a whole bundle of tools over time—ETL, CDC, app integration, metadata, and so on.

Lonne Jaffe: Yes, often bundled together, even though data prep and app integration were treated separately. That persisted for longer than you'd expect. At Syncsort, we acquired a company with a "transparency" solution that allowed IMS applications to use data stored in DB2 without rewriting code—a clever way to manage switching costs.

Tristan Handy: Speaking of switching costs—why are these legacy tools so sticky?

Lonne Jaffe: Great question. In many cases, no customer loves the product. They’d switch in a heartbeat—if it were easy. But rewriting jobs and ensuring reliability is a heavy lift. The best outcome is a new system that replicates old functionality. And for many organizations, that’s not worth the risk.

Tristan Handy: But if generative AI could reduce those switching costs?

Lonne Jaffe: That’s the potential. Code generation, agents that explore and iterate—those could erode the moat that’s protected these incumbents for decades. Not tomorrow, but it’s a real possibility.

Tristan Handy: It also seems like some of these systems are more robust than people give them credit for.

Lonne Jaffe: Absolutely. Mainframes are I/O supercomputers. Products like InterSystems Caché, used by Epic, are incredibly performant. But new systems must match or exceed those capabilities in reliability and scale, which is a high bar.

Tristan Handy: As you look at the evolution of the modern data stack, how do you think about its impact on the market?

Lonne Jaffe: In the 2010s, we saw disaggregation—tools like Fivetran, dbt, and Snowflake each tackled a slice of the old enterprise bundle. But the TAM isn’t infinite. Some categories may compress or vanish entirely if price drops aren’t offset by new demand.

Tristan Handy: Do you think AI expands or compresses the data stack?

Lonne Jaffe: It depends. High elasticity of demand—like with dashboards or analytics—can drive massive TAM expansion. But some categories, like logo redesign or simple data movement, might get commoditized. For more complex workflows, AI agents accessing platforms like dbt or Atlan could dramatically increase value by automating common tasks and enabling new personas.

Tristan Handy: We’ve seen an example already—a customer replaced their data catalog with our dbt Cloud metadata server and AI interface.

Lonne Jaffe: That’s telling. If AI interfaces can connect to tools like dbt and generate value—self-service, documentation, lineage—it changes the game. Especially for organizations already standardized on those platforms.

Tristan Handy: What’s your view on how these AI interfaces get distributed?

Lonne Jaffe: ChatGPT Enterprise, Claude, and others are spreading fast. Eventually, you’ll want those tools to search files, access internal metadata, and interact with your data stack—not just answer questions from the open web.

Tristan Handy: It makes a lot of sense. If AI is going to serve enterprise users, it needs access to the real data. Otherwise, it’s just a toy.

Lonne Jaffe: Exactly. A model that can’t query or verify against your actual environment won’t be reliable. And data quality and observability—something dbt Cloud is already good at—become foundational.

Last modified on: Jun 09, 2025
