Data Analytics in a Snowflake world
Snowflake is at the forefront of changing the way modern data teams work with the Data Cloud.
In this session, Christian Kleinerman, Snowflake’s SVP of Product, joins Tristan Handy, founder and CEO of dbt Labs, for a casual conversation about what the future has in store.
Where does Snowflake go from here? What meta trends and technologies play into that vision? How does that impact the world of data analytics? Christian and Tristan have no shortage of opinions or ideas. This is your chance to hear some of them, live and unfiltered.
Browse this talk’s Slack archives #
The day-of-talk conversation is archived here in dbt Community Slack.
Not a member of the dbt Community yet? You can join here to view the Coalesce chat archives.
Full transcript #
Amada Echeverria: [00:00:00] Welcome everyone. And thank you for joining us at Coalesce. I’m Amada Echeverria. I use she/her pronouns and I’m a developer relations advocate on the community team at dbt Labs. I am absolutely thrilled to be hosting today’s session: Data Analytics in a Snowflake world: A conversation with Christian Kleinerman and Tristan Handy.
Christian Kleinerman is a database expert with over 20 years of experience working with various database technologies. He is currently serving as senior vice president of product at Snowflake, and Christian has more than 15 years of management and leadership experience. At Microsoft, he served as general manager of the data warehousing product unit, where he was responsible for a broad portfolio of products.
Most recently, he worked at Google leading YouTube infrastructure and data systems. Christian earned his bachelor’s in industrial engineering from Los Andes University and is a named inventor on numerous [00:01:00] Snowflake patents. Tristan Handy is an unrepentant Snowflake fan. He first used Snowflake in 2017 and says he was pretty grumpy about it because his team at what was then Fishtown Analytics had a hard time getting capitalization and quoting to work right in dbt.
Tristan also happens to be a founder and our CEO at dbt Labs. It’s no secret that Snowflake is at the forefront of changing the way modern data teams work with the data cloud. This 30-minute session will be a casual conversation about what the future has in store. We’ll tackle questions like: Where does Snowflake go from here? What meta trends and technologies play into that vision? How does that impact the world of data analytics?
Before we jump into things, some recommendations for making the best out of this session. All chat conversation is taking place in the #coalesce-snowflake channel of dbt Slack. If you are not yet a part of [00:02:00] the dbt Slack community, you have time to join now.
Seriously, go do it. Visit getdbt.com/community and search for #coalesce-snowflake when you arrive. We encourage you to set up Slack and your browser side-by-side. In Slack, I think you’ll have a great experience if you ask other attendees questions, make comments, share memes, or react in the channel at any point during the session. Tristan has taken questions ahead of time on Twitter and LinkedIn.
Our guests may answer some questions coming in live. To kick us off, our chat champion, Nikhil Kothari, head of technology partnerships at dbt Labs, has started a thread to have you introduce yourself, let us know where you’re tuning in from, and share your favorite XL function. And yes, you can only choose one.
After the session, Tristan, as well as a few members from the Snowflake team, will be available in the Slack channel to answer questions. Let’s get started. Over to you, Christian and Tristan.
Tristan Handy: Hey, [00:03:00] thanks for the intro. And just to be clear, Amada was not lying. I wore the Snowflake t-shirt today. Actually, I don’t wear it that often anymore because it’s getting a little bit thin.
I got this at an event back in 2017 and I’ve worn it so many times that I think that the washer has had its way with it. So thanks for joining, Christian.
Christian Kleinerman: It’s great to be here, and I’ll make sure to send you another.
Tristan Handy: Are you still making this color? I like this color.
Christian Kleinerman: Yeah. Yeah. We have them. I’ll get you another one.
Tristan Handy: Nice. And we definitely did collect a bunch of questions, but I have to admit that I am going to use this session as my own personal opportunity to ask you all of the questions that I’ve always had as a Snowflake user over the years. So maybe we’ll get to some other people’s questions.
Before we get into the serious stuff, I’m curious if you could tell us: you’ve been at Snowflake for a little while now, and I think you joined at a point in time when the technology was just maturing. You look at the Snowflake user graph and it, like, started to go up and [00:04:00] then it was just like pink.
And I don’t know if you can take all the credit in the world for that, but I think that happened right around when you joined. What was that world like? Did you still have the super nice office? How many people worked at the company?
[00:04:13] When you joined Snowflake, what was that world like? #
Christian Kleinerman: Yeah. Let’s rewind. It will have been four years next month with Snowflake. When I joined, it was just over 300 people. Most people asked, “Snowflake what? Who is Snowflake? What is it?” So it was definitely not well known. I had seen the SIGMOD paper and I could appreciate the architecture, and thought, hey, this is going to be interesting. But it was a very small team.
I say 300-something people, but engineering, the tech org, was maybe 50 people, 60 people. It was smaller.
Tristan Handy: Yeah. When we were in the green room right before this, I made a connection that I had not made before, but I think you probably had a good opportunity to appreciate some of Snowflake’s way of looking at the world.
Certainly [00:05:00] in your days at Microsoft, but also at YouTube, there was a thing that happened while you were there called Procella. I probably read this paper after it got released into the public, but I remember reading it back in the day and being pretty blown away by it.
Can you say a little bit about what Procella is, slash was? And to what extent are there similarities between that and what you folks are building?
Christian Kleinerman: Yeah. So Procella was one of many database analytics systems at Google. The original motivation was to power the YouTube analytics system, where YouTube has millions of content creators.
And you want to see: how’s your video doing? How’s your channel doing? Your fans? So it was, how do you power very low-latency analytics at a very large scale? And frankly, the different solutions that were there were not quite meeting the requirements that we had. And that’s what led to the creation of that system. [00:06:00]
Tristan Handy: I remember that the thing that really impressed me about it from reading the white paper was that it wrapped up kind of two different data access paths behind a single API. You could, a little bit, have your cake and eat it too. The way that I think about designing databases, not that I’ve ever done it myself, but the way that I understand the exercise, is that there’s this pretty broad design space.
And you get to pick where you want to locate yourself in that design space. You have to make certain trade-offs: either your batch reads are truly fantastic but you’re less real-time, and so on. You have to trade off all these things. And the way I read the paper was that Procella had found a way to have two points in the design space without making it more complicated for users; that is my understanding of it. To what extent do you feel like what you folks are doing at [00:07:00] Snowflake right now is also trying to do something similar, where you get multiple points in the design space for users?
[00:07:08] To what extent do you feel like what you folks are doing at Snowflake right now is also trying to give users multiple points in the design space? #
Christian Kleinerman: So it is accurate that whenever you’re building one of these platforms, you need to figure out what you’re optimizing for. And probably the biggest trade-off is how narrow you are in terms of usage versus how broad you are in terms of the platform. You can be really good at one use case,
or you can be maybe a little bit less precise on the targeting, but of a broader nature. I think what Procella did at YouTube was a reasonably narrow case. I’m not going to say ultra narrow, but it was a reasonably narrow set of query patterns. We understood the data and the data model.
So that’s how you build a system that is quite fast, because you understand, effectively, the use case, the data, the patterns, the growth rates, et cetera. That’s a benefit that platforms like Snowflake don’t have. We see [00:08:00] people in Snowflake showing up with a handful of gigabytes of data with very repeatable reports, but we also see petabyte-size tables with ad hoc exploration; I saw recently a seven-way join. So I think that may be the biggest difference: how broad of a platform you are. And of course, what we aim to do is make sure that we appeal to a very large set of use cases.
Tristan Handy: Yeah. I’ll connect these dots in a second, but I want to go over to clean rooms.
Clean rooms are a thing that I have only learned about recently. I have never had the opportunity to use a Snowflake-powered clean room before. But it’s such a fascinating concept. Can you talk about what this is?
[00:08:45] Talk about Clean Rooms #
Christian Kleinerman: Yeah. So at a high level, a clean room is a collaboration workspace by which two or more parties can go and do analytics, even machine learning, [00:09:00] on combined data without necessarily seeing each other party’s data. My simplest example is something that I think you and I have talked about in the past: if we wanted to figure out which customers dbt Labs and Snowflake have in common, maybe you’re not super incentivized to give me the whole list of your customers.
Maybe I don’t want to give you the whole list of mine. But how do we identify who the mutual customers are, so we can go and provide a more targeted use case? A clean room is exactly the type of construct where we both get to contribute our data. We get to ask questions on the common data without disclosing each other’s data.
Traditionally, this has been done by a trusted third party. So you hire a company, we both give the data to that company, and then they give you the answer. What we’ve done at Snowflake is just part of a bigger journey around collaboration. Four-plus years ago, we started doing data sharing, but very quickly we said, you know what?
If we expand the concept to function [00:10:00] sharing, I can give you, Tristan, a function. You can operate on my data without seeing my data. You can give me a function. And that is the foundation. And let’s say the clean room concept is red hot in terms of popularity and value, but there are also macro trends, such as the deprecation of third-party cookies and IDFA-type changes, which have created an even bigger set of needs around how companies collaborate on data without disclosing full data sets to each other.
I’ll pause there. It’s a topic we could use all 30 minutes on.
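The mutual-customer example above can be sketched in a few lines of Python. This is only an illustration of the idea of answering an overlap question without exchanging raw lists; it is not how Snowflake implements clean rooms. The function names (`blind`, `mutual_customers`), the shared key, and the customer names are all hypothetical, and keyed hashing of guessable identifiers is much weaker than a governed platform with function sharing.

```python
import hashlib
import hmac


def blind(names, shared_key: bytes) -> dict[str, str]:
    """Return {keyed digest: original name}. The owning party keeps this
    mapping private and shares only the digests (the dictionary keys)."""
    return {
        hmac.new(
            shared_key,
            name.strip().lower().encode("utf-8"),  # naive normalization (assumption)
            hashlib.sha256,
        ).hexdigest(): name
        for name in names
    }


def mutual_customers(my_names, their_digests, shared_key: bytes) -> list[str]:
    """Names from my own list whose digests also appear in the other party's set."""
    mine = blind(my_names, shared_key)
    return sorted(name for digest, name in mine.items() if digest in their_digests)


# Hypothetical inputs: neither party ever sends its plain-text list to the other.
shared_key = b"hypothetical-shared-secret"
party_a = ["Acme Corp", "Globex", "Initech"]
party_b = ["globex", "Initech ", "Umbrella"]

# Party B shares only digests; party A learns only the overlap with its own list.
b_digests = set(blind(party_b, shared_key))
overlap_seen_by_a = mutual_customers(party_a, b_digests, shared_key)
print(overlap_seen_by_a)  # ['Globex', 'Initech']
```

In this sketch party A never sees "Umbrella", and party B never sees "Acme Corp"; each side learns only which of its own customers are shared.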
Tristan Handy: Yes. We’ll do it again tomorrow.
So this is a very unusual thing, I think, in the world of database functionality. Certainly if you count the most popular 20 databases in the world, I don’t think you will frequently see this type of feature.
The reason I brought it up is that I think it points to a certain uniqueness [00:11:00] of Snowflake. And my guess is that this has to do with the way that you see yourself as cloud-first. Can you fill in the blanks here?
Christian Kleinerman: Yeah. So you’re right that being cloud-first and being designed for the cloud is what makes the difference.
And it’s what enables something like the clean rooms. This was something from very early on, way before I was at Snowflake; all credit goes to our founders. They took every single subsystem of a traditional database system and said, how would you rethink this if you had not only virtually unlimited resources, but also ephemeral resources, where you can instantiate and release resources? Traditional databases were always constrained on memory and constrained on disk space; running out of disk space is a very real thing.
But the moment that someone tells you to assume that you have unlimited disk space, you start to define something different, and that’s what leads to the, hey, what if different customers could have compute clusters, [00:12:00] all sharing the same storage substrate? That was the insight that led toward data sharing. And then one thing leads to another.
And that’s how we ended up with clean rooms. So what enables it is the cloud, to be honest. If you had a server under your desk with your data, and I have the same thing here, there is no clean room we’re going to do, because you don’t have the platform that enables the collaboration.
Does that make sense?
Tristan Handy: Yeah. Can you relate this back? It’s been a funny experience for me, from the outside, to watch the banner on the Snowflake homepage change over time from data warehouse in the cloud, to cloud data platform, to data cloud. Are these changes over time related to your own self-conception of what it is that you’re building? And
how does it actually manifest in the technology?
[00:12:55] How does what you’re building manifest in the tech? #
Christian Kleinerman: Yeah. Great question. I would say that our positioning, which [00:13:00] is what is manifested there, has generally trailed what the product and the vision had been doing. Data warehousing for the cloud was a very easy way to explain what we did.
But early on, we had data sharing, and data sharing didn’t add up as part of the data warehousing terminology. And then we started to see that customers were doing more: I want to do data transformation, I want to go and potentially build an application. And you don’t think of data warehouses for that.
That’s when we said, okay, cloud data platform. But very quickly we saw that having great technology in the company is only part of the requirements for having a great data capability. What we think is the big insight behind the data cloud is that we want to deliver great technology, and we want to deliver an ecosystem
of data providers, function providers, and application providers that enrich the experience. That’s what the data cloud is. But if you look at our product journey, I would say we’ve [00:14:00] always been enabling newer things ahead of, I would say, the broader positioning or the broader description of what we’re doing.
Tristan Handy: So there’s this kind of broadening that is happening. Maybe data warehouse is the most narrow definition, and over time you’ve understood it to be more and more broad. What’s the limit, if this is like a mathematical limit function? At some point in time, are you going to be competing with Postgres for OLTP workloads? I’m sure that there are some folks that actually do use Snowflake as some type of a transactional system, but I don’t think that you’re going to compete with Postgres on that today.
Christian Kleinerman: So I would say, you’re asking that question of a limit, so I want to answer in terms of the art of the possible. What informs how we think about it is that we have a platform where we’re every bit as proud of the technology that we have, and a lot of the emphasis that we’ve put in for our customers is around data governance.
Data governance includes [00:15:00] security, but also privacy, but also knowing and understanding data. And what we are systematically doing is understanding what the reasons are that get customers to deviate from a single, central, governed platform. So you can say maybe advanced analytics is part of that; then let’s figure out how we bring advanced analytics into the platform.
Maybe there are use cases where someone is copying data out into a Postgres database for delivery of data, for actually serving, as one possible use case. We’re saying, maybe not. That one is more concrete: we are expanding around high-concurrency, low-latency use cases. So you can say it’s chipping away, to use Postgres as the example, chipping away from the use cases.
But the way we think about it is not that we’re going after Postgres or going after some technology. It’s more that we’re slowly expanding the set of use cases we support and the workloads we see, all in the name [00:16:00] of helping our customers preserve governance. That’s what matters. That’s the north star.
[00:16:05] Snowpark #
Tristan Handy: Gosh, so much to talk about, so little time. I want to make sure to get to Snowpark. Snowpark is one of the things where, when you folks first talked about it, I was just like, oh, let me definitely pay attention. Part of the challenge for me following along with Snowpark news is that until recently it has not been focused on my non-SQL language of choice, Python.
And I don’t think the Python docs are out yet, but I’m going to be all over them when they’re out. So I might not fully understand the vision here, but what I’m hoping, as a practitioner, that Snowpark over time gets me to is an environment where I can stop thinking about running any computation on my local device.
I want not only to push my SQL workloads to a cloud data platform, I want to [00:17:00] also push predictive workloads to that same platform. And my understanding, but correct me, is that we’re not yet at a place where I can just import sklearn and go train the next great machine learning model, but maybe that’s wrong.
That’s where I want to get to anyway.
Christian Kleinerman: So directionally you’re right. The motivation behind Snowpark is: how do we bring extensibility into Snowflake, and how do we give choice to our users? Choice goes back to the programming language. There’s a strong contingent of people for whom “SQL is my answer to any problem.”
I am a SQL fan and junkie, but I’m also the first one to admit it is not the best tool for certain problem spaces. Java has its place. And Python has a very large place in the world of manipulating data and enriching data. So we started with: let’s extend Snowflake so [00:18:00] that we can bring computation, bring interesting business logic, onto Snowflake.
Again, within that governance boundary, one category of extensions could be predictive analytics, machine learning, et cetera. I have people that have shown me Java-based extensions to do machine learning, both training and scoring, and of course there’s a long list of customers lining up for our Python private preview. We’re in private preview now.
And they’re lining up to get access, to be able to do this type of analytics. But it’s not only about data science. We’re seeing all sorts of interesting use cases around security and encryption and tokenization. How do we, effectively, as a database, get out of a world where we have to produce every single built-in function and extension?
How do we enable, literally, partners (we are strong believers in the ecosystem) but also customers themselves? That is the big vision, which really is a prereq if we want to go bring [00:19:00] additional workloads. How do you do it?
Tristan Handy: I talk a lot about S curves, the concept in the book by Carlota Perez.
And I see them in my sleep now. So my question is, if all technologies are in some ways S curves, where are we along the trajectory of Snowpark?
Christian Kleinerman: I would say, from the investment on our side, we’re very far along. This started many years ago, two and a half, three years ago. The fact that Java is in public preview and Python is in private preview, that’s the beginning of the end of the delivery, but I think we’re very early on in the adoption.
The interest that we see is through the roof. Frankly, the technology has been in the market, for Java, for six months; for Python, a couple of weeks, since our Snowday. So on maturity of what we’ve delivered, I would say quite far along; on the adoption, very early on, based on timing. [00:20:00]
Tristan Handy: It’s such a funny thing, building databases. I feel like it’s a funny cultural environment. We release technology out quickly and iterate on it very quickly with our community. The Snowflake story is really that the founders and a small group of engineers were in the labs for, I don’t know, a pretty large number of years before this technology was actually usable by real customers.
And that’s when I started poking around at it. And it seems not unreasonable to imagine that Snowpark might have its own gestation period before it gets to maturity.
Christian Kleinerman: Yeah. I like to say our founders bordered on insanity when they decided to build a new platform from scratch; literally, they started building everything. With Snowpark, I have some early use cases where I’m seeing dramatic improvements relative to prior [00:21:00] art, in how concise the code is, et cetera.
So yes, you can say, back to your point, we will see very interesting use cases. We have a Snowpark Accelerated program, with 50-plus companies and partners signing up to deliver solutions on Snowpark. So think of it as a runtime to bring logic to run closer to the data. And of course, by virtue of how we do it, it is cross-cloud, which is compelling for anyone that’s trying to build solutions for data.
Tristan Handy: Okay. We’re running out of time, but maybe I’ll get your thoughts on this overall. I had this conversation with Martin on, was it Monday? The title of the session was How Big is This Wave? And we were talking about the persistence and just overall size of this thing that we’re all doing here together.
I’m curious what you think five years from now looks like, and this is not really a Snowflake question per se. It’s more like, how are people going to be using data in ways that they are [00:22:00] not yet? What problems do we still need to solve?
[00:22:03] How are people going to be using data in ways that they are not yet? What problems do we still need to solve? #
Christian Kleinerman: So I also compare notes with Martin on a regular basis, and I haven’t watched your Monday conversation, but I’ll say that we’re very early on as an industry
on what is going to be possible with data. We have no end of tailwinds as an industry. The amount of data being created is through the roof, and the vast majority of data has been created in the last five years. The acknowledgment or realization that competitive advantage is based on data,
that’s only starting to dawn on every single industry. I’ve heard, yeah, software is eating the world. I think data is eating the software. And the last one is cloud. The cloud is forcing everyone to rethink systems, rethink how things are being done. We’re very early on as an industry. And I think the opportunity for all of us in the data world is huge.
Tristan Handy: I very much agree. For me, there’s this [00:23:00] clarity of what the next 12 months are going to look like. And then there’s this clarity of what things will look like in a decade. And then I have this uncertainty around, specifically, how
our community, analytics engineers and data analysts, will see their jobs look different in the three-to-five-year time horizon. I think that there are only good things involved, but it’s probably not just going to be folks churning out dashboards and working ever faster.
Christian Kleinerman: So something that we obsess about, and you’re also a big part of it, is helping everyone focus on the data model and the metric definition, not on the tools and the infrastructure. We jointly want to take care of that. So I think we’re going to see a huge focus on what is the metric definition that will give me the competitive advantage, or what is the timeliness of my decisions and my insights.
Massive improvements on all of these dimensions. So much to be done.
Tristan Handy: Awesome. Christian, thanks so much for joining us. [00:24:00]
Christian Kleinerman: Tristan, thank you for having me here. It’s awesome to be part of Coalesce. Thank you.
Last modified on: Oct 11, 2022