Data Analytics in a Snowflake world
Snowflake is at the forefront of changing the way modern data teams work with the Data Cloud.
In this session, Christian Kleinerman, Snowflake's SVP of Product, joins Tristan Handy, founder and CEO of dbt Labs, for a casual conversation about what the future has in store.
Where does Snowflake go from here? What meta trends and technologies play into that vision? How does that impact the world of data analytics? Christian and Tristan have no shortage of opinions or ideas. This is your chance to hear some of them, live and unfiltered.
Browse this talk's Slack archives
The day-of-talk conversation is archived here in dbt Community Slack.
Not a member of the dbt Community yet? You can join here to view the Coalesce chat archives.
Full transcript
Amada Echeverria: [00:00:00] Welcome everyone. And thank you for joining us at Coalesce. I'm Amada Echeverria. I use she/her pronouns and I'm a developer relations advocate on the community team at dbt Labs. I am absolutely thrilled to be hosting today's session: Data Analytics in a Snowflake world: A conversation with Christian Kleinerman and Tristan Handy.
Christian Kleinerman is a database expert with over 20 years of experience working with various database technologies. He is currently serving as senior vice president of product at Snowflake, and Christian has more than 15 years of management and leadership experience. At Microsoft, he served as general manager of the data warehousing product unit, where he was responsible for a broad portfolio of products.
Most recently, he worked at Google leading YouTube infrastructure and data systems. Christian earned his bachelor's in industrial engineering from Los Andes University and is a named inventor on numerous [00:01:00] Snowflake patents. Tristan Handy is an unrepentant Snowflake fan. He first used Snowflake in 2017 and says he was pretty grumpy about it because his team at what was then Fishtown Analytics had a hard time getting capitalization and quoting to work right in dbt.
Tristan also happens to be a founder and our CEO at dbt. It's no secret that Snowflake is at the forefront of changing the way modern data teams work with the data cloud. This 30-minute session will be a casual conversation about what the future has in store. We'll tackle questions like: where does Snowflake go from here? What meta trends and technologies play into that vision? How does that impact the world of data analytics?
Before we jump into things, some recommendations for making the best out of this session. All chat conversation is taking place in the #coalesce-snowflake channel of dbt Slack. If you are not yet a part of [00:02:00] the dbt Slack community, you have time to join now.
Seriously, go do it. Visit getdbt.com/community and search for #coalesce-snowflake when you arrive. We encourage you to set up Slack and your browser side-by-side. In Slack, I think you'll have a great experience if you ask other attendees questions, make comments, share memes, or react in the channel at any point during the session. Tristan has taken questions ahead of time on Twitter and LinkedIn.
Our guests may answer some questions coming in live. To kick us off, our chat champion, Nikhil Kothari, head of technology partnerships at dbt Labs, has started a thread to have you introduce yourself, let us know where you're tuning in from, and share your favorite XL function. And yes, you can only choose one.
After the session, Tristan, as well as a few members from the Snowflake team, will be available in the Slack channel to answer questions. Let's get started. Over to you, Christian and Tristan.
Tristan Handy: Hey, [00:03:00] thanks for the intro. And just to be clear, Amada was not lying. I wore the Snowflake t-shirt today. Actually, I don't wear it that often anymore because it's getting a little bit thin.
I got this at an event back in 2017 and I've worn it so many times that I think that the washer has had its way with it. So thanks for joining, Christian.
Christian Kleinerman: It's great to be here, and I'll make sure to send you another.
Tristan Handy: Are you still making this color? I like this color.
Christian Kleinerman: Yeah. Yeah. We have them. I'll get you another one.
Tristan Handy: Nice. And we definitely did collect a bunch of questions, but I have to admit that I am going to use this session as my own personal opportunity to ask you all of the questions that I've always had as a Snowflake user over the years. So maybe we'll get to some other people's questions.
Before we get into the serious stuff, I'm curious if you could tell us: you've been at Snowflake for a little while now, and really, I think you joined at a point in time when the technology was just maturing. Like, you look at the Snowflake user graph and it started to go up and [00:04:00] then it was just like pink.
And I don't know if you can take all the credit in the world for that, but I think that happened right around when you joined. What was that world like? Did you still have the, like, super nice office? Like, how many people worked at the company?
[00:04:13] When you joined Snowflake, what was that world like?
Christian Kleinerman: Yeah. Let's rewind. It's been four years next month with Snowflake. When I joined, it was just over 300 people. Most people asked: Snowflake what? Who is Snowflake? What is it? So it was definitely not well-known. I had seen the SIGMOD paper and I could appreciate the architecture, and thought, hey, this is going to be interesting. But it was a very small team.
I say 300-something people, but engineering, the tech org, was maybe 50 people, 60 people. It was smaller.
Tristan Handy: Yeah. When we were in the green room right before this, I made a connection that I had not made before, but I think you probably had a good opportunity to appreciate some of Snowflake's way of looking at the world.
Certainly [00:05:00] in your days at Microsoft, but also at YouTube, there was a thing that happened while you were there called Procella. And I probably read this paper once it got released into the public, but I remember reading it back in the day and being pretty blown away by it.
Can you say a little bit about what Procella is, slash was? And to what extent are there some similarities between that and what you folks are building?
Christian Kleinerman: Yeah. So Procella was one of many database analytics systems at Google. The original motivation was to power the YouTube analytics system, where YouTube has millions of content creators.
And you want to see: how's your video doing? How's your channel doing? Your fans? So it was, how do you power very low-latency analytics at a very large scale? Understandably, and frankly, the different solutions that were there were not quite meeting the requirements that we had. And that's what led to the creation of that system. [00:06:00]
Tristan Handy: I remember that the thing that really impressed me about it from reading the white paper was that it wrapped up kind of two different data access paths behind a single API. Like, you could a little bit have your cake and eat it too. The way that I think about designing databases, not that I've ever done it myself, but the way that I understand the exercise, is that there's this pretty broad design space.
And you get to pick where you want to locate yourself in that design space. You have to make certain trade-offs: like, either your batch reads are truly fantastic but you're less real-time; you have to trade off all these things. And the way I read the paper was that Procella had found a way to have, like, two points in the design space, but not make it more complicated for users. That's my understanding of it. And to what extent do you feel like what you folks are doing at [00:07:00] Snowflake right now is also trying to do something similar, where you get multiple points in the design space for users?
[00:07:08] To what extent do you feel like what you folks are doing at Snowflake right now is also trying to give users multiple points in the design space?
Christian Kleinerman: So it is accurate that whenever you're building one of these platforms, you need to figure out what you're optimizing for. And probably the biggest trade-off is how narrow you are in terms of usage versus how broad you are in terms of the platform. You can be really good at one use case.
Or you can be maybe a little bit less precise on the targeting, but of a broader nature. I think what Procella did at YouTube was a reasonably narrow case, and I'm not gonna say ultra-narrow, but it was a reasonably narrow set of query patterns. We understood the data and the data model.
So that's how you build a system that is quite fast, because you understand, effectively, the use case, the data, the patterns, the growth rates, et cetera. That's a benefit that platforms like Snowflake don't have. We see [00:08:00] people showing up with a handful of gigabytes of data and very repeatable reports, but we also see petabyte-size tables with ad hoc exploration or, as I saw recently, a seven-way join. So I think that may be the biggest difference: how broad of a platform you are. And of course, what we aim to do is make sure that we appeal to a very large set of use cases.
Tristan Handy: Yeah. I'll connect these dots in a second, but I want to go over to clean rooms.
Clean rooms are a thing that I have only learned about recently. I have never had the opportunity to use a Snowflake-powered clean room before. But it's such a fascinating concept. Can you talk about what this is?
[00:08:45] Talk about Clean Rooms
Christian Kleinerman: Yeah. So at a high level, a clean room is a collaboration workspace by which two or more parties can go and do analytics, even machine learning, [00:09:00] on combined data without necessarily seeing each other party's data. My simplest example is something that I think you and I have talked about in the past: if we wanted to figure out which customers dbt Labs and Snowflake have in common, maybe you're not super incentivized to give me the whole list of your customers.
Maybe I don't want to give you the whole list of mine. But how do we identify who the mutual customers are, so we can go and provide a more targeted use case? A clean room is exactly the type of construct where we both get to contribute our data. We get to ask questions on the common data without disclosing each other's data.
Traditionally, this has been done by a trusted third party. So you hire a company, we both give the data to that company, and then they give you the answer. What we've done at Snowflake is just part of a bigger journey around collaboration. Four-plus years ago, we started doing data sharing, but very quickly we said, you know what?
If we expand the concept to function [00:10:00] sharing, I can give you a trusted function. You can operate on my data without seeing my data. You can give me a function. And that is the foundation. And let's say the clean room concept is red hot in terms of popularity and value, but there are also macro trends, such as the deprecation of third-party cookies and IDFA-type changes, which have created an even bigger set of needs around how companies collaborate on data without disclosing full data sets to each other.
I'll pause there. It's a topic we could use all 30 minutes on.
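As a rough illustration of the mutual-customer example above, here is a minimal Python sketch. It does not show how Snowflake implements clean rooms. The `CleanRoom` class, the `blind` helper, the party names, and the customer lists are all hypothetical, and keyed hashing of customer names is a deliberate simplification (it offers weak protection for guessable identifiers). The only point is the shape of the idea: both parties contribute data, and each can ask questions about the overlap without receiving the other's full list.

```python
import hashlib
import hmac


def blind(name: str, shared_key: bytes) -> str:
    """Normalize a customer name and return a keyed hash of it (hypothetical helper)."""
    normalized = name.strip().lower().encode("utf-8")
    return hmac.new(shared_key, normalized, hashlib.sha256).hexdigest()


class CleanRoom:
    """Toy stand-in for a clean room: it stores only blinded tokens per party."""

    def __init__(self, shared_key: bytes) -> None:
        self._key = shared_key
        self._contributions: dict[str, set[str]] = {}

    def contribute(self, party: str, customers: list[str]) -> None:
        # Raw names are hashed on the way in and never stored.
        self._contributions[party] = {blind(c, self._key) for c in customers}

    def overlap_count(self, party_a: str, party_b: str) -> int:
        # An aggregate answer: how many customers the two parties share.
        return len(self._contributions[party_a] & self._contributions[party_b])

    def mutual_customers(self, own_customers: list[str], other_party: str) -> list[str]:
        # Returns only names the caller already holds, filtered to the overlap.
        other_tokens = self._contributions[other_party]
        return sorted(c for c in own_customers if blind(c, self._key) in other_tokens)


# Assumed sample data, for illustration only.
room = CleanRoom(shared_key=b"demo-key-agreed-out-of-band")
dbt_customers = ["Acme Corp", "Globex", "Initech"]
snowflake_customers = ["globex ", "Initech", "Umbrella"]
room.contribute("dbt_labs", dbt_customers)
room.contribute("snowflake", snowflake_customers)

print(room.overlap_count("dbt_labs", "snowflake"))         # 2
print(room.mutual_customers(dbt_customers, "snowflake"))   # ['Globex', 'Initech']
```

In this sketch neither party ever receives the other's non-overlapping customers ("Acme Corp" and "Umbrella" stay private), which loosely mirrors the "function sharing" framing: the caller gets the result of a function over the other party's data, not the data itself.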
Tristan Handy: Yes. We'll do it again tomorrow.
So this is a very unusual thing, I think, in the world of database functionality. Certainly if you canvass the most popular 20 databases in the world, I don't think you will frequently see this type of feature.
The reason I brought it up is that I think it points to a certain uniqueness [00:11:00] of Snowflake. And my guess is that this has to do with the way that you see yourself as cloud-first. Can you fill in the blanks here?
Christian Kleinerman: Yeah. So you're right that being cloud-first and designed for the cloud is what makes the difference.
And it's what enables something like clean rooms. This was very early on, way before I was at Snowflake; all credit goes to our founders. They took every single subsystem of a traditional database system and said, how would you rethink this if you had not only virtually unlimited resources, but also ephemeral resources, where you can instantiate and release resources? Traditional databases were always constrained on memory and constrained on disk space; running out of disk space is a very real thing.
But the moment that someone tells you to assume that you have unlimited disk space, you start to define something different, and that's what leads to the, hey, what if different customers could have compute clusters [00:12:00] all sharing the same storage substrate? That was the insight that led toward data sharing. And then one thing leads to another.
And that's how we ended up with clean rooms. So what enables it is the cloud, to be honest. If you had a server under your desk with your data, and I had the same thing here, there is no clean room we're going to do, because you don't have the platform that enables the collaboration.
Does that make sense?
Tristan Handy: Yeah, it does. Can you relate this back? It's been a funny experience for me, from the outside, to watch the banner on the Snowflake homepage change over time from "data warehouse in the cloud" to "cloud data platform" to "Data Cloud". Are these changes over time related to your own self-conception of what it is that you're building?
And how does it actually manifest in the technology?
[00:12:55] How does what you're building manifest in the tech?
Christian Kleinerman: Yeah. Great question. I would say that our positioning, which [00:13:00] is what manifests in the banner, has generally trailed what the product and the vision had been doing. Data warehousing for the cloud was a very easy way to explain what we did.
But early on, we had data sharing, and data sharing didn't add up as part of the data warehousing terminology. And then we started to see that customers were doing more: I want to do data transformation. I want to go and potentially build an application. And you don't think of data warehouses for that.
That's when we said, okay, cloud data platform. But very quickly we saw that having great technology in the company is only part of the requirements for having a great data capability. The big insight behind the Data Cloud is that we want to deliver great technology, and we want to deliver an ecosystem:
data providers, function providers, and application providers that enrich the experience. That's what the Data Cloud is. But if you look at our product journey, I would say we've [00:14:00] always been enabling newer things ahead of the broader positioning or the broader description of what we're doing.
Tristan Handy: So there's this kind of broadening that is happening. Maybe data warehouse is the most narrow definition, and over time you've understood it to be more and more broad. What's the limit? If this is like a mathematical limit function, at some point in time, are you going to be competing with Postgres for OLTP workloads? I'm sure that there are some folks that actually do use Snowflake as some type of a transactional system, but I don't think that you're going to compete with Postgres on that today.
Christian Kleinerman: So you're asking that question of a limit, and I want to answer in terms of the art of the possible. What informs how we think about it is that we have a platform where we're every bit as proud of the technology that we have. And a lot of the emphasis that we've placed for our customers is around data governance.
Data governance includes [00:15:00] security, but also privacy, but also knowing and understanding data. And what we are systematically doing is understanding the reasons that get customers to deviate from a single, central, governed platform. So you can say maybe advanced analytics is part of that; then let's figure out how we bring advanced analytics into the platform.
Maybe there are use cases where someone is copying data out into a Postgres database for delivery of data, for actual serving. That's one possible use case, and that one is more concrete: we are expanding around high-concurrency, low-latency use cases. So you can say it's chipping away, to use Postgres as the example, at those use cases.
But the way we think about it is not that we're going after Postgres or going after some technology. It's more that we're slowly expanding the set of use cases we support and the workloads we see, all in the name [00:16:00] of helping our customers preserve governance. That's what matters. That's the north star.
[00:16:05] Snowpark
Tristan Handy: Gosh, so much to talk about, so little time. I want to make sure to get to Snowpark. Snowpark is one of the things where, when you folks first talked about it, I was just like, oh, let me definitely pay attention. Part of the challenge for me following along in Snowpark news is that until recently it has not been focused on my non-SQL language of choice, Python.
And I don't think the Python docs are out yet, but I'm going to be all over them when they're out. So I might not fully understand the vision here, but what, as a practitioner, I'm hoping that Snowpark over time gets me to is an environment where I can stop thinking about running any computation on my local device.
I want not only to push my SQL workloads to a cloud data platform, I want to [00:17:00] also push predictive workloads to that same platform. And my understanding, and you correct me, is that we're not yet at a place where I can just, like, import sklearn and go train the next great machine learning model, but maybe that's wrong.
That's where I want to get to anyway.
Christian Kleinerman: So directionally you're right. The motivation behind Snowpark is: how do we bring extensibility into Snowflake? And how do we give a choice to our users? Choice goes back to the programming language. There's a strong contingent of people for whom SQL is the answer to any problem.
I am a SQL fan and junkie, but I'm also the first one to admit it is not the best tool for certain problem spaces. Java has its place. And Python has a very large place in the world of manipulating and enriching data. So we started with: let's extend Snowflake so [00:18:00] that we can bring computation, bring interesting business logic onto Snowflake.
Again, within that governance boundary, one category of extensions could be predictive analytics, machine learning, et cetera. I have people that have shown me Java-based extensions to do machine learning, both training and scoring, and of course there's a long list of customers lining up for our Python private preview. We're in private preview now.
And they're lining up to get access, to be able to do this type of analytics. But it's not only about data science. We're seeing all sorts of interesting use cases around security and encryption and tokenization. How do we, as a database, effectively get out of having to produce every single built-in function and extension?
How do we enable partners (we're strong believers in the ecosystem), but also customers themselves? That is the big vision, which really is a prereq if we want to go bring [00:19:00] additional workloads. How do you do it?
Tristan Handy: I talk a lot about S-curves, the concept in the book by Carlota Perez.
And I see them in my sleep now. So my question is, if all technologies are in some ways S-curves, where are we along the trajectory of Snowpark?
Christian Kleinerman: I would say from the investment on our side, we're very far along. It started many years ago, or two and a half, three years ago. The fact that Java is in public preview and Python is in private preview, that's the beginning of the end of the delivery, but I think we're very early on the adoption.
The interest that we see is through the roof. Frankly, Java has been in the market for six months, and Python a couple of weeks, since our Snowday. So on maturity of what we've delivered, I would say quite far along; on adoption, very early on, based on timing. [00:20:00]
Tristan Handy: Building databases is such a funny cultural environment, I feel like. We release technology quickly and iterate on it very quickly with our community. The Snowflake story is really that the founders and a small group of engineers were in the labs for, I don't know, a pretty large number of years before this technology was actually usable by real customers.
And that's when I started poking around at it. And it seems not unreasonable to imagine that Snowpark might have its own gestation period before it gets to maturity.
Christian Kleinerman: Yeah. I like to say our founders bordered on insanity when they decided to build a new platform from scratch; literally, they started building everything. With Snowpark, I have some early use cases where I'm seeing dramatic improvements relative to prior [00:21:00] art, in how concise the code is, et cetera.
So yes, back to your point: we will see very interesting use cases. We have the Snowpark Accelerated program, with 50-plus companies and partners signing up to deliver solutions on Snowpark. So think of it as a runtime to bring logic to run closer to the data. And of course, by virtue of how we do it, it is cross-cloud, which is compelling for anyone that's trying to build solutions for data.
Tristan Handy: Okay, we're running out of time, but maybe I'll get your thoughts on this overall. I had this conversation with Martin on, was it Monday? The title of the session was How Big is This Wave? And we were talking about the persistence and just overall size of this thing that we're all doing here together.
I'm curious what you think five years from now looks like, and this is not really a Snowflake question per se. It's more like, how are people going to be using data in ways that they are [00:22:00] not yet? What problems do we still need to solve?
[00:22:03] How are people going to be using data in ways that they are not yet? What problems do we still need to solve?
Christian Kleinerman: Yeah. So I also compare notes with Martin on a regular basis, and I haven't watched your Monday conversation, but I'll say that we're very early on as an industry in what is going to be possible with data.
We have no end of tailwinds as an industry. The amount of data being created is through the roof, and the vast majority of data has been created in the last five years. Then there is the acknowledgment, or realization, that competitive advantage is based on data.
That's only starting to dawn on every single industry. I've heard that software is eating the world. I think data is eating software. And the last one is cloud. The cloud is forcing everyone to rethink systems, rethink how things are being done. We're very early on as an industry. And I think the opportunity for all of us in the data world is huge.
Tristan Handy: I very much agree. For me, there's this [00:23:00] clarity of what the next 12 months are going to look like. And then there's this clarity of what things will look like in a decade. And then I have this uncertainty around, specifically, our community: analytics engineers and data analysts.
How do their jobs look different in the three-to-five-year time horizon? I think that there are only good things involved, but it's probably not just going to be folks churning out dashboards and working ever faster.
Christian Kleinerman: So something that we obsess about, and you're also a big part of it, is helping everyone focus on the data model and the metric definition, not on the tools and the infrastructure. We jointly want to take care of that. So I think we're going to see a huge focus on: what is the metric definition that will give me the competitive advantage, or what is the timeliness of my decisions and my insights?
Massive improvements on all of these dimensions. So much to be done.
Tristan Handy: Awesome. Christian, thanks so much for joining us. [00:24:00]
Christian Kleinerman: Tristan, thank you for having me here. It's awesome to be part of Coalesce. Thank you.
Last modified on: Apr 19, 2022