This June, I had the privilege of being one of around 10,000 people who converged on Caesars Forum on the Las Vegas Strip for Snowflake's annual Summit.
I learned a ton, and I'll get to that shortly. But above all else, my lasting impression was… just how much of an absolute delight it was to see so many other members of the data ecosystem in person. I of course already knew, at some abstract level, that these were real human beings who existed in physical space outside of Twitter, Zoom, and Slack. But there is something tangibly different about seeing people in the flesh, shaking their hands, and having them fail to laugh at my jokes.
There was already buzz in the air with so many people gathered together in person after so long, and it became even more pronounced after Snowflake’s keynotes. The attendees I spoke to agreed that what stole the show was a series of major Snowflake product announcements.
Snowflake is, of course, the data platform most frequently used by the dbt community, and one of the most prominent names in the modern data stack. What they do is ultimately of interest to everyone in the space. With that context, here are some of the main highlights from my time at their Summit.
Get ready for the Snowflake app marketplace
Users have already had the ability to purchase data sets from the marketplace. This was cool: who doesn’t like a good data set? But Snowflake is going a step further, getting ready to allow users to build, distribute, and monetize data apps, leveraging the Snowflake Native Application Framework.
Or put slightly differently: why stop at sharing data when you can share complete experiences as well? Martin Casado of a16z once predicted that "all apps are just going to be reimplemented on top of the data layer," and this felt like precisely another step in that direction.
As anyone who owned a Windows Phone (RIP) will tell you, though, app stores are only as useful as the apps built on top of them. Time will tell where Snowflake's attempt ultimately lands, but they at least seem to be hitting the right notes in getting customers and partners on board to develop apps.
Snowflake's recent acquisition of Streamlit, an open-source Python framework for building and sharing data apps that is now integrated with Snowflake, was on display. A Streamlit co-founder demoed what the experience of deploying (and sharing) a modular data app that predicts ROI from paid ads might look like. Hint: it looked cool.
Here’s how Snowflake presented their ambitions on a neat timeline:
- 2014: Disrupt analytics
- 2018: Disrupt collaboration
- 2022: Disrupt application development
Snowflake’s move toward an app marketplace is the thing that had the most people buzzing. It feels like it could be the one big announcement that defines the next several years of Snowflake’s trajectory, and perhaps even the modern data ecosystem as a whole.
Embracing new workloads
In the same vein, Snowflake is continuing its product shift away from being "just" a warehouse and toward being a platform that can accommodate whatever data workloads your team needs.
Unistore, powered by hybrid tables, is a new type of workload that lets teams keep transactional and analytical data together in one place. Previously, this would have required two different databases, each tuned for a specific use case. This means you'll be able to build transactional business applications directly on Snowflake and run real-time queries against them, taking advantage of Snowflake's speed and its consistent approach to governance.
Snowpark welcomes Python
It’s no secret that Snowflake wants more people to build more applications and run more workloads on Snowflake. ML/AI workloads are their stated next frontier. Python is, of course, the programming language in which predictive modeling applications are most commonly developed.
Which makes this much-awaited announcement quite a logical one: Snowpark now supports Python, in addition to the previously supported Java and Scala. Unexpected? Perhaps not, but this is a big deal all the same. If things work out as planned, this should make it a lot easier to do data science work inside Snowflake without needing to move data around.
Streaming gets an upgrade
Snowpipe Streaming is a new way to land data inside Snowflake, giving you more options for how you ingest streaming data. Importantly, latency on data ingested through Snowflake's Kafka connector has been drastically reduced, by up to 10x, meaning you can query data way sooner after it's sent to Snowflake.
Snowflake also announced materialized tables: an option that provides way more simplicity than streams and tasks, but more flexibility than materialized views. Materialized tables are declarative pipelines, meaning you simply describe the result you want and don't have to worry about how it gets produced.
This will have big implications for some larger dbt projects as well, with the intriguing possibility of lambda views now being replaced by just a few short lines of code.
Native Iceberg Tables in Snowflake
Snowflake has supported external tables (i.e., tables of data that live outside Snowflake) for some time, and now it is taking this further by offering support for Native Iceberg Tables. Iceberg tables let a variety of tools and engines work safely, and performantly, with the same external tables at the same time.
Iceberg tables also let you do some of the things on external data that you're used to doing inside Snowflake, such as time travel and data replication across clouds.
Most importantly, Apache Iceberg is an open-source table format that we're delighted to see Snowflake embrace. For organizations with an eye toward scaling for the future, avoiding the risk of getting locked into a proprietary format is a big deal.
The dbt community showed up in a big way
dbt Labs was a major sponsor of Snowflake Summit. Officially, we had a single 20x10’ booth and a single 45-minute sponsored talk on the agenda. Unofficially… the dbt community was out in force, and dbt was present just under the surface throughout the show.
One of the more popular types of talk was the customer-led session, in which a data team walks through some of its internal processes and tooling. These were invariably fascinating. They also almost invariably included an architecture diagram outlining the tools the team uses. dbt was in… a lot of these. I counted over 10 talks that mentioned dbt in some way.
Meanwhile, our booth was one of the most popular at the Summit. It was constantly crowded and, were it not facing an empty space in the hall, would probably have been a fire hazard. Security had to come and shut it down 40 minutes after expo hall hours ended on the first full expo day.
Roger Fried from Optum gave a talk about their Snowflake + dbt Cloud journey, and Amy Chen of dbt Labs was a partner voice on Snowflake's session introducing materialized tables. Both sessions ended up being standing-room only. An additional "Snowflake + dbt best practices" session we co-hosted had a line out the door 100 yards long.
Are these signs that dbt has officially “made it” as an industry standard? It certainly felt like it in Las Vegas.
Last modified on: Jul 6, 2022