How did Snowflake succeed?
When looking at how major change might happen in the future, it helps to look back on previous shifts and reflect on how they happened.
Snowflake’s success is one such example. But it wasn’t an overnight one. It took about five years of work until the product was viable for production workloads.
Once it was ready, though, it found a hungry customer base. The only solutions for processing data at massive scale at that time were Hadoop and Amazon Redshift. Hadoop was notoriously hard to use and many teams failed to adopt it successfully. Meanwhile, Redshift did what it did decently – but some teams discovered that performance suffered past a certain point.
Snowflake focused on delivering a great product that filled these gaps. Internally, the company also focused on its corporate values – i.e., on making Snowflake a great place to work.
However, even after baking for five years, Snowflake initially kept its focus on workloads in areas such as marketing, advertising, and online gaming. These were areas where regulatory requirements existed but weren’t as advanced as in, say, finance or health care.
It took a few more years of working with potential customers such as Capital One until the product could evolve to handle these more compliance-intensive workloads. The result of that effort was Virtual Private Snowflake, which gave Snowflake an architecture with which to handle highly sensitive workloads.
This long journey isn’t atypical for a database product. Bob Muglia, the former CEO of Snowflake, saw this same time lag at SQL Server, where he was one of the original product managers. And Microsoft’s current end-to-end analytics solution, Microsoft Fabric, is essentially a version 3 – and that’s from a company with rich resources and a background in data storage at scale.
Long timelines pose an inherent danger to any data product, as decisions made early on can become limiting constraints down the road. As Muglia puts it: “Where you end up is a function of where you start.”
Are we returning to on-prem?
Snowflake was developed explicitly to run in the cloud. At the time, this was a radical decision, as cloud adoption was still in its infancy.
Today, we see some companies pulling some of their workloads back on-premises. Is this a trend? Should we expect to see a mass migration back to self-funded data centers? Not necessarily.
Many of the workloads moving back to on-prem are operational in nature. These are traditionally more static in terms of required storage capacity and computing power. That means it’s easier to predict cost – and that you have less need for the cloud’s on-demand, utility computing model.
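To make the cost argument concrete, here is a minimal sketch comparing owned capacity with on-demand pricing. Every number and name in it is a hypothetical assumption for illustration (the rates, the demand profiles, and the simplification that owned hardware is sized for peak demand); none of it comes from Muglia or from real pricing data.

```python
from typing import Sequence

# Hypothetical rates, chosen only for illustration: owned capacity is assumed
# cheaper per unit than on-demand capacity, but must be paid for whether used or not.
OWNED_RATE = 1.0   # assumed cost per unit of capacity per period (on-prem)
CLOUD_RATE = 2.0   # assumed cost per unit actually consumed per period (cloud)


def fixed_capacity_cost(demand: Sequence[float], owned_rate: float) -> float:
    """Owned hardware is sized for peak demand and paid for in every period."""
    return max(demand) * owned_rate * len(demand)


def on_demand_cost(demand: Sequence[float], cloud_rate: float) -> float:
    """Utility pricing charges only for the units consumed in each period."""
    return sum(demand) * cloud_rate


# Hypothetical demand profiles over four periods.
steady = [10, 10, 10, 10]   # static, operational-style workload
spiky = [2, 2, 40, 2]       # bursty, analytics-style workload

for name, demand in (("steady", steady), ("spiky", spiky)):
    owned = fixed_capacity_cost(demand, OWNED_RATE)
    cloud = on_demand_cost(demand, CLOUD_RATE)
    print(f"{name}: owned={owned}, on-demand={cloud}")
```

Under these assumed rates, the steady workload is cheaper on owned capacity, while the spiky one is cheaper on demand because owned capacity sits idle between peaks. Real comparisons depend on many factors this sketch ignores, such as staffing, hardware refresh cycles, and negotiated discounts.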
Dropbox, which famously moved its entire system back on-prem, is a special case due to its intense storage needs. Most other companies simply won’t be able to match the massive capital expenditures that companies like Amazon and Microsoft are pouring into their public cloud infrastructures.
In other words, for most companies, it’ll still make sense to keep most of their compute spending as OpEx rather than shift it back to CapEx.
Do we need artificial general intelligence?
Muglia talked a little about his new book, Datapreneurs, which tells the story of how both data technology and the data marketplace have evolved over the past decade. While writing the book, however, he found himself in the middle of yet another paradigm shift: artificial intelligence (AI).
Large language models (LLMs) have turned the industry on its ear, changing the way that we approach data. Previously, data engineering was about extracting knowledge from data: summarizing information and drawing conclusions. Now, we have intelligence in our machines that can reason and draw conclusions based on that data.
But while we have AI, we still don’t have artificial general intelligence (AGI) – i.e., the ability of a computing system to learn and perform any task that a human can. The rise in AI means the horizon for AGI may have moved up dramatically. Muglia predicts it will happen around 2030, as opposed to 100 years from now.
Even without AGI, harnessing AI-generated insights and combining them with intelligent business models has the potential to revolutionize how we do business in the coming years.
Is data engineering really just business engineering?
What comes next in data engineering, according to Muglia? In some respects, we’ve come a long way. Data used to be scattered all over a company, with questionable quality and even more questionable security controls. Now, the industry is trending toward bringing it all together in data platforms, data lakes, data lakehouses, and data catalogs, with appropriate governance controls. This unification of data – and the insights we can cull from it – has proven incredibly valuable.
But not everyone’s there yet. Muglia noted that when he polled a room of 100 IT managers in 2022, only around 30% of them said they had implemented some element of the modern data stack.
Data engineering is also still a very low-level effort. It’s more akin to operating forklifts and bulldozers than to, say, the seamless process of transferring your data to a new iPhone.
As the rest of the industry comes up to speed with the modern data stack, we’re likely to see a few shifts. The first will be a shift away from SQL as a direct query language. Business Intelligence (BI) tools will increasingly support querying in human languages such as English; SQL will become the intermediary syntax that the machine understands.
The next shift will be a move from data engineering to business engineering, Muglia believes. Instead of focusing on modeling data, engineers and analysts will focus on modeling the business, including its functions and processes, and then derive a data model from the business model. This data model becomes the desired state for the business; systems can then work together to ensure the actual state of the business matches this desired state.
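One way to picture the desired-state idea is as a reconciliation step: compare what the business model says should be true with what the systems currently report, and derive the actions needed to close the gap. The sketch below is only an illustration under stated assumptions. The function name `plan_actions`, the flat dictionary representation, and the example keys are all hypothetical, and this is not a description of how Muglia or any product implements it.

```python
def plan_actions(desired: dict[str, int], actual: dict[str, int]) -> list[str]:
    """Return the actions needed to move the actual state to the desired state.

    Both states are modeled, hypothetically, as flat mappings from a named
    business fact to an integer value.
    """
    actions: list[str] = []
    # Walk every fact that appears in either state, in a stable order.
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key)
        have = actual.get(key)
        if want == have:
            continue  # already in the desired state
        if have is None:
            actions.append(f"create {key}={want}")
        elif want is None:
            actions.append(f"remove {key}")
        else:
            actions.append(f"update {key}: {have} -> {want}")
    return actions


# Hypothetical example: the model says 100 widgets should be in stock,
# no support tickets should be open, and no legacy discount codes should exist.
desired_state = {"warehouse_stock.widgets": 100, "open_support_tickets": 0}
actual_state = {
    "warehouse_stock.widgets": 80,
    "open_support_tickets": 0,
    "legacy_discount_codes": 3,
}

for action in plan_actions(desired_state, actual_state):
    print(action)
```

A real business model would be far richer than a flat dictionary, but the shape of the loop is the point: the model is the source of truth, and systems act on the difference between it and reality.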
Muglia believes the semantics of SQL aren’t capable of expressing a complete model of a business, and he argues for a new database format more akin to a relationship knowledge graph database than a traditional, relational database.
The end goal is to create a working model of a business that both humans and machines can understand. That model itself will likely be generated by a combination of human and machine effort, as it’s too complex for humans to create solely by themselves.
No one can predict with certainty where the data market will end up in 10 years. After all, few people predicted the rapid growth of AI technology.
However, based on current trends, we’re likely to see the modern data stack grow more sophisticated and easier to use as it expands to model higher-level business concepts. That future will likely be forged, not just by humans, but by human-machine interaction as AI technology continues to mature.
Last modified on: Nov 22, 2023