Today, we are producing data like never before. Digital means data. Every digitized function produces data. As our businesses, societies, governments, offices, and hospitals get digitized, we collect more data. Yet as the mountain of data grows, are we able to keep pace with it? Is everyone able to make use of vast amounts of information? Does data automatically mean insight? Are we creating taller walls around us by capturing but isolating data? Are traditional methods and systems enough to tackle such massive amounts of data? In this article, we will explore some of those aspects as we look at the Snowflake platform.
The most powerful tech companies are the ones that have the most data on customers. That data covers thousands of attributes across dimensions: past purchases, demographics, third-party data, and so on. When good data meets great analysis, magic happens. It helps build better products and experiences. It creates value for customers, for the company, and for its shareholders. Netflix, Spotify, American Express, Tesla, and Amazon are a few famous examples.
Hence, crunching massive amounts of data, turning it into insights, and putting those insights to the right use has the potential to turn fortunes around.
But a lot of companies use traditional tools and frameworks to collect, process, secure, and store data. This has traditionally left data sets isolated. Others use modern tools but turn their data initiatives into data swamps that are too expensive and impossible to govern.
Snowflake quietly started building its data platform in 2012. Gartner and Forrester tracked Snowflake for several years, comparing it to Teradata, AWS, Google BigQuery, and many other product offerings. Snowflake moved quickly, from a challenger in 2016/17 to the leaders quadrant in 2018. It tripled its customer base and posted triple-digit YoY growth. Then Snowflake became the poster child of the tech and data world with its IPO in 2020.
So, what fueled this crazy growth?
A modern data platform like no other.
Let me break Snowflake's strategy into 3 buckets:
- No on-premise support. Snowflake came in a cloud-only avatar.
- Snowflake separated the compute tier (processing, querying) from the storage tier (holding the data). This was an industry-first innovation.
- Snowflake came up with the Data Cloud. It would enable seamless, global sharing of data on a common platform. It would pave the way for the data economy and for what came to be called the 'Data Marketplace'.
- The data exchange came along at the right time for a high-impact use. It enabled the world's largest repository of HIPAA-compliant, anonymized COVID-19 data, featuring high-quality data, security, and governance. Data from 30 healthcare companies across the world was shared in real time for COVID-19 research.
- Enterprises and experts had gotten used to elastic cloud computing and pay-per-use models. Snowflake stuck to those industry practices.
- Enterprises trusted cloud providers and their offerings. Snowflake inherited that reliability and extended it by becoming 'cloud agnostic'. It offered AWS, Azure, or Google Cloud as a choice for storage.
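To make the separation of compute and storage a little more concrete, here is a minimal Python sketch of how a pay-per-use bill changes once the two tiers are billed independently. Everything in it is an assumption for illustration: the prices are made-up placeholders, and `Workload`, `decoupled_cost`, and `coupled_cost` are hypothetical names, not Snowflake's rates or API. The "coupled" case is also a simplification of how traditional systems behave.

```python
from dataclasses import dataclass

# Made-up placeholder prices for illustration only (not any vendor's real rates).
STORAGE_PRICE_PER_TB_MONTH = 25.0
COMPUTE_PRICE_PER_HOUR = 3.0


@dataclass
class Workload:
    stored_tb: float      # data held in the storage tier for the whole month
    compute_hours: float  # hours a compute cluster actually ran queries


def decoupled_cost(w: Workload) -> float:
    """Storage and compute are billed separately; idle compute costs nothing."""
    storage = w.stored_tb * STORAGE_PRICE_PER_TB_MONTH
    compute = w.compute_hours * COMPUTE_PRICE_PER_HOUR
    return storage + compute


def coupled_cost(w: Workload, hours_in_month: float = 720.0) -> float:
    """Simplified coupled model: compute stays on for as long as data is online."""
    storage = w.stored_tb * STORAGE_PRICE_PER_TB_MONTH
    compute = hours_in_month * COMPUTE_PRICE_PER_HOUR
    return storage + compute


# A reporting workload that stores 10 TB but only queries it 40 hours a month.
reporting = Workload(stored_tb=10, compute_hours=40)
print(decoupled_cost(reporting))  # 370.0
print(coupled_cost(reporting))    # 2410.0
```

With these placeholder numbers, the workload that queries its data only occasionally pays for 40 hours of compute instead of 720, while the storage charge stays the same in both models.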
Most enterprise cloud ecosystems are hybrid in nature and multi-cloud focused. They use some combination of AWS, Azure, Google, or Oracle offerings. Each cloud vendor's products work best within that vendor's own ecosystem and only "ok" when paired with other vendors' products. Also, not every product from a given vendor is the best. Google has strengths in deep learning, AI, and ML. AWS has depth and breadth, with an almost confusing toolset, and so on.
Image: AWS's reference architecture for a data lake
Image: Azure's Enterprise data warehouse architecture hosted on Azure Pipelines and storage
Leaders, architects, and operations and security SMEs set out to find the best solution set.
Every hybrid and multi-cloud architecture involves:
- Integration decisions
- Elaborate tool selection, pros vs. cons
- OpEx, CapEx, and ROI analysis
- and hence, significant complexity
Building scalable products with hybrid, multi-cloud architecture is a builder's dream come true.
But there is one small problem!
Not every organization can afford to spend precious time and money on laborious architectural undertakings.
The cocktail of tools and products needs significant upkeep for meaningful performance. Also, the niche toolset means a small and expensive pool of experts.
Such organizations would rather focus on solving customer problems, improving operational efficiency, and innovating. If only they could somehow get their data right.
For them, data strategy becomes an expensive and painful area to tackle.
Snowflake shines because it is geared towards business impact. It empowers business analysts and CX gurus, who may be non-technical, with data self-service. Snowflake makes a compelling case to the 'builders' too, since it abstracts away the heavy lifting of engineering.
Here are the benefits cited by customers who compared Snowflake to on-prem Hadoop, cloud data lake, or data warehouse solutions.
- Cost savings and profits from accelerated time to market for new products and enhancements
- Speeding up decisions by putting data in the hands of business users and decision-makers
- Productivity gains from simplified data operations: self-service, data discovery, and data sharing
- Infrastructure and data management cost savings
- Monitoring of compliance and risks
Let's look at the top features of Snowflake:
- Breaks silos across organizational data: OLTP DBs, enterprise applications, IoT, web, and logs or third-party data.
- Brings the best of shared-disk and shared-nothing architectures
- Runs on AWS, Azure, or GCP. Abstracts data engineering.
- Provides the Data Cloud, now known as the Data Marketplace. It connects hundreds of companies that can share their data. This opens new frontiers for revenue generation and global innovation.
- Industry-first Innovations
- Separate Storage and Compute.
- Security for business at the access, authentication, authorization, data, and infrastructure levels.
- Streamlines data governance, access, and sharing within and outside the organization
- Eliminates copies (redundancies) of data for sharing and other purposes.
- Supports both schema on read and schema on write.
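The last feature in the list, schema on read versus schema on write, is easier to see with a small example. The Python sketch below illustrates the general idea only and is not Snowflake code: `ORDER_SCHEMA`, `write_with_schema`, `write_raw`, and `read_amounts` are hypothetical names invented for this illustration.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
ORDER_SCHEMA = {"order_id": int, "amount": float}


def write_with_schema(table: list, record: dict) -> None:
    """Schema on write: reject records that do not fit the schema at load time."""
    for field, expected in ORDER_SCHEMA.items():
        if not isinstance(record.get(field), expected):
            raise ValueError(f"{field!r} must be {expected.__name__}")
    table.append(record)


def write_raw(store: list, payload: str) -> None:
    """Schema on read: keep the raw JSON text exactly as it arrived."""
    store.append(payload)


def read_amounts(store: list) -> list:
    """Apply structure only at query time; skip rows without a numeric amount."""
    amounts = []
    for payload in store:
        row = json.loads(payload)
        if isinstance(row.get("amount"), (int, float)):
            amounts.append(float(row["amount"]))
    return amounts


# Structured data is validated before it lands in the table.
structured = []
write_with_schema(structured, {"order_id": 1, "amount": 9.5})

# Semi-structured data of any shape is accepted and interpreted later.
raw = []
write_raw(raw, '{"order_id": 2, "amount": 12}')
write_raw(raw, '{"device": "sensor-7", "temp": 21.4}')
print(read_amounts(raw))  # [12.0]
```

Schema on write catches bad records early, which suits curated tables. Schema on read lets mixed sources such as logs or IoT events land first and be shaped by whoever queries them.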
In the End
While Moore's law paved the way for innovations in past decades, data insights from applied ML at scale will drive the next wave. The power of data is only beginning to be unleashed. Snowflake promises to be an excellent platform for that purpose. The Data Marketplace promises to lead global data discovery as more companies bring and share their data sets online. Larger efforts like COVID-19 or cancer research, or innovations in aviation, automotive, or manufacturing, will further fuel the data economy. However, there are headwinds for Snowflake, with the cloud vendors pushing their own products like Redshift and BigQuery harder, and with everyone focused on data. Only one thing is for sure: the road ahead is about to get interesting and rewarding for enterprises and end customers.
Additional Reads and References
- Gartner, “Magic Quadrant for Data Management Solutions for Analytics,” Adam M. Ronthal, Roxane Edjlali, Rick Greenwald.
- Gartner, “Gartner Peer Insights ‘Voice of the Customer’: Data Management Solutions for Analytics Market,” Peer Contributors. August 6, 2018.
- Snowflake data sets for coronavirus research: https://www.snowflake.com/coronavirus-data-sets/
- Azure architecture for an enterprise data warehouse: https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/enterprise-data-warehouse
Saurabh Mittal May 18, 2021