This article was originally written by Clive Bearman and appeared on the Qlik Blog here: https://blog.qlik.com/estimating-the-cost-of-a-cloud-data-warehouse
Consequently, imagine the culture shock when I came to the States in the late '90s, when the dotcom boom was at its peak. Money conversations were virtually mandatory. Startups obsessed over raising capital, talked up valuations and salivated over making millionaires by going public. Suffice it to say, I quickly got over my embarrassment and learned that money conversations were crucial to running a good business.
Fast forward to today and contemporary attitudes are a little different. Modern businesses are concerned with obtaining the most value possible from money. Key enablers are cloud elasticity, subscription IT services and consumption billing. The effect on traditional software markets is truly transformational. You’ve only to look at the data warehouse market to see evidence of this sea change.
Enter Cloud Data Warehouses
Traditional data warehouses were installed in data centers, with budgets in the millions of dollars for hardware, software and the professionals to maintain them. Value for money was often marginal, because insights were often slow to materialize. Traditional data warehouses required the right foresight to design optimal data structures, needed to churn through vast quantities of data to deliver insights, and were difficult to adapt to changing business requirements. Consequently, data warehouses were often the domain of large organizations.
Today, modern cloud data warehouses are within the reach of virtually every company. Startup costs are generally small, and the overall operational costs are significantly lower than those of traditional alternatives. With cloud data warehouses, onboarding is a breeze. As a result, you can spend less time on administration and more on data analysis.
Figuring Out The Costs
Once you’ve decided to build a cloud data warehouse, the next step is figuring out how much it's going to cost. There are several components to consider:
- Initial set up cost
- Cloud data warehouse service
- Implementation cost
- Data warehouse administration, operation and optimization training
- On-going operational costs
- Network charges
- Premium technical support
- Miscellaneous costs
Top Cloud Data Warehouse Pricing Structures
Although there are dozens of well-known cloud-based data warehouse vendors, the following is a brief overview of the pricing structures of the top vendors.
For Amazon Redshift, Amazon advises that you first choose a cluster and node type configuration to suit your needs. Don’t worry if you want to change later, as you can easily scale your nodes or switch between node types with a single API call or a few clicks in the Amazon Redshift console. Amazon offers two pricing models: on-demand and reserved instances. Reserved instances typically carry a larger discount; however, they require a bigger up-front investment. Amazon offers three types of Redshift nodes: RA3, DC2 and DS2. Each has an optimal use case.
If your data warehouse is likely to be less than 1 TB, Amazon suggests you choose DC2 nodes, which run from $0.25/hour to $4.80/hour. If you require lots of storage, then they suggest using RA3 nodes with managed storage. RA3 nodes cost from $0.85/hour to $6.80/hour, with an additional charge of $0.024 per GB/month for managed storage.
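The Redshift figures above can be turned into a rough monthly estimate. The following is a minimal sketch, not an official calculator: the function name, the 730-hour month and the example cluster sizes are assumptions made for illustration, and only the hourly and per-GB rates come from the text.

```python
# Rough on-demand Redshift estimate using the rates quoted in the article.
# Assumption: a month is billed as 730 hours (8,760 hours / 12).
HOURS_PER_MONTH = 730


def redshift_monthly_cost(node_count, hourly_rate, managed_storage_gb=0.0,
                          storage_rate_per_gb=0.024, hours=HOURS_PER_MONTH):
    """Return an estimated monthly cost in dollars.

    node_count          -- number of nodes in the cluster
    hourly_rate         -- on-demand price per node per hour
    managed_storage_gb  -- RA3 managed storage in GB (0 for DC2 nodes)
    storage_rate_per_gb -- managed storage price per GB per month
    """
    compute = node_count * hourly_rate * hours
    storage = managed_storage_gb * storage_rate_per_gb
    return round(compute + storage, 2)


# Hypothetical small cluster: two of the cheapest DC2 nodes, no managed storage.
small = redshift_monthly_cost(node_count=2, hourly_rate=0.25)

# Hypothetical larger cluster: two entry-level RA3 nodes plus 10,000 GB of storage.
large = redshift_monthly_cost(node_count=2, hourly_rate=0.85,
                              managed_storage_gb=10_000)

print(small, large)
```

Reserved-instance discounts are left out on purpose, since the article does not quote them.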
Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data using either serverless or provisioned resources. Data storage is charged at the rate of $122.88 per TB per month of data stored ($0.17 per TB per hour). Data storage includes the size of your data warehouse and seven days of incremental snapshot storage.
Azure Synapse SQL has its own method for calculating compute resources called Data Warehouse Units (DWU). Azure has a sliding scale from 100 DWUs costing $1.20/hour to 30,000 DWUs costing $360/hour.
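Both ends of the DWU scale quoted above work out to the same $0.012 per DWU per hour, so a simple linear model is a reasonable first approximation. The sketch below assumes that rate holds for every size in between, that compute is billed only for the hours the pool is running, and that the storage rate applies per TB stored per month. The function and parameter names are illustrative.

```python
# Rough Azure Synapse (provisioned) estimate from the article's figures.
RATE_PER_DWU_HOUR = 1.20 / 100    # $0.012, derived from 100 DWUs at $1.20/hour
STORAGE_PER_TB_MONTH = 122.88     # storage price per TB per month


def synapse_monthly_cost(dwu, active_hours, storage_tb):
    """Return an estimated monthly cost in dollars.

    dwu          -- provisioned Data Warehouse Units
    active_hours -- hours the pool runs in the month (assumed billable hours)
    storage_tb   -- data stored, in TB, including snapshots
    """
    compute = dwu * RATE_PER_DWU_HOUR * active_hours
    storage = storage_tb * STORAGE_PER_TB_MONTH
    return round(compute + storage, 2)


# Sanity check against the top of the quoted scale: 30,000 DWUs for one hour.
top_of_scale = synapse_monthly_cost(dwu=30_000, active_hours=1, storage_tb=0)

# Hypothetical workload: 1,000 DWUs running 200 hours with 2 TB stored.
example = synapse_monthly_cost(dwu=1_000, active_hours=200, storage_tb=2)

print(top_of_scale, example)
```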
The two main components of Google BigQuery pricing are storage and compute. Storage has two-tiered pricing.
- Active – A monthly charge for data stored in tables that have been modified in the prior 90 days.
- Long-term – A lower monthly charge for tables that have not been modified in the prior 90 days.
Active storage starts at $0.02 per GB/month. The first 10 GB is free each month. Any table that isn't modified for 90 days is automatically moved to long-term storage, which costs $0.01 per GB/month.
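The two storage tiers combine into a simple monthly figure. This is a minimal sketch using the rates quoted above. It assumes the free 10 GB is deducted from active storage only, and the function name and example volumes are made up for illustration.

```python
# Rough BigQuery storage estimate using the two tiers described in the article.
ACTIVE_RATE_PER_GB = 0.02      # active storage, per GB per month
LONG_TERM_RATE_PER_GB = 0.01   # long-term storage, per GB per month
FREE_GB = 10.0                 # free allowance each month


def bigquery_storage_cost(active_gb, long_term_gb):
    """Return an estimated monthly storage cost in dollars."""
    # Assumption: the free allowance offsets active storage only.
    billable_active = max(active_gb - FREE_GB, 0.0)
    cost = (billable_active * ACTIVE_RATE_PER_GB
            + long_term_gb * LONG_TERM_RATE_PER_GB)
    return round(cost, 2)


# Hypothetical dataset: 510 GB active and 1,000 GB long-term.
storage_example = bigquery_storage_cost(active_gb=510, long_term_gb=1_000)

# A tiny dataset that fits inside the free allowance costs nothing.
free_example = bigquery_storage_cost(active_gb=8, long_term_gb=0)

print(storage_example, free_example)
```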
Compute usage is called Query pricing and refers to the cost of running SQL commands, user-defined functions, and qualifying Data Manipulation Language and Data Definition Language statements. Query pricing also has two pricing models:
- On-demand – You only pay for the queries you run.
- Flat-rate pricing – Offered in per-second, monthly or annual commitments.
On-demand pricing charges for the number of bytes processed. Pricing starts at $5 per TB processed; the first 1 TB each month is free.
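Because on-demand charges depend only on bytes processed, a monthly estimate reduces to one line of arithmetic. The sketch below assumes the $5 rate applies per TB processed and that the free first TB resets each month. The function name and workloads are hypothetical.

```python
# Rough BigQuery on-demand query estimate from the article's figures.
PRICE_PER_TB = 5.0   # dollars per TB processed
FREE_TB = 1.0        # first TB processed each month is free


def bigquery_on_demand_cost(tb_processed):
    """Return an estimated monthly on-demand query cost in dollars."""
    billable_tb = max(tb_processed - FREE_TB, 0.0)
    return round(billable_tb * PRICE_PER_TB, 2)


# Hypothetical light workload: stays inside the free tier.
light = bigquery_on_demand_cost(0.4)

# Hypothetical heavier workload: 13 TB processed in the month.
heavy = bigquery_on_demand_cost(13)

print(light, heavy)
```

Flat-rate pricing is not modeled here because the article does not quote its rates.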
The Snowflake architecture separates data warehousing into three distinct layers: storage, virtual warehouses (compute) and cloud services. Snowflake pricing is based on the actual usage of these layers and of serverless features.
All customers are charged a monthly fee for the data they store in Snowflake, based on the average amount of storage used per month. A virtual warehouse is one or more compute clusters that enable customers to load data and perform queries. Customers pay for virtual warehouses using Snowflake credits, and the cost of a virtual warehouse depends on its size. The smallest is XS and costs one credit per hour. A 4XL virtual warehouse costs 128 credits per hour. The cloud services layer provides all permanent state management and overall coordination of Snowflake. Cloud services are also paid for with Snowflake credits, but they are billed only when daily cloud services consumption exceeds 10 percent of that day's virtual warehouse usage.
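Virtual warehouse charges can be sketched as credits per hour multiplied by hours and by the price of a credit. The article gives only the XS and 4XL rates. The sketch below assumes each size in between doubles the one before it, which is consistent with those two endpoints. The price per credit is a hypothetical placeholder, since it varies by edition, region and cloud provider.

```python
# Rough Snowflake virtual warehouse estimate.
# Assumption: credits per hour double with each size step (XS = 1, 4XL = 128).
SIZES = ["XS", "S", "M", "L", "XL", "2XL", "3XL", "4XL"]
CREDITS_PER_HOUR = {size: 2 ** step for step, size in enumerate(SIZES)}


def warehouse_cost(size, hours, price_per_credit):
    """Return (credits consumed, estimated cost in dollars)."""
    credits = CREDITS_PER_HOUR[size] * hours
    return credits, round(credits * price_per_credit, 2)


# Hypothetical price of $3.00 per credit; check your own contract for the real rate.
HYPOTHETICAL_PRICE_PER_CREDIT = 3.00

# Hypothetical workload: a Medium warehouse running 50 hours in the month.
credits_used, cost = warehouse_cost("M", hours=50,
                                    price_per_credit=HYPOTHETICAL_PRICE_PER_CREDIT)

print(credits_used, cost)
```

Storage and cloud services charges are left out of this sketch.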
Snowflake is different from the other vendors since it doesn’t run on its own infrastructure. You have the choice to run on Amazon, Azure or Google. As a result, you should be aware that there’s some slight variation in pricing between the underlying cloud infrastructure vendors. Qlik offers a free Qlik Sense app to help Snowflake users understand their usage costs. For more details, follow this link.
Cloud Data Warehouse Pricing Summary
Cloud data warehouses have a much lower barrier to adoption and are more affordable than traditional on-premises warehouse solutions. However, the flexibility of on-demand subscriptions and pay-as-you-go pricing can make costs confusing and opaque. To reduce surprises and improve visibility, users should expect to invest a significant amount of time researching these variables to arrive at the most accurate operational cost estimates.