It costs too much!
I didn’t expect this bill!
Why am I being charged for this?!
How do I know what I’ll be spending next month?!
These are all common queries I hear from customers concerned about their cloud spend when transitioning away from traditional infrastructure. IT teams (as well as finance departments) are relatively comfortable justifying expenditure every 3, 5 or even 7 or 10 years (if they’re unlucky!) to refresh their infrastructure hardware, generally through capital expenditure. However, transitioning to the cloud usually means moving to operational expenditure, which makes the spend much more transparent and therefore increases the accountability on IT teams to justify costs.
Note: Hints and tips are at the bottom, if you want to skip ahead!
Understanding your existing IT ops cost
There are many hidden costs associated with the operation of on-premises infrastructure, some of which are hidden entirely from IT. Broad generalisation incoming! For the most part (and there are lots of exceptions), IT teams are very good at understanding the costs associated with the components that make up the infrastructure, e.g. servers, racks, network devices, storage, cabling etc. Other costs, such as estates, power, cooling, security and the effort of maintaining and managing infrastructure, are often less well understood.
This makes the true cost (per hour or minute) of running an application or service difficult to work out, and thus it is difficult to perform a like-for-like comparison against a typical cloud service, such as a virtual machine or database.
As an example, let’s take the requirement to run a workload on a single Windows virtual machine with 4 cores, 16GB memory and 500GB data. Through the Azure pricing calculator this is quick to model:
The above demonstrates clear ops based pricing on a monthly and yearly basis (yes, other components may be required, e.g. VPN gateway) to run the virtual machine workload. This factors in all ancillary costs for Microsoft to run the workload on their infrastructure. Performing the above activity on-premises is much more difficult as you need to understand all the ancillary costs discussed earlier in the post (e.g. estates, power, cooling, etc.), whilst also attempting to break this down into a catalogue of services that you can price individually (e.g. virtual machine, website, database) in order to directly equate costs.
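To make the on-premises side of that comparison concrete, here is a minimal Python sketch of how you might roll the ancillary costs up into a single cost per VM-hour. Every figure, every field name and the even split across VMs are my own illustrative assumptions rather than real data; a proper service catalogue will need a more careful apportionment.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365  # 8,760 hours


@dataclass
class OnPremCosts:
    """Annualised on-premises costs. All values are hypothetical inputs."""
    hardware_capex: float          # total hardware purchase price
    refresh_years: int             # years between hardware refreshes
    estates_per_year: float        # floor space, rent, rates
    power_cooling_per_year: float
    security_per_year: float
    staff_per_year: float          # time spent maintaining and managing

    def annual_total(self) -> float:
        # Spread the capital purchase evenly over the refresh cycle.
        amortised_hardware = self.hardware_capex / self.refresh_years
        return (amortised_hardware
                + self.estates_per_year
                + self.power_cooling_per_year
                + self.security_per_year
                + self.staff_per_year)


def cost_per_vm_hour(costs: OnPremCosts, vm_count: int) -> float:
    """Split the annual total evenly across every VM and every hour."""
    return costs.annual_total() / (vm_count * HOURS_PER_YEAR)


# Illustrative numbers only.
example = OnPremCosts(
    hardware_capex=500_000,
    refresh_years=5,
    estates_per_year=20_000,
    power_cooling_per_year=30_000,
    security_per_year=10_000,
    staff_per_year=59_000,
)
print(f"{cost_per_vm_hour(example, vm_count=100):.2f} per VM-hour")
```

The point is less the number itself and more that you cannot produce it at all until estates, power, cooling and people are on the list next to the hardware.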
Due to economies of scale in the hyper-scale cloud platforms like Azure, it is unlikely that you will be able to compete (unless you own your own estate, generate your own power, and manufacture your own hardware!). The following illustrates the catalogue of services available in Azure, each individually priced:
It is my opinion that the lack of appreciation or understanding of the true cost to operate services on-premises often leads to some of the concerns I discussed in the opening paragraph (remember, it costs too much!) – however this is not the only reason. Many times organisations have a legitimate concern over their cloud spend due to a lack of understanding of the nature of PAYG cloud as well as not using methods that are available to them to get the best possible value out of their cloud spend.
The following paragraphs detail a number of techniques, solutions and methods (some of which were only made available recently) to help reduce your spend in Azure through optimising your services and playing cloud at its own game!
Hints & Tips
Remember you are paying as you go (PAYG) in the cloud. Depending on the resource type, cloud providers charge per minute or per hour. Azure is largely per minute for most resources, which is more granular than most providers. It is key to remember this, as you can greatly reduce spend by keeping it simple and turning off workloads when they are not required! For virtual machines, a good example is domain controllers, which typically receive much less demand outside core hours. Other examples include servers that are part of a load-balanced farm; similar principles apply, in that a server can be powered off if you know demand has fallen. Technologies such as Azure Automation (free for the first 500 minutes per month) can do this on a schedule so you don’t even have to remember! Equally, take a look at dev/test labs to help reduce and control your development spend.
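As a rough illustration of why switching things off matters under PAYG, the sketch below compares an always-on VM with one that only runs 12 hours on weekdays. The hourly rate and the schedule are made-up assumptions for the arithmetic, not Azure prices, and the calculation ignores costs that continue while a VM is stopped (such as storage).

```python
WEEKS_PER_MONTH = 52 / 12  # average number of weeks in a month


def monthly_vm_cost(hourly_rate: float,
                    hours_on_per_weekday: float,
                    hours_on_per_weekend_day: float = 0.0) -> float:
    """Compute-only monthly cost for a VM that is billed only while running."""
    weekly_hours = 5 * hours_on_per_weekday + 2 * hours_on_per_weekend_day
    return hourly_rate * weekly_hours * WEEKS_PER_MONTH


HOURLY_RATE = 0.20  # hypothetical PAYG rate, in any currency

always_on = monthly_vm_cost(HOURLY_RATE, 24, 24)
core_hours_only = monthly_vm_cost(HOURLY_RATE, 12)  # off at weekends
saving = 1 - core_hours_only / always_on

print(f"Always on:       {always_on:.2f}")
print(f"Core hours only: {core_hours_only:.2f}")
print(f"Saving:          {saving:.0%}")
```

With these assumptions the VM runs 60 hours a week instead of 168, so nearly two thirds of the compute bill disappears without touching the workload itself.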
Leverage PaaS technologies rather than sticking with tried and tested IaaS workloads. PaaS workloads typically have much more granular billing (e.g. databases in Azure are priced around a DTU, or e-DTU if you want to be all elastic). Transforming applications to make them cloud-native can help to better control spend, whilst having other benefits such as increasing agility.
Make use of ‘Reserved Instances’, recently introduced by Microsoft, which can reduce spend by up to 72%. This is a game changer for those workloads you know are consistently required, i.e. will be around for 1 or 3 years. Depending on how long you want to commit, Microsoft will provide hefty discounts. Find out more here.
Leverage Azure Hybrid Use Benefits, either standalone or in conjunction with Reserved Instances, to receive even greater discounts (up to 82% as seen in the figure above). If you have existing Windows licenses with Software Assurance, these can be leveraged, with more information here.
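One way to reason about whether a reservation is worth it: a reserved VM is paid for whether it runs or not, so it only beats PAYG if the VM would otherwise be running for more than (1 - discount) of the time. The sketch below works that through using the headline "up to" figures quoted above. Treat those as best-case assumptions; the discount you actually get depends on VM series, region and term.

```python
def break_even_utilisation(discount: float) -> float:
    """Fraction of the term a PAYG VM must run before reserving is cheaper.

    Reserved cost  = rate * (1 - discount) * term_hours   (paid regardless)
    PAYG cost      = rate * utilisation   * term_hours
    The two are equal when utilisation == 1 - discount.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in the range [0, 1)")
    return 1 - discount


def reserved_is_cheaper(discount: float, utilisation: float) -> bool:
    """True if reserving costs less than running PAYG at this utilisation."""
    return utilisation > break_even_utilisation(discount)


# Headline maximum discounts from the post; real discounts may be lower.
RESERVED_ONLY = 0.72
RESERVED_PLUS_HYBRID = 0.82

print(f"{break_even_utilisation(RESERVED_ONLY):.0%}")         # 28%
print(f"{break_even_utilisation(RESERVED_PLUS_HYBRID):.0%}")  # 18%
print(reserved_is_cheaper(RESERVED_ONLY, utilisation=0.36))   # True
```

In other words, at the maximum discount even a VM that you switch off outside core hours (about 36% utilisation in a 60-hour week) could still be cheaper reserved, which is why it pays to do this sum before choosing between the two approaches.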
Right-size your workloads, don’t just lift and shift as-is! With on-premises virtualisation clusters, VM sprawl fast becomes a big problem. Because there is (usually) no accountability for the number of CPUs or the amount of memory allocated to a virtual machine, you often see over-provisioned workloads, and lots of them, with clear mismatches between CPU and memory (CPU wait, anyone?). Key guidance in this post is to ‘right-size’ your workloads. Analyse them using a tool (Azure Migrate comes to mind) to understand utilisation and then move them to the most applicable Azure VM series.
Get rid of VM sprawl before migration… as with the above recommendation, many VMs do not need moving. Have a hard and fast rule that states you will only migrate what you know is required. Anything else stays on-premises and is powered off at a suitable time. This will avoid your sprawl becoming an expensive sprawl in the cloud.
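The right-sizing idea can be sketched in a few lines: take what a VM actually uses at peak, add some headroom, and pick the cheapest size that still fits. The size names, specs, prices and the 20% headroom below are all hypothetical placeholders, not real Azure SKUs, and this is not how Azure Migrate itself works; it is only meant to show the shape of the decision.

```python
from typing import NamedTuple, Optional, Sequence


class VmSize(NamedTuple):
    name: str
    vcpus: int
    memory_gb: int
    hourly_rate: float


# Hypothetical catalogue; substitute real sizes and prices for your region.
CATALOGUE = [
    VmSize("small", 2, 8, 0.10),
    VmSize("medium", 4, 16, 0.20),
    VmSize("large", 8, 32, 0.40),
]


def right_size(allocated_vcpus: int,
               allocated_memory_gb: float,
               peak_cpu_pct: float,
               peak_memory_pct: float,
               headroom: float = 0.2,
               catalogue: Sequence[VmSize] = CATALOGUE) -> Optional[VmSize]:
    """Return the cheapest size covering observed peak usage plus headroom."""
    needed_vcpus = allocated_vcpus * peak_cpu_pct / 100 * (1 + headroom)
    needed_memory = allocated_memory_gb * peak_memory_pct / 100 * (1 + headroom)
    candidates = [size for size in catalogue
                  if size.vcpus >= needed_vcpus
                  and size.memory_gb >= needed_memory]
    # None means nothing in the catalogue is big enough.
    return min(candidates, key=lambda size: size.hourly_rate, default=None)


# An on-premises VM given 8 vCPUs and 32GB that peaks at 15% CPU, 40% memory.
choice = right_size(8, 32, peak_cpu_pct=15, peak_memory_pct=40)
print(choice.name if choice else "no suitable size")
```

Here the over-provisioned VM lands on the hypothetical "medium" size at half the hourly rate of a like-for-like "large", driven by memory rather than CPU.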
Understand Azure VM series types, as not all VMs are created equal! Azure has a catalogue of VM types spanning the alphabet. Review the following link and ensure you choose an applicable VM for the workload you are running. For instance, if you require high compute then an F series may be best, while for I/O-intensive workloads look at the Ls series. This leads me nicely to the B-series VM (recently announced as GA in many regions).
Look at what the B series VM can do for you when you have workloads that are very burstable from a CPU perspective. The B series is a cost-effective type for workloads that burst in their performance, i.e. don’t require continuous performance from their CPU. When B series VMs are not using their CPU (e.g. in quiet periods), the VM builds credits. When you have enough credit, the VM can burst to 100% of the available CPU. The base price of these VMs is much cheaper than that of comparable virtual machines.
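To show the credit idea in action, here is a deliberately simplified toy model in Python. It assumes a single vCPU, a 20% baseline, and that one credit buys one minute at 100% CPU; these parameters and the accounting are my own simplification for illustration and should not be read as Azure's exact credit rules.

```python
from typing import Iterable, List, Tuple


def simulate_burst(demand_pct: Iterable[float],
                   baseline_pct: float = 20.0,
                   max_credits: float = 100.0,
                   start_credits: float = 0.0) -> Tuple[List[float], float]:
    """Simulate minute-by-minute CPU delivery for a burstable VM.

    Below the baseline the VM banks the unused share as credits;
    above it, the VM spends credits and is throttled once they run out.
    Returns the CPU % actually delivered each minute and the final balance.
    """
    credits = start_credits
    delivered = []
    for demand in demand_pct:
        if demand <= baseline_pct:
            # Bank the unused part of the baseline, up to the cap.
            credits = min(max_credits, credits + (baseline_pct - demand) / 100)
            delivered.append(demand)
        else:
            # Spend credits for the part above baseline, if available.
            wanted = (demand - baseline_pct) / 100
            spent = min(wanted, credits)
            credits -= spent
            delivered.append(baseline_pct + spent * 100)
    return delivered, credits


# Ten idle minutes followed by three minutes of full demand.
delivered, remaining = simulate_burst([0] * 10 + [100] * 3)
for minute, cpu in enumerate(delivered[10:], start=11):
    print(f"minute {minute}: {cpu:.0f}% CPU delivered")
```

In this toy run the idle period banks enough credit for two full minutes at 100%, after which the VM is throttled part way through the third. That is the trade-off to check before putting a workload on a burstable series: the quiet periods need to be long enough to pay for the busy ones.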
Take a look at Cost Management and Billing (and Cloudyn). Azure has made great strides in providing excellent capabilities, available to all users, to help manage existing and future spend. Thanks to the acquisition of Cloudyn, Microsoft have introduced these technologies into the Azure portal, with reports to help you monitor spending and to analyse and track cloud usage, costs and trends. This capability is free for Azure usage, but can also manage 3rd party cloud systems (e.g. AWS) as a chargeable extra. But we’re trying to save money here, right? Not spend more!
There are many more techniques and methods that you can use to optimise your spend. You could look at using even more modern technologies such as those in the serverless space. Serverless technologies are the nirvana, as generally you only pay when the service is being used, as opposed to PaaS, which typically carries a core cost for the type of plan you choose. Equally, ensuring you are managing your platform in a robust fashion through modern infrastructure-as-code techniques will help to prevent the abuse seen through over-provisioning.
I hope this has provided some useful recommendations and guidance to help you gain more control over your cloud costs, specific to Azure (though the principles apply across other clouds too), and some tips on how to reduce spend where applicable! Hopefully this will help to reduce some of the frustration because, done right, cloud can deliver on the cost savings you anticipated whilst also giving you access to all the other benefits.
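The pay-per-use versus fixed-plan trade-off comes down to a break-even volume. The sketch below assumes a serverless service billed purely per million executions and a PaaS plan with a flat monthly cost. Both prices are invented for illustration, and real serverless billing usually also charges for execution time and memory, which this ignores.

```python
def serverless_monthly_cost(executions: int, price_per_million: float) -> float:
    """Pay-per-use cost: nothing is charged when nothing runs."""
    return executions / 1_000_000 * price_per_million


def break_even_executions(plan_monthly_cost: float,
                          price_per_million: float) -> float:
    """Monthly executions at which pay-per-use matches the fixed plan."""
    return plan_monthly_cost / price_per_million * 1_000_000


# Hypothetical prices, in any currency.
PLAN_MONTHLY_COST = 50.0
PRICE_PER_MILLION = 0.25

threshold = break_even_executions(PLAN_MONTHLY_COST, PRICE_PER_MILLION)
quiet_month = serverless_monthly_cost(2_000_000, PRICE_PER_MILLION)

print(f"Break-even: {threshold:,.0f} executions per month")
print(f"Quiet month on serverless: {quiet_month:.2f} vs {PLAN_MONTHLY_COST:.2f} fixed")
```

Below the break-even volume the serverless option wins comfortably; well above it, a fixed plan can become the cheaper choice, so it is worth re-running the sum as usage grows.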