Hints and tips to optimise your Azure costs!

Frustration

It costs too much!

I didn’t expect this bill!

Why am I being charged for this?!

How do I know what I’ll be spending next month!

These are all common complaints I hear from customers concerned about their cloud spend when transitioning away from traditional infrastructure. IT teams (as well as finance departments) are relatively comfortable justifying expenditure every 3, 5, 7 or even 10 years (if they’re unlucky!) to refresh their infrastructure hardware, generally through capital expenditure. Moving to the cloud usually means moving to operational expenditure, which makes the spend much more transparent and therefore increases the accountability on IT teams to justify costs.

Note: Hints and tips are at the bottom, if you want to skip ahead!

Understanding your existing IT ops cost

There are many hidden costs associated with the operation of on-premises infrastructure, some of which are hidden entirely from IT. Broad generalisation incoming! For the most part, (and there are lots of exceptions) IT teams are very good at understanding the costs associated with the components that make up the infrastructure, e.g. servers, racks, network devices, storage, cabling etc. When it comes to other costs such as estates, power, cooling, security and the associated costs with maintaining and managing infrastructure – these are often less understood.

This makes the true cost (per hour or minute) of running an application or service difficult to understand, and thus it is difficult to perform a like-for-like comparison against a typical cloud service, such as a virtual machine or database.

As an example, let’s take the requirement to run a workload on a single Windows virtual machine with 4 cores, 16GB memory and 500GB data. Through the Azure pricing calculator this is quick to model:

[Figure: Azure pricing calculator estimate for the example virtual machine]

The above demonstrates clear ops based pricing on a monthly and yearly basis (yes, other components may be required, e.g. VPN gateway) to run the virtual machine workload. This factors in all ancillary costs for Microsoft to run the workload on their infrastructure. Performing the above activity on-premises is much more difficult as you need to understand all the ancillary costs discussed earlier in the post (e.g. estates, power, cooling, etc.), whilst also attempting to break this down into a catalogue of services that you can price individually (e.g. virtual machine, website, database) in order to directly equate costs.
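
To make the on-premises side of that comparison concrete, here is a minimal sketch of how you might roll hardware and ancillary costs into a per-VM hourly figure. Every number and name below is a hypothetical placeholder, not real pricing, and the model assumes costs are spread evenly across VMs, which is a simplification.

```python
HOURS_PER_YEAR = 24 * 365


def on_prem_hourly_cost(hardware_capex, refresh_years, annual_ancillary, vm_count):
    """Rough per-VM hourly cost of on-premises infrastructure.

    hardware_capex:   up-front hardware spend (servers, storage, network)
    refresh_years:    how often the hardware is refreshed
    annual_ancillary: yearly estates, power, cooling, security and staff costs
    vm_count:         number of VMs the costs are shared across (assumed equal shares)
    """
    if refresh_years <= 0 or vm_count <= 0:
        raise ValueError("refresh_years and vm_count must be positive")
    annual_hardware = hardware_capex / refresh_years  # straight-line amortisation
    return (annual_hardware + annual_ancillary) / vm_count / HOURS_PER_YEAR


# Hypothetical figures purely for illustration.
example = on_prem_hourly_cost(
    hardware_capex=100_000, refresh_years=5, annual_ancillary=30_000, vm_count=50
)
print(f"Estimated on-premises cost per VM per hour: {example:.3f}")
```

The hard part in practice is not the arithmetic but finding honest values for the ancillary costs, which is exactly the point made above.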

Due to the economies of scale in hyper-scale cloud platforms like Azure, it is unlikely that you will be able to compete (unless you own your own estate, generate your own power, and manufacture your own hardware!). The following illustrates the catalogue of services available in Azure, each individually priced:

[Figure: catalogue of Azure services]

Appreciating Cloud

It is my opinion that the lack of appreciation or understanding of the true cost to operate services on-premises often leads to some of the concerns I discussed in the opening paragraph (remember, it costs too much!). However, this is not the only reason. Organisations often have a legitimate concern over their cloud spend because they do not fully understand the nature of PAYG cloud and are not using the methods available to them to get the best possible value out of their cloud spend.

The following paragraphs detail a number of techniques, solutions and methods (some of which were only made available recently) to help reduce your spend in Azure through optimising your services and playing cloud at its own game!

Hints & Tips

Remember you are paying as you go (PAYG) in the cloud. Depending on the resource type, cloud providers charge per minute or per hour. Azure bills most resources per minute, which is more granular than most providers. This is key, because you can greatly reduce spend by keeping it simple and turning off workloads when they are not required! For virtual machines, a good example is domain controllers, which typically receive much less demand outside core hours. Another example is servers that are part of a load-balanced farm: the same principle applies, in that a server can be powered off if you know demand has fallen. Technologies such as Azure Automation (free for the first 500 minutes per month) can be used to do this on a schedule so you don’t even have to remember! Equally, take a look at Azure DevTest Labs to help reduce and control your development spend.
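
To get a feel for how much a shutdown schedule can save, here is a small back-of-the-envelope sketch. The hourly rate is a made-up placeholder, and the model only covers compute: it assumes a stopped VM incurs no compute charge and ignores disks, which are still billed while a VM is powered off.

```python
HOURS_PER_WEEK = 24 * 7


def weekly_compute_cost(hourly_rate, hours_per_day=24, days_per_week=7):
    """Compute-only cost of a VM that runs on a fixed weekly schedule."""
    if not (0 <= hours_per_day <= 24 and 0 <= days_per_week <= 7):
        raise ValueError("schedule must fit within a week")
    return hourly_rate * hours_per_day * days_per_week


def schedule_saving(hours_per_day, days_per_week):
    """Fraction of the always-on compute cost saved by the schedule."""
    return 1 - (hours_per_day * days_per_week) / HOURS_PER_WEEK


# Hypothetical VM at 0.25 per hour, running 12 hours a day on weekdays only.
always_on = weekly_compute_cost(0.25)
scheduled = weekly_compute_cost(0.25, hours_per_day=12, days_per_week=5)
print(f"Always on: {always_on:.2f}, scheduled: {scheduled:.2f}")
print(f"Saving: {schedule_saving(12, 5):.0%}")
```

A 12-hours-a-day, weekdays-only schedule means the VM runs for 60 of the 168 hours in a week, so under these assumptions roughly two thirds of the compute cost disappears.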

Leverage PaaS technologies rather than sticking with tried and tested IaaS workloads. PaaS workloads typically have much more granular billing (e.g. databases in Azure are priced around a DTU, or an eDTU if you want to be all elastic). Transforming applications to make them cloud-native can help you control spend better, whilst bringing other benefits such as increased agility.

Make use of ‘Reserved Instances’, recently introduced by Microsoft, which can reduce spend by up to 72%. This is a game changer for those workloads you know are consistently required, i.e. will be around for one or three years. Depending on how long you want to commit, Microsoft will provide hefty discounts. Find out more here.

[Figure: savings with Reserved Instances and Azure Hybrid Use Benefit]

Leverage the Azure Hybrid Use Benefit, either standalone or in conjunction with Reserved Instances, to receive even greater discounts (up to 82%, as seen in the figure above). If you have existing Windows licenses with Software Assurance, these can be applied to your Azure VMs; more information here.
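
The discount percentages above are "up to" figures, so it is worth running the numbers for your own case. The sketch below applies a discount to a hypothetical PAYG price and estimates the break-even point against simply switching a PAYG VM off. It assumes a reservation is paid for every hour whether or not the VM runs, while PAYG compute is only paid while the VM is running; the 200 per month price is invented for illustration.

```python
def discounted_price(payg_price, discount):
    """Price after applying a fractional discount (e.g. 0.72 for 72%)."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in the range [0, 1)")
    return payg_price * (1 - discount)


def break_even_utilisation(discount):
    """Fraction of hours a VM must run before a reservation beats PAYG.

    Assumes the reservation is charged for all hours and PAYG only for
    running hours, so the two cost the same when utilisation == 1 - discount.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in the range [0, 1)")
    return 1 - discount


# Hypothetical VM costing 200 per month on PAYG.
for label, discount in [("Reserved Instance", 0.72), ("RI + Hybrid Use Benefit", 0.82)]:
    price = discounted_price(200, discount)
    threshold = break_even_utilisation(discount)
    print(f"{label}: {price:.2f} per month, worth it above {threshold:.0%} utilisation")
```

Under these assumptions, a VM that only runs 12 hours a day on weekdays (about 36% of the week) would still be cheaper on a reservation at the maximum 72% discount, but not if the discount you actually get is nearer 60%.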

Right-size your workloads, don’t just lift and shift as-is! On on-premises virtualisation clusters, VM sprawl fast becomes a big problem. Because there is (usually) no accountability for the number of CPUs or the amount of memory allocated to a virtual machine, you often see over-provisioned workloads, and lots of them, with clear mismatches between CPU and memory (CPU wait, anyone?). The key guidance here is to ‘right-size’ your workloads: analyse them using a tool (Azure Migrate comes to mind) to understand utilisation, and then move them to the most applicable Azure VM series.
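
As a rough illustration of what right-sizing means in practice, the sketch below picks the cheapest size that covers observed peak usage plus some headroom, rather than matching what the VM was allocated on-premises. The catalogue, size names and prices are entirely hypothetical, and a real assessment tool considers far more than two metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSize:
    name: str
    vcpus: int
    memory_gb: float
    hourly_price: float


# Hypothetical catalogue; names and prices are placeholders, not Azure SKUs.
EXAMPLE_CATALOGUE = [
    VmSize("small", 2, 8, 0.10),
    VmSize("medium", 4, 16, 0.20),
    VmSize("large", 8, 32, 0.40),
]


def right_size(catalogue, peak_vcpus_used, peak_memory_gb_used, headroom=0.2):
    """Return the cheapest size covering observed peak usage plus headroom.

    Returns None when nothing in the catalogue is big enough.
    """
    need_cpu = peak_vcpus_used * (1 + headroom)
    need_mem = peak_memory_gb_used * (1 + headroom)
    candidates = [
        size for size in catalogue
        if size.vcpus >= need_cpu and size.memory_gb >= need_mem
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda size: size.hourly_price)


# A VM allocated 8 vCPUs / 32GB on-premises that only ever peaks at 1.5 vCPUs / 6GB.
choice = right_size(EXAMPLE_CATALOGUE, peak_vcpus_used=1.5, peak_memory_gb_used=6)
print(choice.name if choice else "no suitable size")
```

Lifting and shifting that VM as-is would land on the hypothetical "large" size at four times the price of what the workload actually needs.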

Get rid of VM sprawl before migration… as with the above recommendation, many VMs do not need moving. Have a hard and fast rule that states you will only migrate what you know is required. Anything else stays on-premises and is powered off at a suitable time. This will avoid your sprawl becoming an expensive sprawl in the cloud.

Understand Azure VM series types, as not all VMs are created equal! Azure has a catalogue of VM types spanning the alphabet. Review the following link and ensure you choose an applicable VM for the workload you are running. For instance, if you require high compute then an F series may be best; for I/O-intensive workloads, look at the Ls series. This leads me nicely to the B-series VM (recently announced as GA in many regions).


Look at what the B series VM can do for you when you have workloads that are very burstable from a CPU perspective. The B series is a cost-effective type for workloads that burst in their performance, i.e. that don’t require continuous performance from their CPU. When B series VMs are not using CPU (e.g. in quiet periods), the VM is building credits. When you have enough credit, the VM can burst to 100% of the available CPU. The base price of these VMs is much cheaper than that of comparable virtual machines.
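
The credit idea is easier to see with a toy simulation. The model below is a deliberate simplification and not Azure's actual credit accounting: the baseline, the credit cap and the units are all assumptions chosen for illustration. Each interval the VM banks credit when demand is below the baseline and spends it when demand is above.

```python
def simulate_burst(demand, baseline=0.2, max_credits=10.0, start_credits=0.0):
    """Toy model of a burstable VM.

    demand:   requested CPU fraction (0.0 to 1.0) for each interval
    baseline: CPU fraction the VM can always deliver (assumed value)
    Returns (delivered CPU fraction per interval, remaining credits).
    """
    credits = start_credits
    delivered = []
    for want in demand:
        if want <= baseline:
            # Quiet interval: bank the unused share of the baseline.
            used = want
            credits = min(max_credits, credits + (baseline - want))
        else:
            # Busy interval: spend credits to go above the baseline.
            extra = min(want - baseline, credits)
            used = baseline + extra
            credits -= extra
        delivered.append(used)
    return delivered, credits


# Five idle intervals followed by two intervals demanding 100% CPU.
delivered, remaining = simulate_burst([0.0] * 5 + [1.0] * 2)
print([round(value, 2) for value in delivered], round(remaining, 2))
```

In this run the idle intervals bank enough credit for one full burst; the second burst runs out of credit and is held close to the baseline. That is the trade-off behind the lower base price: great for spiky workloads, a poor fit for anything that needs sustained CPU.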

Take a look at Cost Management and Billing (and Cloudyn). Azure has made great strides in providing excellent capabilities, available to all users, to help manage existing and future spend. Thanks to the acquisition of Cloudyn, Microsoft have introduced these technologies into the Azure portal, with reports that help you monitor spending and analyse and track cloud usage, costs and trends. This capability is free for Azure usage, but can also manage 3rd party cloud systems (e.g. AWS) as a chargeable extra. But we’re trying to save money here, right? Not spend more!

Summary

There are many more techniques and methods that you can use to optimise your spend. You could look at using even more modern technologies, such as those in the serverless space. Serverless technologies are the nirvana, as generally you only pay when the service is being used, as opposed to PaaS, which typically carries a base cost for the type of plan you acquire. Equally, managing your platform in a robust fashion through modern infrastructure-as-code techniques will help to prevent the abuse seen through over-provisioning.

I hope this has provided some useful recommendations and guidance to help you gain more control over your cloud costs, specific to Azure (though the principles apply across other clouds too), along with some tips on how to reduce spend where applicable! Hopefully this will help to reduce some of the frustration because, done right, cloud can deliver on the cost savings you anticipated whilst also giving you access to all the other benefits.

Recap of key Azure features from Ignite Part 2

… continuation of the Part 1 post which can be found here

This post recaps the remaining five features that I found interesting from the announcements at the Ignite conference.

Azure Top 10 Ignite Features

5. Global Virtual Network Peering (preview)

Inter-VNet peering is a technology that allows you to connect a VNet directly to another VNet, without having to route that traffic via a gateway of some sort. Bear in mind that VNets are isolated until you connect them via a gateway. This feature allows you to peer one VNet with another, removing the complexity of routing that traffic via a gateway and/or back on-premises. In addition, it allows you to take advantage of the Microsoft backbone, with low latency and high bandwidth connectivity. Inter-VNet peering is available to use today, but is constrained to a single region (i.e. you can peer VNets that both exist within UK South, for instance, but not between UK South and UK West).

[Figure: virtual network peering transit]

Source: https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-peering-overview

Global VNet peering addresses that constraint and allows you to peer between regions, thus gaining global connectivity without having to route via your own WAN. This feature is currently in preview in selected regions (US and Canada).

4. New Azure VM Sizes

Many new virtual machine sizes have been announced recently, factoring in differing workload types (e.g. for databases) as well as more cost effective virtual machines. A large number of organisations see Azure IaaS as a key platform allowing them to scale workloads that still require complete control over the operating system.

The announcements around Ignite were mainly focused on SQL Server and Oracle type workloads that require high memory and storage, but are not typically CPU intensive. Some of the latest sizes in the DS, ES, GS and MS series constrain the vCPU count to one quarter or one half of the original VM size.

An example of this would be the Standard GS5, which comes with 32 vCPUs, 448GB memory and 64 disks (up to 256TB total), and the new GS5-16 and GS5-8, which come with 16 and 8 active vCPUs respectively.

Another interesting VM type announced recently is the B-series (burstable VMs), which builds up credits while CPU usage is low and spends them to burst when demand rises, at a lower base price than comparable VMs. One to review!!

3. Planned VM maintenance

Maintenance in Azure has long been a bugbear of many customers. If you are operating a single virtual machine (which, to be fair, you should think about architecting differently anyway…) then at any time Microsoft may perform updates on the underlying hypervisors that run the platform. If your virtual machine is in the affected update domain then it will be restarted… and certain data (e.g. data held in cache) may be lost.

Planned VM maintenance helps greatly here, as it provides better visibility of, and control over, when maintenance windows occur, even allowing you to proactively start maintenance early at a time that suits your organisation. You can create alerts and discover which VMs are scheduled for maintenance ahead of time. In addition, you can choose between VM-preserving and VM-restarting (redeploy) maintenance to better manage the recovery of the VM post maintenance.

As stated above, this problem goes away if you can re-architect your application with HA in mind. Plan to use Azure Availability Zones (AAZ) when they come out of preview and, if that is not an option, look into regional availability and/or introducing Traffic Manager and load balancers into your application.

2. Azure Migrate (preview)

Another great announcement was the introduction of a new capability called Azure Migrate, which is currently in preview. This service is similar to the Microsoft Assessment and Planning (MAP) Toolkit, but is very Azure focused (whereas MAP tended to be all about discovery followed by light-weight Azure assessments).

The tool provides visibility into your applications and services, and goes one step further to map the dependencies between applications, workloads and data. Those who have worked with Azure for a while will remember using tools like OMS to map these inter-dependencies, or mapping them out themselves in painstaking fashion. A brief overview of the tool console is provided in the figure below:

[Figure: Azure Migrate console overview]

Source: https://azure.microsoft.com/en-gb/blog/announcing-azure-migrate/

The tool is currently in preview, and is free of charge for Microsoft Azure customers (at time of writing). It is appliance based; it discovers virtual machines and applies intelligence such as “right-sizing” to the correct Azure VM type (thus saving costly IaaS overheads!!). It maps multi-tier app dependencies and offers a much deeper and richer capability set than MAP.

… and finally… drumroll please…

1. Azure Stack

I wrote a lengthy post on Azure Stack recently for the organisation I work for, Insight UK, and that post can be found here. Azure Stack was and is a big announcement from Microsoft and, in my opinion, demonstrates their commitment to the Enterprise. Microsoft have firmly recognised the need to retain certain workloads on-premises for a variety of reasons, from security/compliance through to performance, etc.

Azure Stack is Microsoft’s true Hybrid Cloud platform and is provided by four vendors at present: HPE, Dell, Lenovo and Cisco. It provides a consistent management interface from the public Azure Cloud to on-premises, ensuring your DevOps/IT teams can communicate with applications in the same way irrespective of location. It allows for consistent management of both cloud-native applications and legacy applications.

[Figure: Microsoft Azure Stack]

Source: https://blogs.technet.microsoft.com/uktechnet/2016/02/23/microsoft-azure-stack-what-is-it/

Provided as either a four, eight or twelve node pre-configured rack, the software is locked down by Microsoft and only they can amend or provide updates. In addition, the Stack firmware and drivers are controlled by the manufacturer and remain consistent with the software versions.

The hardware is procured directly from the vendor and then the resources are charged in a similar way to the public Azure cloud. The stack offers either a capacity based model or pay as you go, and can even operate in offline mode (great example with Carnival Cruise Ships)…

… thanks for reading! That’s my top 10 summary of Azure related announcements that came out of the Ignite conference in 2017. There are many more announcements and features, and I hope to get more time to lab and write about them in the near future!

Update: Azure VNet Service Endpoints – Public Preview Expanded

I blogged about Virtual Network Service Endpoints (VNSE) recently, after the feature was announced in preview in mid-September. From the earlier post:

Virtual Network Service Endpoints is a new feature to address situations whereby customers would prefer to access resources (Azure SQL DBs and Storage Accounts in the preview) privately over their virtual network as opposed to accessing them using the public URI.

Typically, when you create a resource in Azure it gets a public facing endpoint. This is the case with storage accounts and Azure SQL. When you connect to these services you do so using this public endpoint, which is a concern for some customers who have compliance and regulatory requirements or who just want to optimise the route the traffic takes.

Initially this feature was restricted to the US and Australian regions. I missed the announcement last week that it has been expanded into all Azure regions (still in preview), which is great news. I have introduced the preview of this feature to several customers recently. They saw great advantages in being able to address storage and SQL resources privately rather than via a public URI, and considered it something that would increase their opportunities in the Azure space.