Cloud computing falls into two main categories: private cloud and public cloud. A private cloud is built in a corporate data center using virtualization software such as VMware, Hyper-V, or Citrix. A public cloud, on the other hand, is offered by large-scale providers with multiple data centers across the globe. As of today, Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP) are the top public cloud providers.

Public cloud providers offer several service models. For example, Software as a Service (SaaS) and Platform as a Service (PaaS) exist mainly to cut costs and reduce administrative overhead, which is a crucial point. However, if existing infrastructure has to be moved to the cloud as it is, that can be achieved with Infrastructure as a Service (IaaS).

The public cloud has a technology called auto scaling, which expands computing power horizontally, that is, by adding more machines rather than enlarging a single one. Auto scaling is used in the cloud for virtual machines, containers, and load balancers, but this article discusses only virtual machine auto scaling.


What is Auto Scaling and How is it Used?

In a local data center or private cloud, when a server or VM is created, capacity has to be allocated not only for the current workload but also for future growth and sudden spikes. However, the average CPU and RAM utilization of a server is less than 20% [1], so most of that reserved capacity sits idle. The public cloud came up with a clever mechanism to expand computing power only when it is needed: auto scaling. With auto scaling, it is possible to set a minimum and a maximum number of VMs for handling the load. When the load is low, only the minimum number of VMs is running. When a sudden traffic or load spike occurs, the VM count is increased, up to the maximum limit, to absorb the load.
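The minimum and maximum limits described above can be illustrated with a minimal Python sketch. It assumes a simple proportional rule based on average CPU utilization; the function name, the 50% target, and the default limits are hypothetical choices for illustration, not any provider's actual algorithm.

```python
import math


def desired_vm_count(current_vms, avg_cpu_percent,
                     target_cpu_percent=50.0, min_vms=2, max_vms=10):
    """Return how many VMs the group should run (illustrative rule only).

    Assumption: capacity is scaled in proportion to how far the measured
    average CPU is from the target, then clamped to the configured limits.
    """
    if current_vms < 1:
        raise ValueError("current_vms must be at least 1")
    if target_cpu_percent <= 0:
        raise ValueError("target_cpu_percent must be positive")

    # Proportional estimate: twice the target load needs twice the VMs.
    raw = math.ceil(current_vms * avg_cpu_percent / target_cpu_percent)

    # Never go below the minimum or above the maximum limit.
    return max(min_vms, min(max_vms, raw))


if __name__ == "__main__":
    print(desired_vm_count(2, 10.0))   # low load: stays at the minimum
    print(desired_vm_count(4, 90.0))   # spike: scales out
    print(desired_vm_count(8, 95.0))   # large spike: capped at the maximum
```

Real platforms typically add cooldown periods and evaluate metrics over a time window before acting, which this sketch leaves out.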

Usually, the VMs in an auto scaling group are created from a base image (a publicly available OS image) and a startup script that installs and configures the required services. The startup script is supplied separately in the cloud platform. For instance, AWS refers to this as “user data” and Microsoft Azure refers to this as “custom script extension”. Alternatively, it is possible to keep a customized image with all the required software and source code, and then create the auto scaling group based on that image.



The VMs created by the auto scaling group need to be associated with a load balancer, which is offered as a service in the public cloud. The load balancer distributes traffic across all the VMs. Apart from this, VM health checks play a significant role. For example, if the service on a VM is frozen, it cannot respond to incoming requests, or responds to them slowly. Therefore, the health checks need to be integrated with the load balancer so that traffic is distributed only across healthy targets.
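The interplay between health checks and traffic distribution can be sketched as follows. This is a simplified illustration that assumes round-robin routing and a single pass/fail health result per VM; the class and method names are hypothetical and do not correspond to any cloud provider's API.

```python
class LoadBalancer:
    """Toy round-robin load balancer that skips unhealthy targets."""

    def __init__(self, targets):
        self.targets = list(targets)      # registered VMs, in fixed order
        self.healthy = set(self.targets)  # assume all start out healthy
        self._next = 0                    # round-robin counter

    def record_health_check(self, target, passed):
        """Store the latest health check result for one VM."""
        if passed:
            self.healthy.add(target)
        else:
            self.healthy.discard(target)

    def route(self):
        """Pick the next healthy VM for an incoming request."""
        candidates = [t for t in self.targets if t in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy targets available")
        target = candidates[self._next % len(candidates)]
        self._next += 1
        return target


if __name__ == "__main__":
    lb = LoadBalancer(["vm-a", "vm-b", "vm-c"])
    lb.record_health_check("vm-b", passed=False)  # vm-b is frozen
    print([lb.route() for _ in range(4)])         # vm-b receives no traffic
```

Production load balancers usually require several consecutive failed or passed checks before changing a target's state, so that one slow response does not remove a VM from rotation.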

Ultimately, the auto scaling group, the load balancer, and the health checks all need to be integrated for auto scaling to work successfully. Because only the minimum number of VMs runs when the load is low, this approach reduces the use of computing power, saves money, and ultimately reduces the carbon footprint for a better future.