How I reduced 48% of my cloud cost by using Google Cloud preemptible instances

Google Cloud Platform has an amazing feature that few people use, partly because it is not well known, but mainly because it is difficult to set up a system architecture that can take advantage of it. This feature is preemptible instances. How does it work? Simple: you have a virtual machine like any other, except that this VM will shut down unexpectedly at some point within 24 hours and may occasionally be unavailable for short periods. The advantage: preemptible instances cost less than 50% of the price of an ordinary machine.

Usually, people use this kind of machine for servers that run workers or asynchronous jobs, the kind of application that does not need 24/7 availability. In my case, I could use preemptible instances for my internal API, an application that does need 24/7 availability. This internal API can't stay offline, so I solved the unavailability problem by running many servers in parallel behind a haproxy load balancer. In basically 3 steps, I reduced my API infrastructure cost by almost 50%.

Step 1 – Set up the client to be fault tolerant

My code is written in Scala. Basically, I made the client repeat a request whenever it fails. This is necessary because, even though the API machines are behind the load balancer, the load balancer takes some time (seconds) to notice that a specific machine is down, so it occasionally sends requests to unavailable machines. The client code snippet is:

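import scala.concurrent.ExecutionContext.Implicits.global

// Params stands in for whatever type api.query accepts
def query(params: Params, retries: Int = 0): Unit = {
  val response = api.query(params) // returns a Future
  response.onSuccess {
    case result => () // handle the successful response here
  }
  response.onFailure {
    case x =>
      LOG.error(s"Failure on $retries try of API request: " + x.getMessage)
      Thread.sleep(retries * 3000) // this sleep is optional: waits 0s, 3s, 6s, ...
      query(params, retries + 1) // there could be a maximum number of retries here
  }
}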
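
The same idea, with the maximum number of retries actually enforced, can be sketched in shell for scripts that call the API from the command line. This is not the Scala client from my setup; the `retry` function name and its arguments are made up for illustration.

```shell
#!/usr/bin/env bash
# retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_TRIES attempts have failed.
# Waits DELAY_SECONDS * attempt between attempts (linear backoff).
retry() {
  local max_tries=$1 delay=$2
  shift 2
  local attempt=1
  while true; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_tries" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying: $*" >&2
    sleep $((delay * attempt))
    attempt=$((attempt + 1))
  done
}

# Hypothetical usage (the URL is a placeholder):
#   retry 5 3 curl --fail --silent http://internal-api.example/query
```

Capping the attempts keeps a client from looping forever if the whole group is down, at the price of surfacing an error to the caller.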

Step 2 – Put all servers behind a load balancer

I created a haproxy config file that I can auto-update based on a list of servers that I get from the gcloud command line. Here is the script that rewrites the haproxy config file with a list of all servers that have a specific substring in their names:

# Keep every existing config line that does not mention the managed servers
EMPTY_FILE=$(grep -v "$SERVER_SUBSTRING" /etc/haproxy/haproxy.cfg)
# Build one "server" line per matching instance, using the IP taken from field 4
NEW_LINES=$(gcloud compute instances list | grep "$SERVER_SUBSTRING" | sed 's/true//g' | sed 's/ [ ]*/ /g' | cut -d" " -f4 | awk '{print " server playax-fingerprint" $NF " " $NF ":9000 check inter 5s rise 1 fall 1 weight 1"}')
echo "$EMPTY_FILE" >new_config
echo "$NEW_LINES" >>new_config
sudo cp new_config /etc/haproxy/haproxy.cfg
sudo ./

The restart script reloads the haproxy configuration without any outage.
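
A variation that may be less fragile is to ask gcloud for the IPs directly instead of cutting columns out of the table output. The sketch below is an alternative, not the script I run; the function names `render_servers` and `list_ips` are made up, and it assumes the internal IP is the first network interface of each instance.

```shell
#!/usr/bin/env bash
# Turn a list of IPs (one per line on stdin) into haproxy "server" lines.
# The "playax-fingerprint" prefix and port 9000 match the script above.
render_servers() {
  awk 'NF {print " server playax-fingerprint" $1 " " $1 ":9000 check inter 5s rise 1 fall 1 weight 1"}'
}

# Print the internal IPs of instances whose name matches the given pattern.
# --filter and --format=value(...) avoid depending on column positions.
list_ips() {
  gcloud compute instances list \
    --filter="name~$1" \
    --format="value(networkInterfaces[0].networkIP)"
}

# Hypothetical usage (not run here):
#   list_ips "$SERVER_SUBSTRING" | render_servers >>new_config
#   haproxy -c -f new_config   # check the config before copying it into place
```

Running `haproxy -c -f` on the generated file before the copy catches a malformed config before it reaches the live load balancer.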

Step 3 – Create an instance group for these servers

By creating an instance template and an instance group, I can easily add servers to or remove servers from the infrastructure. The preemptible setting is on the instance template page in the Google Cloud console.

  1. Create an instance template with the preemptible option checked
  2. Create an instance group that uses that template
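
The same two steps can also be done from the command line. This is a sketch rather than the exact commands I used: the template name, group name, machine type, and zone are all placeholders, and it assumes a managed instance group.

```shell
#!/usr/bin/env bash
# All names below (api-preemptible, api-group, n1-standard-1, us-central1-a)
# are placeholders, not values from the original setup.

# Step 1: instance template whose VMs are preemptible.
create_template() {
  gcloud compute instance-templates create api-preemptible \
    --machine-type=n1-standard-1 \
    --preemptible
}

# Step 2: managed instance group of 25 VMs built from that template.
create_group() {
  gcloud compute instance-groups managed create api-group \
    --template=api-preemptible \
    --size=25 \
    --zone=us-central1-a
}

# Hypothetical usage (requires an authenticated gcloud):
#   create_template && create_group
```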

One very important warning: you need to plan your capacity so that 20% of your servers can be down at any time (remember that preemptible instances occasionally go away). In my case, I had 20 servers before using the preemptible option. With the preemptible option on, I changed the group to 25 servers.
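
The 25 comes from dividing the baseline by the fraction of servers expected to stay up: 20 / (1 - 0.20) = 25. A small helper makes that easy to redo for other numbers; the function name `servers_needed` is made up for this sketch.

```shell
#!/usr/bin/env bash
# servers_needed BASELINE DOWN_PERCENT
# Smallest group size such that BASELINE servers remain
# when DOWN_PERCENT percent of the group is unavailable.
servers_needed() {
  local baseline=$1 down_pct=$2
  local up_pct=$((100 - down_pct))
  # Integer ceiling of baseline * 100 / up_pct
  echo $(( (baseline * 100 + up_pct - 1) / up_pct ))
}

servers_needed 20 20   # prints 25
```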

                          Before   After
Servers                   20       24
Cost per server per hour  $0.07    $0.03
Total cost per hour       $1.40    $0.72
Total cost per month      $1,008   $518

Price reduction: $490 per month, or 48.6%
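
The figures in the table can be reproduced with a few lines of shell, assuming a 30-day month (720 hours), which is what the monthly totals imply. The function names are made up for this sketch.

```shell
#!/usr/bin/env bash
# monthly_cost SERVERS PRICE_PER_HOUR -> cost for a 30-day month (720 hours)
monthly_cost() {
  LC_ALL=C awk -v n="$1" -v p="$2" 'BEGIN { printf "%.2f\n", n * p * 24 * 30 }'
}

# savings_percent BEFORE AFTER -> percentage saved, one decimal place
savings_percent() {
  LC_ALL=C awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (b - a) / b * 100 }'
}

before=$(monthly_cost 20 0.07)   # 20 regular servers
after=$(monthly_cost 24 0.03)    # 24 preemptible servers
echo "before=$before after=$after saved=$(savings_percent "$before" "$after")%"
```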

Graph of server usage over one day (note how many outages there are, yet the application ran perfectly):

[Graph: server usage over one day]

6 Comments

  1. Niranjan

    It's a really cool concept! 🙂

    How often are preemptible VMs usually shut down during a 24-hour period?

  2. Daniel Cukier

    Usually it shuts down only once every 24 hours. Sometimes twice, and rarely 3 times.

  3. David

    Great post, thanks. How many servers can go down at one time? You are talking about 20% but is that a figure provided to you or just a gamble based on observations?

  4. Daniel Cukier

    It is based on observations; the figure is a real graph from my servers. Servers go up and down all the time, but rarely do many of them go down at the same time. From observation, 20% is a safe margin.

  5. Erick Mendonça

    Great article, congratulations!

    Wouldn’t it be a safer bet to have a handful of regular servers and a large number of preemptible ones?

  6. Daniel Cukier

    Hi Erick, in fact this is also a good approach. Empirically, I observed that it is not necessary for an SLA near 99.9%. If I needed a 100% SLA, your approach would be the right one!
