Google Cloud Platform has an amazing feature that few people use, partly because it is not well known, but mainly because it is hard to set up a system architecture that lets you take advantage of it. This feature is preemptible instances. How does it work? Simple: you get a virtual machine like any other, except that this VM will shut down unexpectedly within 24 hours and will occasionally be unavailable for short periods. The advantage: a preemptible instance costs less than half the price of an ordinary machine.
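To give an idea of how simple the feature itself is to turn on, here is a minimal, hypothetical gcloud command (the instance name, zone and machine type are placeholders, not taken from my setup):

# Create a single preemptible VM; the only difference from a normal VM
# is the --preemptible flag.
gcloud compute instances create test-preemptible-vm \
    --zone us-central1-a \
    --machine-type n1-standard-1 \
    --preemptible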
Usually, people use this kind of machine for servers that run workers or asynchronous jobs, applications that do not need 24/7 availability. In my case, I wanted to use preemptible instances for my internal API, an application that does need 24/7 availability. This internal API can't stay offline, so I solved the unavailability problem by running many servers in parallel behind a haproxy load balancer. In basically 3 steps I reduced my API infrastructure cost by roughly 50%.
Step 1 – Set up the client to be fault tolerant
My code is in Scala. Basically, I made the client retry a request whenever it fails. This is necessary because, even though the API machines are behind the load balancer, the load balancer takes a few seconds to realize that a specific machine is down, so occasionally it sends requests to unavailable machines. The client code snippet is:
def query(params, retries = 0) {
  val response = api.query(params)
  response.onSuccess {
    codeForSuccess()
  }
  response.onFailure {
    case x => {
      LOG.error(s"Failure on $retries try of API request: " + x.getMessage)
      Thread.sleep(retries * 3000) // this sleep is optional
      query(params, retries + 1)   // there could be a maximum number of retries here
    }
  }
}
Step 2 – Put all servers behind a load balancer
I created a haproxy config file that I can auto-update based on a list of servers that I get from the gcloud command line. Here is the script that rewrites the haproxy config file with a list of all servers that have a specific substring in their names:
#!/bin/bash
SERVER_SUBSTRING=playax-fingerprint

# Keep every line of the current config except the previously generated "server" lines
EMPTY_FILE=`cat /etc/haproxy/haproxy.cfg | grep -v $SERVER_SUBSTRING`

# Build one haproxy "server" line per matching VM, using the IP column of the gcloud output
NEW_LINES=`gcloud compute instances list | grep $SERVER_SUBSTRING | sed 's/true//g' | sed 's/ [ ]*/ /g' | cut -d" " -f4 | awk '{print " server playax-fingerprint" $NF " " $NF ":9000 check inter 5s rise 1 fall 1 weight 1"}'`

# Write the cleaned config plus the fresh server lines, install it and reload haproxy
echo "$EMPTY_FILE" > new_config
echo "$NEW_LINES" >> new_config
sudo cp new_config /etc/haproxy/haproxy.cfg
sudo ./restart.sh
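For context, the generated lines only work if they land inside a backend (or listen) section, so this assumes that section is the last block of haproxy.cfg. A hypothetical fragment (the backend name and IPs below are made-up placeholders, not my real config) looks like this:

backend playax_fingerprint_backend
    balance roundrobin
    # the lines below are regenerated by the script, one per VM:
    server playax-fingerprint10.240.0.11 10.240.0.11:9000 check inter 5s rise 1 fall 1 weight 1
    server playax-fingerprint10.240.0.12 10.240.0.12:9000 check inter 5s rise 1 fall 1 weight 1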
The restart script reloads the haproxy configuration without any outage.
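The actual restart.sh is not shown here, but a minimal sketch of a zero-downtime reload, assuming haproxy writes its PID to /var/run/haproxy.pid, could be:

#!/bin/bash
# Check the new configuration for syntax errors before touching the running process
haproxy -c -f /etc/haproxy/haproxy.cfg || exit 1
# Start a new haproxy and ask the old process (-sf) to finish its current
# connections and exit, so no requests are dropped during the reload
haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)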
Step 3 – Create an instance group for these servers
By creating an instance template and an instance group, I can easily add or remove servers in the infrastructure. The preemptible configuration is on the instance template page in the Google Cloud panel (a gcloud sketch of these two steps follows the list below).
- Create an instance template with preemptible option checked
- Create an instance group that uses that template
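A rough command-line equivalent of those two steps, with made-up names, machine type, zone and size (not my actual values), would be something like:

# Instance template with the preemptible option enabled
gcloud compute instance-templates create playax-fingerprint-template \
    --machine-type n1-standard-1 \
    --preemptible

# Managed instance group that creates the servers from that template
gcloud compute instance-groups managed create playax-fingerprint-group \
    --zone us-central1-a \
    --base-instance-name playax-fingerprint \
    --template playax-fingerprint-template \
    --size 25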
One very important warning: you need to plan your capacity so that about 20% of your servers can be down at any given time (remember that preemptible instances eventually go away). In my case, I had 20 servers before using the preemptible option; to keep roughly 20 machines effectively available, 20 / (1 - 0.2) = 25, so with preemptible on I changed the group to 25 servers.
|  | Before | After |
| --- | --- | --- |
| Servers | 20 | 24 |
| Cost per server (per hour) | $0.07 | $0.03 |
| Total cost per hour | $1.40 | $0.72 |
| Total cost per month | $1,008 | $518 |
Price reduction: $490 per month, or 48.6%
Graphs of server usage over one day (observe how many outages there are, yet the application ran perfectly):
It's a really cool concept! 🙂
Usually, how frequently are preemptible VMs shut down during a 24-hour period?
Usually it shuts down only once every 24 hours, sometimes twice, and rarely 3 times.
Great post, thanks. How many servers can go down at one time? You are talking about 20%, but is that a figure provided to you or just a gamble based on observations?
It is based on observations; the figure is a real graph from my servers. Servers go up and down all the time, but rarely do many of them go down at the same time. Based on observation, 20% is a safe margin.
Great article, congratulations!
Wouldn’t it be a safer bet to have a handful of regular servers and a great bunch of preemptible ones?
Hi Erick, in fact this is also a good approach. Empirically I observed that it is not necessary for an SLA near 99.9%; if you need a 100% SLA, your approach is the correct one!