February 23, 2018

MICROSERVICES COST OPTIMISATION

If your company runs a microservice architecture on AWS using EC2 on-demand instances, you are probably wasting a significant part of your monthly AWS spend. At Cirrusage we decided to write a short article on our approach to reducing the cost of running AWS ECS clusters. By following this model, your monthly EC2 spend can be cut significantly (in some cases by up to 75%).

Firstly, let's address ECS Fargate.

ECS FARGATE:

This is a managed service that relieves AWS customers of the responsibility of managing ECS servers and clusters. An explanation of ECS Fargate pricing can be found here:

https://aws.amazon.com/fargate/pricing/

Creating an ECS Fargate cluster is fairly straightforward; it doesn’t even need you to create a VPC or subnets. You merely have to specify this option while creating your ECS cluster. There is only one catch: Fargate is only available in the North Virginia region (us-east-1). So if you are using any other region, you are out of luck… for now. From conversations we have had with AWS, they are looking to make this widely available by the 2nd quarter of 2018.

SPOT INSTANCES:

Spot Instances enable you to request Amazon’s excess EC2 capacity, which can significantly lower your Amazon EC2 costs. The hourly price for a Spot Instance (of each instance type in each Availability Zone) is set by Amazon EC2, and adjusted gradually based on the long-term supply of and demand for Spot Instances. Your Spot Instance runs whenever capacity is available and the maximum price per hour for your request exceeds the Spot price.

Amazon EC2 terminates, stops, or hibernates your Spot Instance when the Spot price exceeds the maximum price for your request or capacity is no longer available. Amazon EC2 provides a Spot Instance interruption notice, which gives the instance a two-minute warning before it is interrupted.
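As a minimal sketch of how an instance might react to that warning: the notice can be read from the instance metadata path spot/instance-action, which returns a small JSON document containing an action and a time. The function name seconds_until_interruption and the sample payload below are illustrative assumptions, not part of any AWS SDK.

```python
import json
from datetime import datetime, timezone

# Metadata path where EC2 publishes the notice (it returns 404 until one is issued).
NOTICE_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def seconds_until_interruption(notice_body, now):
    """Parse a notice body and return (action, seconds left before it happens)."""
    notice = json.loads(notice_body)
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    when = when.replace(tzinfo=timezone.utc)
    return notice["action"], (when - now).total_seconds()


# Hypothetical payload for illustration; a real one would be fetched from NOTICE_URL.
sample = '{"action": "terminate", "time": "2018-02-23T10:02:00Z"}'
now = datetime(2018, 2, 23, 10, 0, 0, tzinfo=timezone.utc)
action, remaining = seconds_until_interruption(sample, now)
```

A small poller running on each instance could use a check like this to start draining its ECS tasks before the instance goes away.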

SPOT FLEET:

A Spot Fleet is a collection, or fleet, of Spot Instances. The Spot Fleet attempts to launch the number of Spot Instances that are required to meet the target capacity that you specified in the Spot Fleet request. The Spot Fleet also attempts to maintain its target capacity if your Spot Instances are interrupted due to a change in Spot prices or available capacity.

The Spot Fleet selects the Spot Instance pools that are used to fulfill the request, based on the launch specifications included in your Spot Fleet request, and the configuration of the Spot Fleet request. The Spot Instances come from the selected pools.

Instance Strategy:

There are two types of strategies you can specify in your Spot Fleet request:

  • lowestPrice
    • The Spot Instances come from the pool with the lowest price. This is the default strategy.
  • diversified
    • The Spot Instances are distributed across all pools.
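
The difference between the two strategies can be sketched with a toy allocator. This is a simplified model under stated assumptions, not how Spot Fleet is implemented: the allocate function, the pool names and the prices are all hypothetical.

```python
def allocate(strategy, target, pool_prices):
    """Distribute `target` instances across pools according to the strategy."""
    if strategy == "lowestPrice":
        # Everything goes to the single cheapest pool.
        cheapest = min(pool_prices, key=pool_prices.get)
        return {pool: (target if pool == cheapest else 0) for pool in pool_prices}
    if strategy == "diversified":
        # Spread as evenly as possible across every pool.
        pools = sorted(pool_prices)
        base, extra = divmod(target, len(pools))
        return {pool: base + (1 if i < extra else 0) for i, pool in enumerate(pools)}
    raise ValueError("unknown strategy: " + strategy)


# Hypothetical hourly Spot prices per pool (instance type / Availability Zone).
prices = {
    "c3.large/eu-west-1a": 0.030,
    "m4.large/eu-west-1b": 0.035,
    "t2.large/eu-west-1c": 0.032,
}
lowest = allocate("lowestPrice", 6, prices)
spread = allocate("diversified", 6, prices)
```

With lowestPrice, a single price spike in the cheapest pool can affect the whole fleet at once; with diversified, it affects only the share placed in that pool.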

Fleet Configuration:

Cirrusage recommends a diversified approach utilizing varying instance types, whether from different families or within the same family. When using instances of different sizes, apply instance weightings to specify each instance's contribution to application capacity.

For more info go to:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet.html#spot-instance-weighting
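
To illustrate the idea of weighting, here is a rough sketch of the arithmetic: the target capacity is divided by an instance type's weight and rounded up to a whole number of instances. The weights and the target of 10 units are assumptions chosen for the example.

```python
import math


def instances_needed(target_capacity, weight):
    """Number of instances of a given weight needed to reach the target capacity."""
    return math.ceil(target_capacity / weight)


# Hypothetical weights: one unit roughly equals the capacity of an m4.large.
weights = {"m4.large": 1, "m4.xlarge": 2, "m4.2xlarge": 4}
target_units = 10

needed = {itype: instances_needed(target_units, w) for itype, w in weights.items()}
# Capacity actually delivered; rounding up can overshoot the target.
delivered = {itype: needed[itype] * weights[itype] for itype in weights}
```

Because the count is rounded up, larger weights can overshoot the target (three m4.2xlarge instances deliver 12 units here), which is worth keeping in mind when choosing weights.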

For this example we will use three instance types of similar size (c3.large, t2.large and m4.large) in a Spot request spanning three subnets in three different Availability Zones. We use the diversified strategy, since the goal is the highest possible availability rather than the lowest price, and configure the request to replace unhealthy instances.
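
The request described above could be sketched as a plain Python dictionary. The key names follow the EC2 Spot Fleet request configuration; the helper build_fleet_config, the role ARN, the AMI ID and the subnet IDs are placeholders invented for illustration, and nothing here calls AWS.

```python
from itertools import product


def build_fleet_config(instance_types, subnet_ids, target_capacity,
                       fleet_role_arn, image_id):
    """Build a diversified, self-healing Spot Fleet request configuration."""
    # One launch specification per (instance type, subnet) pair, i.e. per pool.
    launch_specs = [
        {"ImageId": image_id, "InstanceType": itype,
         "SubnetId": subnet, "WeightedCapacity": 1.0}
        for itype, subnet in product(instance_types, subnet_ids)
    ]
    return {
        "IamFleetRole": fleet_role_arn,
        "AllocationStrategy": "diversified",
        "TargetCapacity": target_capacity,
        "Type": "maintain",                 # keep the target capacity topped up
        "ReplaceUnhealthyInstances": True,  # only meaningful for "maintain" fleets
        "LaunchSpecifications": launch_specs,
    }


# All identifiers below are placeholders, not real resources.
config = build_fleet_config(
    ["c3.large", "t2.large", "m4.large"],
    ["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"],
    target_capacity=3,
    fleet_role_arn="arn:aws:iam::123456789012:role/example-spot-fleet-role",
    image_id="ami-00000000",
)
```

A dictionary of this shape is what boto3's ec2 client accepts as SpotFleetRequestConfig in request_spot_fleet; in a real setup the AMI would be an ECS-optimized image and the launch specifications would also carry security groups, an instance profile and user data.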

Bidding Strategy:

The goal is to minimize interruptions. To achieve this, Cirrusage recommends setting the bid price to the EC2 on-demand price. Since you will only ever be charged the current Spot price, you will still be taking advantage of the low rates offered by Spot Instances.
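
The reasoning behind this recommendation can be sketched with a small simulation. It is a simplification that assumes hourly price points and that an instance runs only in hours where the Spot price does not exceed the bid; the prices and the on-demand rate are made-up numbers, not real price history.

```python
def simulate(hourly_spot_prices, max_price):
    """Return (hours run, hours interrupted, total cost) for a given bid cap."""
    # You pay the Spot price, never the bid, for every hour you run.
    run = [p for p in hourly_spot_prices if p <= max_price]
    return len(run), len(hourly_spot_prices) - len(run), round(sum(run), 4)


# Hypothetical figures for illustration only.
on_demand_price = 0.10
spot_prices = [0.03, 0.03, 0.04, 0.12, 0.03]

bid_at_on_demand = simulate(spot_prices, on_demand_price)
bid_low = simulate(spot_prices, 0.035)
```

In this toy history, bidding at the on-demand price is interrupted only during the one spike above on-demand, while the lower bid is also interrupted by the smaller rise to 0.04, even though every hour that does run is charged at the same Spot price in both cases.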

M4.Large Spot instance 3 month price history (eu-west-1)

C3.Large Spot instance 3 month price history (eu-west-1)

T2.Large Spot instance 3 month price history (eu-west-1)

As is evident from the diagrams above, the use of these three instance types in a Spot Fleet over the last 3 months would have resulted in no interruptions on the c3.large instance type and two interruptions on m4.large. With a “diversified” strategy and the Spot Fleet acting to replace unhealthy or terminated instances, the ECS cluster would have stayed up without interruption for the entire 3 months.

Also evident is that, on average, we would save 75% on cost compared to using on-demand versions of the same instance types.

Auto-scaling:

Spot Fleets support auto-scaling of instances in much the same way as on-demand instances. Using CloudWatch metrics, we recommend you add both container-level and instance-level auto-scaling using CloudWatch alarms and scaling policies.
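
As a rough sketch of the two scaling levels, the snippet below builds one Application Auto Scaling target definition for the Spot Fleet (instance level) and one for an ECS service (container level). The namespaces, resource ID formats and scalable dimensions are the ones Application Auto Scaling uses; the helper scalable_targets, the identifiers and the capacity ranges are assumptions made up for the example.

```python
def scalable_targets(fleet_request_id, cluster, service, instance_range, task_range):
    """Build one scalable-target definition per scaling level."""
    instance_level = {
        "ServiceNamespace": "ec2",
        "ResourceId": "spot-fleet-request/" + fleet_request_id,
        "ScalableDimension": "ec2:spot-fleet-request:TargetCapacity",
        "MinCapacity": instance_range[0],
        "MaxCapacity": instance_range[1],
    }
    container_level = {
        "ServiceNamespace": "ecs",
        "ResourceId": "service/{}/{}".format(cluster, service),
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": task_range[0],
        "MaxCapacity": task_range[1],
    }
    return instance_level, container_level


# Placeholder identifiers and ranges, for illustration only.
instance_target, container_target = scalable_targets(
    "sfr-00000000-0000-0000-0000-000000000000",
    "example-cluster", "example-service",
    instance_range=(3, 9), task_range=(2, 20),
)
```

Each dictionary matches the keyword arguments of register_scalable_target in boto3's application-autoscaling client; scaling policies and the CloudWatch alarms that trigger them would then be attached to these targets.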

FINALLY

We recommend configuring all of this in a nested CloudFormation template and introducing the architecture into an overall continuous delivery (CI/CD) pipeline.