AWS EMR Uniform Instance Groups

In this post, I give an overview of AWS EMR uniform instance groups, along with the advantages and caveats of using them.

The AWS EMR architecture consists of a master node, one or more core nodes and, optionally, task nodes. If you’re new to EMR, refer to https://www.hadoopandcloud.com/aws/amazon-emr/ for a quick introduction.

When you create a cluster, you have two configuration options for its nodes: instance fleets and uniform instance groups [1].

Instance fleets

As the name suggests, an instance fleet is a mix of multiple instance types, combining On-Demand and Spot Instances. For each instance fleet, you specify up to 5 instance types, which can be provisioned as On-Demand and Spot Instances.
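For comparison, an instance fleet is described to the EMR API as a list of candidate instance types plus On-Demand and Spot target capacities. The sketch below builds one `InstanceFleets` entry in the shape that boto3's `run_job_flow` accepts. The helper name `build_task_fleet`, the instance types and the capacities are hypothetical, and the five-type check simply mirrors the limit quoted above.

```python
def build_task_fleet(instance_types, on_demand_units, spot_units):
    """Build a hypothetical TASK instance fleet entry for run_job_flow."""
    # The post quotes a limit of five instance types per fleet.
    if not 1 <= len(instance_types) <= 5:
        raise ValueError("an instance fleet takes between 1 and 5 instance types")
    return {
        "Name": "task-fleet",
        "InstanceFleetType": "TASK",
        # Targets are counted in capacity units; every type below weighs 1 unit.
        "TargetOnDemandCapacity": on_demand_units,
        "TargetSpotCapacity": spot_units,
        "InstanceTypeConfigs": [
            {"InstanceType": t, "WeightedCapacity": 1} for t in instance_types
        ],
    }


# Placeholder instance types and capacities, for illustration only.
fleet = build_task_fleet(
    ["m5.xlarge", "m5a.xlarge", "m4.xlarge"], on_demand_units=1, spot_units=4
)
```

EMR can then satisfy the Spot target with whichever of the listed types is available, which is the flexibility that uniform instance groups trade away for simplicity.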

Uniform instance groups

At Zeotap, we use instance groups for our clusters because they suit our requirements. The setup is simple: you create an instance group for each component (master, core and task) and define its instance type, storage and instance count.
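As a rough illustration of that setup, the sketch below describes one instance group per component in the `Instances={"InstanceGroups": [...]}` shape used by boto3's `run_job_flow`. The helper `instance_group`, the instance types, the counts and the 100 GiB gp2 volume are placeholder assumptions, not Zeotap's actual configuration.

```python
def instance_group(role, instance_type, count, market="ON_DEMAND", ebs_gib=None):
    """Return one InstanceGroups entry for boto3's EMR run_job_flow call.

    Hypothetical helper; all values passed to it below are placeholders.
    """
    group = {
        "Name": f"{role.lower()}-group",
        "InstanceRole": role,  # "MASTER", "CORE" or "TASK"
        "Market": market,  # "ON_DEMAND" or "SPOT"
        "InstanceType": instance_type,
        "InstanceCount": count,
    }
    if ebs_gib:
        # Attach one gp2 EBS volume of the requested size to each instance.
        group["EbsConfiguration"] = {
            "EbsBlockDeviceConfigs": [
                {
                    "VolumeSpecification": {"VolumeType": "gp2", "SizeInGB": ebs_gib},
                    "VolumesPerInstance": 1,
                }
            ]
        }
    return group


instances = {
    "InstanceGroups": [
        instance_group("MASTER", "m4.large", 1),
        instance_group("CORE", "m4.large", 2, ebs_gib=100),
        instance_group("TASK", "m4.large", 4, market="SPOT"),
    ],
    "KeepJobFlowAliveWhenNoSteps": True,
}
```

The `instances` dict would be passed as `Instances=instances` to `boto3.client("emr").run_job_flow(...)`, alongside other arguments such as the cluster name, `ReleaseLabel` and the IAM roles; that call is not made here.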

AWS EMR hardware configuration

Each Amazon EMR cluster can include up to 50 instance groups:

  • 1 master instance group that contains one EC2 instance
  • 1 core instance group that contains one or more EC2 instances
  • Up to 48 optional task instance groups
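The limits in the list above are easy to check before a cluster request is sent. The function below is a hypothetical pre-flight check, not part of any AWS SDK; it counts the roles in a list of `InstanceGroups`-style dicts and assumes the single-master layout described here.

```python
from collections import Counter

# Limits as listed above; this check is an illustrative sketch only.
MAX_GROUPS = 50
MAX_TASK_GROUPS = 48


def check_group_limits(groups):
    """Raise ValueError if the instance group dicts break the limits above."""
    roles = Counter(g["InstanceRole"] for g in groups)
    if len(groups) > MAX_GROUPS:
        raise ValueError(f"{len(groups)} instance groups; the limit is {MAX_GROUPS}")
    if roles["MASTER"] != 1:
        raise ValueError("exactly one master instance group is required")
    if roles["CORE"] > 1:
        raise ValueError("at most one core instance group is allowed")
    if roles["TASK"] > MAX_TASK_GROUPS:
        raise ValueError(f"at most {MAX_TASK_GROUPS} task instance groups are allowed")
    master = next(g for g in groups if g["InstanceRole"] == "MASTER")
    if master["InstanceCount"] != 1:
        raise ValueError("the master instance group holds a single instance")
    return dict(roles)


groups = [
    {"InstanceRole": "MASTER", "InstanceCount": 1},
    {"InstanceRole": "CORE", "InstanceCount": 2},
] + [{"InstanceRole": "TASK", "InstanceCount": 1} for _ in range(3)]

print(check_group_limits(groups))  # {'MASTER': 1, 'CORE': 1, 'TASK': 3}
```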


So you can start with a simple cluster of one master node and one core node, and add task nodes whenever you need them by creating one or more task instance groups.
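Adding task capacity to a running cluster maps to the EMR `AddInstanceGroups` API, exposed in boto3 as `add_instance_groups`. The sketch below wraps that call; the function name, the group name and the Spot default are illustrative assumptions, and the client is passed in so the function can be exercised with a stub instead of a real AWS account.

```python
def add_task_group(emr_client, cluster_id, instance_type, count, market="SPOT"):
    """Add a new task instance group to a running EMR cluster and return its ID.

    Hypothetical wrapper; `emr_client` is expected to be boto3.client("emr").
    """
    response = emr_client.add_instance_groups(
        JobFlowId=cluster_id,
        InstanceGroups=[
            {
                "Name": f"task-{instance_type}",
                "InstanceRole": "TASK",
                "Market": market,  # Spot by default in this sketch
                "InstanceType": instance_type,
                "InstanceCount": count,
            }
        ],
    )
    # The response lists one ID per instance group that was added.
    return response["InstanceGroupIds"][0]
```

With real credentials this would be called as `add_task_group(boto3.client("emr"), "j-XXXXXXXXXXXXX", "m4.xlarge", 4)`, where the cluster ID and instance type are placeholders.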

Autoscaling

Because each core and task instance group can contain any number of EC2 instances, you can scale each group by adding or removing instances manually, or by setting up automatic scaling.
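Manual scaling of an existing core or task group goes through the `ModifyInstanceGroups` API (`modify_instance_groups` in boto3), which takes the group ID and the new instance count. The wrapper below is a minimal sketch; the function name and the non-negative check are assumptions of this example.

```python
def resize_group(emr_client, cluster_id, instance_group_id, new_count):
    """Set the instance count of an existing core or task instance group.

    Hypothetical wrapper; `emr_client` is expected to be boto3.client("emr").
    """
    if new_count < 0:
        raise ValueError("instance count cannot be negative")
    # Only the count changes; the group's instance type stays as created.
    emr_client.modify_instance_groups(
        ClusterId=cluster_id,
        InstanceGroups=[
            {"InstanceGroupId": instance_group_id, "InstanceCount": new_count}
        ],
    )
```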

This also gives us an opportunity to optimise costs: increase task node capacity while jobs are running and remove those nodes when the cluster is idle. Savings increase further when you use Spot Instances for task nodes.
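One way to automate this scale-out/scale-in pattern is an EMR automatic scaling policy attached to the task group, for example with boto3's `put_auto_scaling_policy`. The sketch below only builds the `AutoScalingPolicy` document: it adds task nodes when available YARN memory drops below a threshold and removes them when most memory is free. The helper names, thresholds, step sizes and capacity bounds are illustrative assumptions, not tuned values, and attaching the policy also requires the cluster to have been created with an auto scaling role.

```python
def yarn_memory_rule(name, operator, threshold, adjustment):
    """One scaling rule triggered by the YARNMemoryAvailablePercentage metric."""
    return {
        "Name": name,
        "Action": {
            "SimpleScalingPolicyConfiguration": {
                "AdjustmentType": "CHANGE_IN_CAPACITY",
                "ScalingAdjustment": adjustment,  # positive adds nodes, negative removes
                "CoolDown": 300,
            }
        },
        "Trigger": {
            "CloudWatchAlarmDefinition": {
                "ComparisonOperator": operator,
                "EvaluationPeriods": 1,
                "MetricName": "YARNMemoryAvailablePercentage",
                "Namespace": "AWS/ElasticMapReduce",
                "Period": 300,
                "Statistic": "AVERAGE",
                "Threshold": threshold,
                "Unit": "PERCENT",
            }
        },
    }


def task_scaling_policy(min_nodes, max_nodes):
    """Grow while jobs need memory, shrink when the cluster sits idle (illustrative values)."""
    return {
        "Constraints": {"MinCapacity": min_nodes, "MaxCapacity": max_nodes},
        "Rules": [
            yarn_memory_rule("scale-out", "LESS_THAN", 15.0, 2),
            yarn_memory_rule("scale-in", "GREATER_THAN", 75.0, -2),
        ],
    }


policy = task_scaling_policy(min_nodes=0, max_nodes=10)
```

The resulting dict would be passed as `put_auto_scaling_policy(ClusterId=..., InstanceGroupId=..., AutoScalingPolicy=policy)` for the task instance group.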

Caveats

One of the main caveats of instance groups is that once the master and core instance groups are created, you can’t change their instance type. For example, if you choose m4.large for the core nodes and later want m4.2xlarge or m4.4xlarge, you can’t modify the group; you can only increase or decrease the number of core instances.
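The caveat can be made explicit when planning changes to an existing group. The hypothetical helper below compares a group's current settings with the desired ones and returns a `modify_instance_groups`-style entry only when the instance count changes; a different instance type raises an error, because an existing instance group cannot be switched to another type.

```python
def plan_group_change(current, desired):
    """Return a ModifyInstanceGroups entry, or None if nothing needs to change.

    `current` and `desired` are dicts with InstanceGroupId, InstanceType and
    InstanceCount keys (a hypothetical shape chosen for this sketch).
    """
    if desired["InstanceType"] != current["InstanceType"]:
        raise ValueError(
            f"{current['InstanceGroupId']}: instance type is fixed at "
            f"{current['InstanceType']}; only the instance count can be changed"
        )
    if desired["InstanceCount"] == current["InstanceCount"]:
        return None
    return {
        "InstanceGroupId": current["InstanceGroupId"],
        "InstanceCount": desired["InstanceCount"],
    }


# Placeholder group ID and sizes.
core = {"InstanceGroupId": "ig-CORE", "InstanceType": "m4.large", "InstanceCount": 2}
print(plan_group_change(core, dict(core, InstanceCount=4)))
# {'InstanceGroupId': 'ig-CORE', 'InstanceCount': 4}
```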

Therefore, to use instance groups effectively, you should know your cluster’s requirements well before you create the master and core instance groups.

Reference:

1. https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-instance-group-configuration.html

If you like this post, please share your comments and feedback. Also feel free to subscribe to the blog to be notified of new posts by email.
