Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster.
Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole was unavailable until the NameNode was restarted or brought up on a separate machine.
Despite its name, the Secondary NameNode is not a standby: it only performs periodic checkpointing, merging the NameNode's edits log into the fsimage. You cannot fail over to it if the NameNode goes down.
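For reference, the checkpointing cadence is controlled by a couple of properties in hdfs-site.xml; this is a sketch showing the stock Hadoop defaults:

```xml
<!-- hdfs-site.xml (sketch): checkpoint cadence, Hadoop default values shown -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value> <!-- merge edits into fsimage at least every hour -->
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value> <!-- or sooner, once this many uncheckpointed transactions accumulate -->
</property>
```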
The NameNode High Availability (HA) feature addresses this problem by providing the option of running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. This allows a fast failover to a new NameNode if a machine crashes, or a graceful, administrator-initiated failover for planned maintenance.
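Under the hood, an HA deployment is described by a handful of hdfs-site.xml properties that map a logical nameservice to the two NameNodes. Cloudera Manager generates these for you during the wizard; the nameservice name, NameNode IDs, and hostnames below are placeholders:

```xml
<!-- hdfs-site.xml (sketch): "mycluster", "nn1"/"nn2" and the hostnames are placeholders -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<property>
  <!-- HDFS clients use this class to locate whichever NameNode is currently active -->
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```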
To enable NameNode HA, first ensure that the two NameNode hosts have the same configuration in terms of memory, disk, etc., for optimal performance.
Go to HDFS – Actions – Enable High Availability.
Provide a name for the HA Nameservice.
Select two NameNode hosts and three JournalNode hosts.
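The three JournalNodes form a quorum to which both NameNodes write their edits; in hdfs-site.xml this ends up as a single shared-edits URI. A sketch, with placeholder hostnames and the default JournalNode port 8485:

```xml
<!-- hdfs-site.xml (sketch): quorum journal shared edits directory -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://journal1.example.com:8485;journal2.example.com:8485;journal3.example.com:8485/mycluster</value>
</property>
```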
Once you finish the role-assignment steps, the cluster performs 20+ steps to set up high availability, which include stopping the NameNode and Secondary NameNode, creating the NameNode edits directory on both hosts, creating the JournalNode services, setting up checkpointing, and so on.
When all the steps have completed successfully, go to HDFS – Instances – Federation and High Availability.
You can see the Nameservice details: the two NameNodes are shown in Active and Standby status respectively, and Highly Available shows “Yes”.
To fail over between the NameNodes, go to Actions – Manual Failover and choose which NameNode you want to make Active.
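The same checks and failover can also be driven from the command line with the `hdfs haadmin` tool on a cluster host. A sketch, assuming the NameNode IDs `nn1` and `nn2` from your nameservice configuration:

```shell
# Check which NameNode is currently Active and which is Standby
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Graceful, administrator-initiated failover: make nn2 the active NameNode
hdfs haadmin -failover nn1 nn2
```

Note that on a Cloudera-managed cluster the Manager UI is the recommended way to fail over, since it tracks role state; the CLI is handy for quick status checks.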
That covers how to configure NameNode HA.
Use the comments section below to post your doubts, questions and feedback.
Please follow my blog to get notified of more certification-related posts, exam tips, etc.