Define and install a rack topology script

In a data center, every physical server sits in a rack. In Hadoop, rack assignment is significant because it drives decisions about data locality and network bandwidth usage, such as where block replicas are placed.

We can assign racks to the hosts in the cluster in two ways.

The first is to assign the rack manually in the Hosts section (Assign Rack).

The second is to provide a rack topology script together with a separate input file that lists each server and its rack.

Below is a rack topology script that takes the topology.data file as its input. The CM server cross-checks the hosts reporting to it against the host information in topology.data. If a host is not listed in the file, or is listed without a rack, it is assigned to the default rack.

/hadoop/topology.script

-----
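#!/bin/bash
# Print the rack of every host name or IP address passed as an argument.

while [ $# -gt 0 ] ; do
  nodeArg=$1
  result=""
  # Scan the mapping file; if a host is listed twice, the last entry wins.
  while read -r host rack _ ; do
    if [ "$host" = "$nodeArg" ] ; then
      result="$rack"
    fi
  done < /hadoop/topology.data
  shift
  # Hosts that are missing or have no rack fall back to the default rack.
  if [ -z "$result" ] ; then
    echo -n "/default/rack "
  else
    echo -n "$result "
  fi
done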

# cat /hadoop/topology.data 
master     /rack1
master2    /rack2
worker1    /rack3
worker2    /rack3
worker3    /rack2
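
To try the lookup without touching /hadoop, the same logic can be wrapped in a function that takes the data file as a parameter. This is only a sketch: the function name rack_for and the temporary sample file are illustrative assumptions, not part of the installed script.

```shell
#!/bin/bash
# Sketch: same lookup as the topology script, but the data file is a parameter.
# rack_for is a hypothetical helper name used only for this example.
rack_for() {
  local data_file=$1 node=$2 host rack result=""
  while read -r host rack _ ; do
    if [ "$host" = "$node" ] ; then
      result=$rack
    fi
  done < "$data_file"
  # Unlisted hosts, or hosts listed without a rack, get the default rack.
  echo "${result:-/default/rack}"
}

# Build a throwaway copy of part of the sample mapping.
sample=$(mktemp)
printf '%s\n' 'master /rack1' 'master2 /rack2' 'worker1 /rack3' > "$sample"

rack_for "$sample" worker1    # prints /rack3
rack_for "$sample" worker9    # prints /default/rack
rm -f "$sample"
```

Keeping the file path as an argument makes it possible to test a new mapping before copying it to /hadoop/topology.data.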

Specify the script location in the field below.

HDFS – Configuration –
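
Before entering the path, it may help to confirm that the script is executable and that every entry in the data file has both a host and a rack. The sketch below does that. check_topology is a hypothetical helper name, and the rule that a rack name starts with "/" is an assumption based on the sample file in this post.

```shell
#!/bin/bash
# Sketch: sanity-check a topology script and its data file before installing.
# check_topology is a hypothetical helper name used only for this example.
check_topology() {
  local script=$1 data_file=$2 host rack lineno=0 status=0
  [ -x "$script" ] || { echo "not executable: $script" >&2; status=1; }
  [ -r "$data_file" ] || { echo "not readable: $data_file" >&2; return 1; }
  while read -r host rack _ ; do
    lineno=$((lineno + 1))
    [ -z "$host" ] && continue          # skip blank lines
    case $rack in
      /*) ;;                            # looks like a rack path
      *)  echo "line $lineno: no rack for $host" >&2; status=1 ;;
    esac
  done < "$data_file"
  return $status
}

# Paths taken from this post.
if check_topology /hadoop/topology.script /hadoop/topology.data ; then
  echo "topology files look consistent"
fi
```

A host reported here as having no rack would end up in /default/rack once the script is in use.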

 
