In a data center, every physical server in the network sits in a rack. In Hadoop, rack assignment is significant because it affects data locality, network bandwidth usage, and so on.
We can assign hosts in the cluster to racks in two ways.
The first is to assign the rack manually in the Hosts section (Assign Racks).
The second is to provide a rack topology script, together with a separate input file that lists each server and its rack.
Below is a rack topology script that takes the topology.data file as input. The CM server cross-checks the hosts reporting to it against the host information in topology.data. If a host is not present in the file, or no rack is given for it, the script returns /default/rack and the host is assigned to the default rack.
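/hadoop/topology.script
-----
#!/bin/bash
# Print one rack path for each host name passed as an argument.
while [ $# -gt 0 ] ; do
  nodeArg=$1
  # Re-open the data file on stdin so it is read from the start for every host.
  exec< /hadoop/topology.data
  result=""
  while read line ; do
    ar=( $line )
    if [ "${ar[0]}" = "$nodeArg" ] ; then
      result="${ar[1]}"
    fi
  done
  shift
  # Fall back to the default rack when the host is missing or has no rack.
  if [ -z "$result" ] ; then
    echo -n "/default/rack "
  else
    echo -n "$result "
  fi
done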
# cat /hadoop/topology.data
master /rack1
master2 /rack2
worker1 /rack3
worker2 /rack3
worker3 /rack2
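To check the lookup before pointing the cluster at it, the same logic can be tried against a throwaway copy of the data. The following is only a sketch: the temporary file and the resolve_racks function are illustrative names that are not part of the setup above, and the function reads the file given as its first argument instead of the fixed /hadoop/topology.data path.

```shell
#!/bin/bash
# Hypothetical test harness: data_file and resolve_racks are illustrative
# names, not part of the original setup.
data_file=$(mktemp)
cat > "$data_file" <<'EOF'
master /rack1
master2 /rack2
worker1 /rack3
worker2 /rack3
worker3 /rack2
EOF

# Same lookup as the topology script: for each host argument, print its
# rack followed by a space, or /default/rack when the host is not listed
# or has no rack.
resolve_racks() {
  local file=$1
  shift
  local node host rack result
  for node in "$@"; do
    result=""
    while read -r host rack; do
      if [ "$host" = "$node" ]; then
        result=$rack
      fi
    done < "$file"
    echo -n "${result:-/default/rack} "
  done
}

resolve_racks "$data_file" master worker3 unknown-host
echo
# prints: /rack1 /rack2 /default/rack
```

Each rack is followed by a single space, as in the original script, so the output line ends with a trailing space. A host that is absent from the file, such as unknown-host here, falls back to /default/rack.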
Specify the script location in the field below.
HDFS – Configuration –