Kannan AK, Author at Hadoop and Cloud

August 23, 2017

Hadoop

1 Comment

Efficiently copy data within a cluster/between clusters

DistCp (distributed copy) is a tool used for large inter/intra-cluster copying of HDFS data. It uses MapReduce to effect its distribution, error handling and recovery, and reporting. It expands a list of files and directories into input to map tasks, each of which will copy a partition of the files specified in the source list. […]
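As a rough illustration of the copy described above, here is a minimal sketch that assembles a `hadoop distcp` command line. The NameNode hostnames, paths and the helper name `build_distcp_command` are assumptions made up for this example; `-update` and `-m` are standard DistCp options, and nothing here is taken from the truncated part of the post.

```python
def build_distcp_command(source, target, max_maps=None, update=False):
    """Assemble the argument list for a `hadoop distcp` run (does not execute it)."""
    cmd = ["hadoop", "distcp"]
    if update:
        # -update copies only files that are missing or differ at the target
        cmd.append("-update")
    if max_maps is not None:
        # -m caps the number of simultaneous map tasks used for the copy
        cmd += ["-m", str(max_maps)]
    cmd += [source, target]
    return cmd


# Hypothetical source and target clusters
cmd = build_distcp_command(
    "hdfs://nn1.example.com:8020/data/logs",
    "hdfs://nn2.example.com:8020/data/logs",
    max_maps=20,
    update=True,
)
print(" ".join(cmd))
```

The list form can be handed to `subprocess.run` on a host that has the Hadoop client installed.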

Kannan AK

August 23, 2017

Hadoop

1 Comment

Perform OS-level configuration for Hadoop installation

Before installing CDH on a server, we have to make the following OS-level configuration changes for the installation to succeed. Disable SELinux: “Security-Enhanced Linux (SELinux) is a Linux kernel security module that provides a mechanism for supporting access control security policies.” If SELinux is enabled, the Cloudera server installation will fail on the server. To disable […]
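The post's own steps are cut off in this excerpt, so the following is only a sketch of one common approach, not necessarily the author's: setting `SELINUX=disabled` in the text of `/etc/selinux/config`. The helper name `disable_selinux` and the sample file content are assumptions for illustration. The function works on a string, so it can be tried without touching the real file.

```python
import re


def disable_selinux(config_text):
    """Return the config text with its SELINUX= line set to 'disabled'."""
    # Matches only the SELINUX= setting; SELINUXTYPE= and comments are left alone.
    return re.sub(r"^SELINUX=.*$", "SELINUX=disabled", config_text, flags=re.MULTILINE)


# Hypothetical sample of /etc/selinux/config
sample = "# SELinux configuration\nSELINUX=enforcing\nSELINUXTYPE=targeted\n"
print(disable_selinux(sample))
```

A change to this file normally takes effect only after a reboot.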

Kannan AK

August 22, 2017

Hadoop

2 Comments

Define and install a rack topology script

In a data center, every physical server in the network sits in a rack. In Hadoop, rack assignment matters because it plays a vital role in data locality, bandwidth usage, etc. We can assign racks to the hosts in a cluster in two ways. One […]
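To make the idea of a topology script concrete, here is a minimal sketch in Python. Hadoop calls the configured script (property `net.topology.script.file.name`) with one or more host names or IP addresses as arguments and expects one rack path per argument on standard output. The IP addresses, rack names and the `resolve_racks` helper below are assumptions for illustration only.

```python
import sys

# Hypothetical host-to-rack mapping; a real script might read this from a file.
RACKS = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"


def resolve_racks(hosts):
    """Return one rack path per host, falling back to the default rack."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop passes the hosts as command-line arguments.
    print(" ".join(resolve_racks(sys.argv[1:])))
```

Unknown hosts fall back to `/default-rack`, so a missing entry does not make the script fail.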

Kannan AK

August 22, 2017

Hadoop

3 Comments

Create an HDFS user’s home directory

Let’s assume that you’re asked to create an HDFS home directory for a user named “goldberg”. First of all, verify that the user exists on the server. If you create the directory for goldberg and set its ownership to goldberg:goldberg with restricted permissions, then no one except the user ‘goldberg’ (and the HDFS superuser) can access the directory. […]
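The steps described above can be sketched as a short list of `hdfs dfs` commands. This is a minimal sketch under assumptions: the home directory is taken to be `/user/<name>` (the usual HDFS convention), mode `700` is my reading of "restricted permissions", and `home_dir_commands` is a made-up helper name. The commands are only built here, not executed, and would need to be run as the HDFS superuser.

```python
def home_dir_commands(user, group=None, mode="700"):
    """Build the hdfs commands that create and lock down a user's home directory."""
    group = group or user
    home = f"/user/{user}"
    return [
        ["hdfs", "dfs", "-mkdir", "-p", home],               # create the directory
        ["hdfs", "dfs", "-chown", f"{user}:{group}", home],  # hand it to the user
        ["hdfs", "dfs", "-chmod", mode, home],               # restrict access
    ]


for command in home_dir_commands("goldberg"):
    print(" ".join(command))
```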

Kannan AK

August 19, 2017

Hadoop

1 Comment

Configure HDFS ACLs

Every file/folder in Linux is owned by an owner and a group. If a user other than the owner needs to access the file (read, write, modify), either the user has to be part of the group or the file must have appropriate “others” permissions. In this model, we can’t set different permissions for specific users or groups to suit our requirements. ACLs […]
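As a small illustration of per-user and per-group permissions, the sketch below builds an `hdfs dfs -setfacl -m` command from ACL entries of the form `user:<name>:<perms>` or `group:<name>:<perms>`. The user, group, path and helper names are assumptions for this example, and it presumes ACL support is enabled on the NameNode (`dfs.namenode.acls.enabled`).

```python
def acl_entry(kind, name, perms):
    """Format one ACL entry such as 'user:alice:r-x' after basic validation."""
    if kind not in ("user", "group"):
        raise ValueError(f"unsupported ACL entry type: {kind}")
    # Each position must be its letter (r, w, x) or '-'.
    if len(perms) != 3 or not all(p in (ch, "-") for p, ch in zip(perms, "rwx")):
        raise ValueError(f"invalid permission string: {perms}")
    return f"{kind}:{name}:{perms}"


def setfacl_command(path, entries):
    """Build (not run) the command that adds or modifies the given ACL entries."""
    return ["hdfs", "dfs", "-setfacl", "-m", ",".join(entries), path]


# Hypothetical user, group and path
entries = [acl_entry("user", "alice", "r-x"), acl_entry("group", "analysts", "rwx")]
print(" ".join(setfacl_command("/data/sales", entries)))
```

Running `hdfs dfs -getfacl /data/sales` afterwards would list the entries that were applied.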

Kannan AK

August 12, 2017

Hadoop

3 Comments

Set up a local CDH repository

This post explains how to set up a local YUM/CDH repository for your network. In Linux, /etc/yum.repos.d is the directory that holds the yum repo definitions on the server. Every repo has a baseurl value that contains the link to the repository path. When you execute “yum install packagename” the […]
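For a feel of what such a repo definition looks like, here is a minimal sketch that renders the text of a `.repo` file with a `baseurl` entry. The repo id, name, URL and the `render_repo` helper are all assumptions for illustration; the actual values used in the post are not visible in this excerpt.

```python
import configparser
import io


def render_repo(repo_id, name, baseurl, gpgcheck=False):
    """Render the text of a yum .repo file with a single repository section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser[repo_id] = {
        "name": name,
        "baseurl": baseurl,
        "enabled": "1",
        "gpgcheck": "1" if gpgcheck else "0",
    }
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


# Hypothetical internal web server hosting the mirrored packages
repo_text = render_repo(
    "local-cdh", "Local CDH repository", "http://repo.example.internal/cdh/5/"
)
print(repo_text)
```

The resulting text would be saved as a file under /etc/yum.repos.d, for example `local-cdh.repo`.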

Kannan AK

August 12, 2017

Hadoop

1 Comment

CCA131 – Cloudera Administration Certification Exam Notes and Preparation Guide

In this post, we’ll go through the exam blueprint topics and show you what the tasks in each topic are and how to perform them. All the embedded links are my exam preparation notes. Go through each link and practice it in your cluster. Please practice until you’re confident of doing all the tasks without referring […]

Kannan AK

July 31, 2017

AWS

No Comments

S3 – Versioning

Versioning is a method of keeping multiple modifications/versions of an object in the same bucket. You can use versioning to preserve, retrieve, and restore every version of every object stored in your Amazon S3 bucket. With versioning, you can easily recover from unintended user actions, accidental deletes, and application failures. How Versioning works: In […]
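The recovery behaviour described above can be pictured with a toy in-memory model. This is a sketch under assumptions, not the S3 API: the class name, methods and the `v1`, `v2` style ids are made up (real S3 version ids are opaque strings). It only illustrates that every write adds a new version and that a plain delete adds a delete marker instead of erasing earlier versions.

```python
import itertools


class VersionedBucket:
    """Toy model of a versioning-enabled bucket (not the real S3 API)."""

    def __init__(self):
        self._versions = {}           # key -> list of (version_id, data or None)
        self._ids = itertools.count(1)

    def _append(self, key, data):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def put(self, key, data):
        """Store a new version; older versions are kept."""
        return self._append(key, data)

    def delete(self, key):
        """Add a delete marker (data=None) rather than removing anything."""
        return self._append(key, None)

    def get(self, key, version_id=None):
        """Return the latest version, or a specific one if version_id is given."""
        versions = self._versions.get(key, [])
        if version_id is None:
            if not versions or versions[-1][1] is None:
                raise KeyError(key)   # missing, or hidden by a delete marker
            return versions[-1][1]
        for vid, data in versions:
            if vid == version_id and data is not None:
                return data
        raise KeyError((key, version_id))


bucket = VersionedBucket()
first = bucket.put("report.csv", b"draft")
bucket.put("report.csv", b"final")
bucket.delete("report.csv")                   # accidental delete
recovered = bucket.get("report.csv", first)   # an older version is still there
```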

Kannan AK

July 18, 2017

AWS

No Comments

AWS Developer Associate – Exam tips/feedback

I took the developer exam with the worst possible preparation. I started preparing only a couple of days before the scheduled exam date and, believe it or not, I read up on DynamoDB the night before and the FAQs a few hours before the exam. But one thing that really helped me was the knowledge I had gained on S3, EC2, VPC, […]

Kannan AK

July 3, 2017

Uncategorized

No Comments

S3 – Storage classes

The image below compares the features of the storage classes. S3 Standard: Availability: […]