Amazon EMR (Elastic MapReduce) is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. Architecture: an EMR cluster is a group of Amazon EC2 instances launched from an Amazon Machine Image (AMI). Each instance in the cluster is called a node. Each… Continue reading AWS EMR (Elastic MapReduce) – Overview
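As a rough illustration of the node layout, the sketch below builds the `InstanceGroups` list that boto3's EMR client accepts inside the `Instances` argument of `run_job_flow`. The helper name `emr_instance_groups`, the default instance type (`m5.xlarge`), the counts and the use of Spot for task nodes are all assumptions chosen for illustration; the code only builds the dictionaries and does not call AWS.

```python
from typing import Dict, List


def emr_instance_groups(instance_type: str = "m5.xlarge",
                        core_count: int = 2,
                        task_count: int = 0) -> List[Dict]:
    """Hypothetical helper: describe one EMR cluster as instance groups.

    MASTER = primary node that coordinates the cluster,
    CORE   = nodes that run tasks and store HDFS data,
    TASK   = optional compute-only nodes (no HDFS storage).
    """
    groups = [
        {"Name": "Primary", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": 1},
        {"Name": "Core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": core_count},
    ]
    if task_count > 0:
        # Assumption: task nodes on Spot, since they hold no HDFS data.
        groups.append({"Name": "Task", "InstanceRole": "TASK", "Market": "SPOT",
                       "InstanceType": instance_type, "InstanceCount": task_count})
    return groups


if __name__ == "__main__":
    for group in emr_instance_groups(task_count=2):
        print(group["InstanceRole"], group["InstanceCount"])
```

The resulting list would be passed as `Instances={"InstanceGroups": groups, ...}` together with the other required `run_job_flow` arguments (name, release label, roles), which are omitted here.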
Map outputs are temporary intermediate data that serve no purpose to the user running the job. They are sorted and shuffled to the reducers, which combine them to produce the final output. Storing them in HDFS is not recommended, because the data would be replicated across the cluster, the NameNode would have to update its metadata, etc.… Continue reading Why Map outputs are stored in local FS and not in HDFS?
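A back-of-the-envelope model makes the overhead concrete. The sketch below is a simplified assumption, not Hadoop code: it assumes a replication factor of 3, a 128 MB block size, and that the first HDFS replica is written locally so only the extra replicas cross the network.

```python
import math


def intermediate_write_cost(map_output_mb: float,
                            replication: int = 3,
                            block_size_mb: int = 128):
    """Illustrative comparison of writing map output to local disk vs. HDFS."""
    # Local FS: one copy on the mapper's own disk, no network, no NameNode work.
    local = {"disk_written_mb": map_output_mb, "network_mb": 0, "namenode_blocks": 0}
    # HDFS: every replica is written to some disk, the extra replicas travel
    # over the network, and the NameNode must track every block.
    hdfs = {
        "disk_written_mb": map_output_mb * replication,
        "network_mb": map_output_mb * (replication - 1),
        "namenode_blocks": math.ceil(map_output_mb / block_size_mb),
    }
    return local, hdfs


if __name__ == "__main__":
    local, hdfs = intermediate_write_cost(10 * 1024)  # 10 GB of map output
    print("local:", local)
    print("hdfs: ", hdfs)
```

For 10 GB of map output this model gives 30 GB of disk writes, 20 GB of network traffic and 80 blocks for the NameNode to track, all for data that is thrown away once the job finishes.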
Data locality optimization means running the computation on or near the node where the data actually resides. Because Hadoop deals with large amounts of data, network bandwidth is one of its most valuable resources. Hadoop does its best to run the map task on a node where the blocks of input data… Continue reading Data locality optimization in Hadoop
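The preference order behind data locality (data-local, then rack-local, then off-rack) can be sketched as a tiny scheduling function. This is a simplified illustration, not the actual Hadoop/YARN scheduler; the node and rack names and the helpers `locality` and `pick_node` are hypothetical.

```python
from enum import IntEnum


class Locality(IntEnum):
    """Lower value = better placement for a map task."""
    DATA_LOCAL = 0  # a replica of the input block is on this node
    RACK_LOCAL = 1  # a replica is on another node in the same rack
    OFF_RACK = 2    # the block must be read across racks


def locality(node, replica_nodes, rack_of):
    """Classify a candidate node relative to the input block's replicas."""
    if node in replica_nodes:
        return Locality.DATA_LOCAL
    if rack_of[node] in {rack_of[r] for r in replica_nodes}:
        return Locality.RACK_LOCAL
    return Locality.OFF_RACK


def pick_node(free_nodes, replica_nodes, rack_of):
    """Choose the free node with the best locality (first one wins on ties)."""
    best = min(free_nodes, key=lambda n: locality(n, replica_nodes, rack_of))
    return best, locality(best, replica_nodes, rack_of)


if __name__ == "__main__":
    rack_of = {"n1": "rack1", "n2": "rack1", "n3": "rack2", "n4": "rack3"}
    replicas = {"n1", "n4"}  # nodes holding a replica of the input block
    print(pick_node(["n2", "n3"], replicas, rack_of))
```

Here `n2` wins because it shares `rack1` with a replica on `n1`, so the block is read over the in-rack network rather than across racks.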
Below are CCA131 sample exam questions, in the format in which problems appear in the exam. Each problem contains three sections: instructions, data description and output requirements. Instructions explain the problem and what needs to be done; the data description contains node/service details or other information relevant to solving the problem; Output requirements is… Continue reading CCA131 sample exam questions
I've screened around 50 profiles/resumes in recent months for the technical round of a Hadoop Admin/Engineer position, and it's shocking to see how poor the candidates' resumes are. All the resumes came through consultancies, which bear a major share of the blame for their poor quality. Below are the points I observed during screening… Continue reading How your resume shouldn’t be – observations on screening profiles for Hadoop Admin role
In this post I'm going to share my feedback on the AWS Certified Solutions Architect Associate exam, my preparation strategy and some exam tips. I passed all three AWS associate certifications in recent months, taking them in this order: Solutions Architect (CSA) first, then Developer Associate, and finally SysOps Associate. You can find my feedback on… Continue reading AWS Certified Solutions Architect – Associate exam feedback
I passed the AWS Certified SysOps Administrator exam last week, completing the trifecta of AWS associate certifications. I took the Solutions Architect Associate exam 3 months ago and the Developer Associate 2 months ago; both really helped increase my AWS expertise and my confidence, which is key for this certification. As expected, SysOps is… Continue reading AWS Certified SysOps Administrator – Associate exam feedback
The CCA131 exam is fully hands-on, and you need practical experience working on a Hadoop cluster to pass it. If you don't have practical experience, I'd recommend practicing on the Cloudera QuickStart VM or building a multi-node cluster, either on your laptop using VMs or in AWS.… Continue reading Multinode hadoop cluster configuration for CCA131 preparation
SSH is an indispensable service on Linux servers: it provides secure login from one server to a remote server. It is also used to transfer files over the network using the Secure Copy Protocol (SCP).
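To make the two uses concrete, the sketch below only builds the `ssh` and `scp` command lines as argument lists; it does not connect anywhere. The user `hadoop`, the host `node1.example.com` and the paths are hypothetical placeholders. Note that `ssh` takes the port with lowercase `-p`, while `scp` uses uppercase `-P`.

```python
import shlex
from typing import List


def ssh_command(user: str, host: str, port: int = 22) -> List[str]:
    """Argument list for an interactive SSH login to a remote server."""
    return ["ssh", "-p", str(port), f"{user}@{host}"]


def scp_command(local_path: str, user: str, host: str,
                remote_path: str, port: int = 22) -> List[str]:
    """Argument list for copying a local file to a remote server over SCP."""
    return ["scp", "-P", str(port), local_path, f"{user}@{host}:{remote_path}"]


if __name__ == "__main__":
    # Print shell-ready strings; pass the lists to subprocess.run() to execute them.
    print(shlex.join(ssh_command("hadoop", "node1.example.com")))
    print(shlex.join(scp_command("data.csv", "hadoop", "node1.example.com", "/tmp/")))
```

Keeping the commands as lists (rather than one formatted string) avoids shell-quoting problems when paths contain spaces.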
Linux servers are used in almost every organization, and irrespective of your role or preferences, you'll use Linux in one way or another. So it's worth learning the Linux commands that will come in handy whenever you get a chance to work on it. I have listed the basic and commonly used commands in… Continue reading Linux commands – Basics