The CCA131 exam is fully hands-on, so you need practical experience working on a Hadoop cluster to pass it.
If you don’t have that experience, I’d recommend practicing on the Cloudera QuickStart VM, or building a multi-node cluster on your laptop using VMs or in AWS.
Below are the minimum configuration requirements for the cluster.
CM server & Namenode: 8 GB of RAM, 10 GB Disk
The CM server needs 8 GB of memory to run Cloudera Manager; otherwise it will hang.
Standby node: 4 GB RAM, 8-10 GB disk
Ideally the standby should have the same configuration as the active NN, but since this is for learning/POC, 4 GB is enough.
Datanodes (2): 1 GB RAM and 8-10 GB disk each
Total minimum memory required for the cluster: 14 GB (8 + 4 + 2 × 1)
So if your laptop has 16 GB of memory, you can create separate Linux VMs with the above configurations, install Cloudera, and build the cluster free of cost.
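As a quick sanity check before creating the VMs, the per-node figures above can be added up and compared with the RAM on your laptop. This is only a rough sketch: the node names are made up for illustration, and the 2 GB reserved for the host OS is my assumption, not a Cloudera requirement.

```python
# RAM per node in GB, taken from the minimum configuration listed above.
# The dictionary keys are illustrative names, not real hostnames.
NODE_RAM_GB = {
    "cm_namenode": 8,
    "standby": 4,
    "datanode_1": 1,
    "datanode_2": 1,
}


def total_ram_gb(nodes=NODE_RAM_GB):
    """Return the total RAM (GB) needed by all VMs."""
    return sum(nodes.values())


def fits_on_host(host_ram_gb, nodes=NODE_RAM_GB, host_reserve_gb=2):
    """Check whether the VMs fit on the host.

    host_reserve_gb is an assumed allowance for the host OS itself.
    """
    return total_ram_gb(nodes) + host_reserve_gb <= host_ram_gb


print("VMs need", total_ram_gb(), "GB")
print("Fits on a 16 GB laptop:", fits_on_host(16))
print("Fits on an 8 GB laptop:", fits_on_host(8))
```

If the check fails for your laptop, that is a good sign you should look at AWS or the online labs instead of shrinking the VMs.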
If you don’t have prior AWS experience or exposure, please skip this topic and check the online labs section at the end.
This is just a high-level overview of AWS EC2 instances and pricing, not an introduction to AWS basics.
Most of us don’t have a laptop with 16 GB of memory, and if you reduce the memory allocated to the VMs, the cluster will hang for ages or run out of memory, which is not convenient.
If you have AWS experience, you can use EC2 instances to build your cluster.
AWS EC2 instance prices can be found at this link: https://aws.amazon.com/ec2/pricing/on-demand/
I used t2.large for CM/NN, t2.medium for the standby, and two t2.micro instances for the datanodes.
AWS EC2 instances have traditionally been charged per hour (AWS has since introduced per-second billing). Under hourly billing, if you stop and start an instance three times in a single hour, you are charged for 3 hours.
Since the t2.micro is free for one year (750 hours per month) when you sign up with AWS, let’s calculate the pricing for the masters only.
t2.large -> $0.1152/hour
t2.medium -> $0.0576/hour
Total: $0.1728 per hour. So if you use the cluster for 20 hours over a weekend, you’ll pay about $3.50 for the instances, plus additional EBS volume charges and taxes.
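The same arithmetic can be put into a small script so you can plug in your own hours. This is a minimal sketch using the hourly prices quoted in this post; the prices will differ by region and over time, and the function name is just something I picked for illustration. It covers instance charges only, not EBS or taxes.

```python
# Hourly on-demand prices (USD) as quoted in this post.
# Check the AWS pricing page for current values in your region.
HOURLY_PRICE_USD = {
    "t2.large": 0.1152,   # CM server / NameNode
    "t2.medium": 0.0576,  # standby node
}


def instance_cost(hours, prices=HOURLY_PRICE_USD):
    """Return the instance-only cost (USD) of running all masters for `hours`."""
    return sum(prices.values()) * hours


print(f"Per hour: ${instance_cost(1):.4f}")
print(f"20 hours: ${instance_cost(20):.2f}")
```

The t2.micro datanodes are left out on purpose, since they fall under the free tier in this setup.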
The free tier includes 30 GB of EBS volumes. Since each instance needs a volume of at least 8 GB, you will cross the free limit once you create 4 instances (4 × 8 GB = 32 GB).
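To see how much EBS storage falls outside the free tier for a given number of instances, a rough check like the one below helps. It assumes the 30 GB allowance and 8 GB minimum volume size mentioned above; the function name is hypothetical, and actual billing depends on volume type and region.

```python
FREE_TIER_EBS_GB = 30   # free-tier EBS allowance quoted in this post
MIN_VOLUME_GB = 8       # minimum volume size per instance quoted in this post


def billable_ebs_gb(instance_count, volume_gb=MIN_VOLUME_GB, free_gb=FREE_TIER_EBS_GB):
    """Return how many GB of EBS exceed the free-tier allowance."""
    total_gb = instance_count * volume_gb
    return max(0, total_gb - free_gb)


for count in (3, 4):
    print(count, "instances ->", billable_ebs_gb(count), "GB billable")
```

Passing a larger `volume_gb` shows how quickly the billable amount grows if you provision bigger disks than you need.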
I paid around $15 during my preparation. I didn’t know about EBS charges beforehand, so I ended up paying $3-$4 extra for provisioning 60+ GB of EBS volumes.
- Launch the AWS instances in a new VPC so that they have static private IPs.
- The public IP will change on every stop/start.
- Before running the Cloudera binary installer, disable SELinux and restart the instance. (SELinux is enabled by default, and the installation will fail otherwise.) Of course, you’ll be charged the per-hour price for this reboot.
- Always stop and start the instances, rather than terminating them, until your POC is done. If you terminate them, the cluster will be gone.
- Update /etc/hosts on all instances with the hostname and IP of every instance, to enable forward/reverse DNS lookup.
- You may run into a lot of issues while building a cluster in AWS; don’t get frustrated. Try to debug and resolve them, as it’s valuable learning.
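For the /etc/hosts tip above, it is easy to make a typo when copying entries to four machines by hand. Here is a small sketch that generates the lines from one list so every instance gets the same entries. The IPs and hostnames are made-up placeholders; replace them with the private IPs and hostnames of your own instances.

```python
# Hypothetical private IPs and hostnames; replace with your own instances.
NODES = [
    ("10.0.0.11", "cm-nn.example.internal"),
    ("10.0.0.12", "standby.example.internal"),
    ("10.0.0.13", "dn1.example.internal"),
    ("10.0.0.14", "dn2.example.internal"),
]


def hosts_lines(nodes):
    """Build /etc/hosts lines in the form: IP, full hostname, short name."""
    lines = []
    for ip, fqdn in nodes:
        short_name = fqdn.split(".")[0]
        lines.append(f"{ip}\t{fqdn}\t{short_name}")
    return lines


# Print the entries; append them to /etc/hosts on every instance yourself.
print("\n".join(hosts_lines(NODES)))
```

The script only prints the entries and does not touch /etc/hosts, so you can review them before pasting.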
AWS charges vary by time and region. Please check the pricing details and use at your own discretion.
If your laptop doesn’t have a sufficient configuration or you are new to AWS, I’d suggest trying an online lab such as https://labs.itversity.com/#/ to avoid all these hassles.
Please post your queries and feedback in the comments section.