
CCA131 – Cloudera Administration Certification Exam Notes and Preparation Guide

In this post, we’ll go through the CCA131 exam blueprint topics and show you how to perform the tasks in each topic.

All the embedded links are my exam preparation notes. Go through each link and practice the tasks in your own cluster.

Check out my post on Minimum cluster configuration required for practice.

Please practice until you’re confident you can do all the tasks without referring to any documentation or material, and can resolve any issues that come up.

Before we start, check out my post on exam feedback: CCA131 – Exam Feedback

If you’re new to Linux, please check out Linux commands – Basics, which will help you get familiar with shell commands.

UPDATE: 

After getting many requests from followers about the exam question format, I created a post containing sample exam questions. Please check it to get an understanding of how the problems will appear in the exam.

CCA131 exam questions

Promotion

Pluralsight is offering a 10-day free trial for their courses.

They have very good courses on building a Cloudera cluster, building a production cluster with external databases, and so on, which you may find very helpful in preparing for a Hadoop admin role as well as for the certification.

Please check it out using the link below; you can cancel the subscription at any time.

Start a 10-day free trial at Pluralsight


Install

Demonstrate an understanding of the installation process for Cloudera Manager, CDH, and the ecosystem projects.
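
To get a feel for what the installation involves, here is a minimal sketch of a package-based Cloudera Manager server install. It assumes a RHEL/CentOS host with the Cloudera Manager yum repository already configured, and it uses the embedded database, which is fine for a practice cluster; follow the official install guide for the full procedure.

    # Install the Cloudera Manager server packages (assumes the CM yum repo is already set up)
    sudo yum install -y cloudera-manager-daemons cloudera-manager-server

    # For a practice cluster, the embedded PostgreSQL database is the quickest option
    sudo yum install -y cloudera-manager-server-db-2
    sudo systemctl start cloudera-scm-server-db

    # Start Cloudera Manager, watch the log, then log in to the admin UI on port 7180
    sudo systemctl start cloudera-scm-server
    sudo tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log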


Configure

Perform basic and advanced configuration needed to effectively administer a Hadoop cluster


Manage

Maintain and modify the cluster to support day-to-day operations in the enterprise
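
As one small, commonly practised example from this topic area, here is a sketch (with placeholder user names) of creating a user’s HDFS home directory with the right ownership and permissions, run as the hdfs superuser:

    # Create a home directory for user 'alice' and hand it over to her
    sudo -u hdfs hdfs dfs -mkdir -p /user/alice
    sudo -u hdfs hdfs dfs -chown alice:alice /user/alice
    sudo -u hdfs hdfs dfs -chmod 700 /user/alice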

Secure

Enable relevant services and configure the cluster to meet goals defined by security policy; demonstrate knowledge of basic security practices

Test

Benchmark the cluster operational metrics, test system configuration for operation and efficiency

  • Execute file system commands via HTTPFS: HttpFS commands
  • Efficiently copy data within a cluster/between clusters: Distcp
  • Create/restore a snapshot of an HDFS directory: Snapshots
  • Get/set ACLs for a file or directory structure: HDFS ACLs
  • Benchmark the cluster (I/O, CPU, network): Benchmarking (see the command sketch below)
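
For quick reference, here is a hedged sketch of the kinds of commands these bullets cover. Host names, paths, and user names (httpfs-host, /user/alice, bob) are placeholders, and the exact benchmark jar path and flags vary by CDH version:

    # HttpFS: list a directory over REST (HttpFS listens on port 14000 by default)
    curl "http://httpfs-host:14000/webhdfs/v1/user/alice?op=LISTSTATUS&user.name=alice"

    # DistCp: copy data between clusters (also works within one cluster)
    hadoop distcp hdfs://nn1:8020/user/alice/data hdfs://nn2:8020/user/alice/data

    # Snapshots: allow them on a directory (admin command), create one, then restore a file from it
    hdfs dfsadmin -allowSnapshot /user/alice/data
    hdfs dfs -createSnapshot /user/alice/data snap1
    hdfs dfs -cp /user/alice/data/.snapshot/snap1/important.txt /user/alice/data/

    # ACLs: inspect and modify
    hdfs dfs -getfacl /user/alice/data
    hdfs dfs -setfacl -m user:bob:r-x /user/alice/data

    # Benchmarking: TestDFSIO write test
    hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-jobclient-*-tests.jar \
        TestDFSIO -write -nrFiles 4 -size 128MB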

Troubleshoot

Demonstrate ability to find the root cause of a problem, optimize inefficient execution, and resolve resource contention scenarios


Before you appear for the exam, please check my post on how to validate your outputs in the exam: CCA131 – Answers Validation Steps

This will help you re-validate your outputs/solutions/tasks and catch mistakes you might otherwise fail to notice during the exam.


Wish you all the best for your exam.

Please do share your thoughts/feedback after taking the exam; it will help me update the content accordingly and will also help our fellows in the Hadoop community.

55 thoughts on “CCA131 – Cloudera Administration Certification Exam Notes and Preparation Guide”

  1. Hello Kannan,
    Thanks for sharing. If I am thorough with these questions, is that enough to take the exam? Please let me know.

    Thanks
    krish

  2. Dear Krish,
    Thank you so much for the guide. I attempted the exam on 22 November but I failed; only 3 problems were correct.
    There was a question about setting heap size which I didn’t understand. They gave us some amount of RAM and asked us to calculate and configure the heap size. Can you explain what this is?
    There was also a problem related to a Kafka configuration that had been misconfigured beforehand, but I was not able to resolve the error.
    Can you guide me on how I should prepare for the exam?
    There was also a question about creating groups for roles.
    Can you help me?
    Thanks in advance

    1. Hi Syed,

      Sorry to hear about that.

      For Kafka configuration, you first have to download and activate the Kafka parcel, then add the Kafka service; the Kafka brokers will then be started. If you run into any issues starting a Kafka broker, look at the logs to see what the problem is and troubleshoot accordingly.
      One of the most common issues is the heap size of the Kafka broker. Go to Kafka – Configuration, search for ‘heap size’, and change it to the recommended value.
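
      For example, to check the broker log on a CDH node (a hedged sketch; /var/log/kafka is the usual default, but your log directory and file name may differ):

          tail -n 100 /var/log/kafka/kafka-broker-*.log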

    2. Hello Syed!!

      Excuse the delay in replying to your comment. My suggestion for preparing for the “CCA131” exam: make use of the official documentation and read it thoroughly, no shortcuts, and also use this blog to practise the official Cloudera curriculum before taking the exam. If you can recollect the questions you came across in your last attempt, analyse them and work out the solutions. I hope this will help you clear the exam. Practice well. ALL THE BEST!!
      Thanks
      krish

  3. Hi Kannan – you’re doing a great job. Is it possible to give an idea of what kind of questions are asked related to Sqoop and Flume?

    Regards.

  4. Hi Friends,

    Thank you very much for the guide. I attempted the exam recently but unfortunately failed; only 6 questions were correct.

    In my understanding, the exam questions are not high-level, but they are all tricky.
    Linux administration skills are very useful, so first get comfortable with basic Linux commands.

    There were questions on changing HDFS/YARN configuration parameters, resolving an application failure, dynamic resource pool configuration, regular expressions, and snapshots.

    My suggestion for preparing for the “CCA131” exam: install Cloudera on a laptop and practice well, following the syllabus and this blog. I hope this will help you clear the exam. ALL THE BEST!!

    Thanks
    Pandurang Bhadange

      1. This blog covers the whole syllabus; you can use it as a blueprint.

        My suggestions for study:

        1) Refer to the ITVersity video series:
        https://www.youtube.com/watch?v=1c5LZQsHyxk&list=PLf0swTFhTI8q9Uc7g2gs2t_ToFGzi7GEw

        2) Refer to the complete list of videos in the Hadoop engineering playlist:
        https://www.youtube.com/watch?v=DjdhZlTGukg&list=PLY-V_O-O7h4dUA5A2E-w_Jd4J4ks27_9v

        3) Refer to the book Expert Hadoop Administration by Sam R. Alapati

        4) Refer to this blog of Kannan’s

  5. Yo! Pandurang Bhadange, can you share the 6 questions you got correct too, please? I’d appreciate your input. I’d like to know about your experience during the test; I think it could help me a lot. Can you ping me at viczius @ gmail?

    Thanks in advance!

  6. Hello Kannan,

    Thank you for writing this blog to the point and clear.

    I went through the complete blog and thoroughly practised each activity/task you mentioned. I am happy to report that I am now a Cloudera Certified Administrator of Hadoop… 🙂

    Thank you very much for your indirect support via this blog. It really helped me a lot.

    Thanks
    Sunil

    1. Hi Sunil,

      Congrats on passing the exam; that’s a great accomplishment.
      Glad that I could be of some help, but it’s all down to your effort and perseverance.
      Wish you all the success in your future projects.

    2. Hello Sunil,
      Congrats!! How can I reach you? I need your inputs and tips on the process of clearing the exam. Please help; I failed to clear it on my first attempt.
      Thanks in advance
      Vmc

  7. How do you calculate the total allocatable memory on a name node, where the total available memory is 41 GiB, the non-DFS reserved memory is 7.2, and the total usage/consumption multiplier is 1.4?

    1. Hi Vic,

      Can you clarify what “total usage/consumption multiplier is 1.4” means?

      By default, namenode memory + OS memory = node memory. So in this case, the maximum allocatable for the NN is 41 – 7.2 = 33.8 GiB.

        1. I think this multiplier is the Java memory overhead.
          The “real” memory consumption of a Java process is the heap space + meta/perm space + the native memory used by the OS to manage the Java process itself. So, the way to calculate the real memory used by the NN Java process is to apply the multiplier to the heap size.

          Applying all this to the sizing: (41 – 7.2) / 1.4 = 24.1 GB available for the NN’s heap.

          1. What would be the correct calculation for this one: allocate the maximum amount of memory for the HDFS name node in order to store the max number of files. The node had 41 GB, the OS took 7.2, keeping in mind an overhead of 1.4, and the namenode and secondary name node should have equal memory, with both running on the same server.
            (41 – 7.2) / 1.4 = 24.1, which means about 12 GB for each node.

            Does anyone have the correct answer for this?

  8. Hey Kannan,

    I cleared CCA131 a couple of days back, and I must thank you for your writing and guidance; it genuinely helped me go further and give it a try. I appreciate your efforts.

    1. Hi Vin,
      That’s awesome to hear. Glad that I could be of some help, but full credit goes to your hard work.
      Wish you all the success in your future projects.

  9. Dear Kannan,
    Thank you very much for this website and blog…
    I had a question in the exam:
    Node RAM is 32 GB, non-DFS reserved memory is 6.2 GB, heap size 1.3x.
    Can’t remember the exact question…
    I couldn’t answer it correctly.

    It would be really helpful if you could explain how this question is to be solved.

    Thanks in advance.

    1. Yeah, I didn’t understand this question either! There could be multiple answers, and the 1.3x multiplier thing didn’t really make sense.

  10. Hello Kannan/Sunil,

    I need your help with the scenarios you faced, as I am also planning to go for CCA131.

    @Kannan, which official document did you mention in your comment above?

    For calculating the heap, what basic understanding should we have?

  11. Hi Kannan,

    I had the below question in my exam; can you please let me know what to do?
    I set Max Log Size to 8 GB and Maximum Log File Backups to 4 for both nodes, but this answer was still marked wrong in the exam.

    The question: the NameNode and secondary NN should each keep a max of 4 logs within 8 GB of space; together the NN and SNN must consume only 16 GB.

    1. I had a question in the exam:
      Node RAM is 32 GB, non-DFS reserved memory is 6.2 GB, heap size 1.3x.
      Can’t remember the exact question…
      I couldn’t answer it correctly.

      It would be really helpful if you could explain how this question is to be solved.

      1. If possible, could you please try to recall this question? It would help new exam takers.
        Thanks

    2. Hey Raja,

      Please check the official Cloudera documentation; it will help you. I hope you have a successful attempt.

    3. Hi Raja,

      The max log size is per log file, so in this case you would have (4 x 8) GB occupied by each service, which is not the expected behaviour. I think changing the 8 GB to 2 GB should have done the trick.

      Regards,

  12. I think the value should be (32 – 6.2) / 1.3 / 2 ≈ 9.9 GB each.

    32 – 6.2 is the available memory; since the JVM has overhead, the maximum heap must be divided by the 1.3 multiplier, and in the test it asked for the NN and SNN roles sharing the node, so the two roles split this value.

    1. Hi Valery/Others who passed CCA131 exam,

      CAN SOMEONE WHO CLEARED CCA131 EXAM RECENTLY IN PAST 3 MONTHS PLEASE CALL ME ON MY MOBILE NUMBER (9886019555)

      THIS IS VERY URGENT I NEED URGENT HELP TO DISCUSS WITH YOU ON THIS.

      Thanks
      Gautam
      +91 9886019555

  13. @Kannan,
    I am probably repeating the same questions here because I am not confident about the answers. Could you tell me the correct configurations for these 2 questions?
    1. Change the logging policies of the name node and the secondary name node. Keep 4 copies, occupying max 8 GB of space. The total capacity can’t exceed 16 GB for the namenode and secondary namenode.
    2. Allocate the maximum amount of memory for the HDFS name node in order to store the max number of files. The node had 31 GB; the OS took 6.4. The namenode and secondary namenode should have equal memory, keeping in mind a heap overhead of 1.3. The namenode and secondary namenode are on the same server.

    1. Under the HDFS configuration, filter on NameNode; the property “NameNode Max Log Size” should be 2 GB and the property “NameNode Maximum Log File Backups” should be 4, so it will consume 8 GB. The same can be set for the secondary NameNode as well.
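
      For context, these two Cloudera Manager properties correspond to the standard log4j rolling-file settings that ship with Hadoop. A hedged sketch of the equivalent raw log4j.properties entries (CM manages these for you, and the exact appender name can differ):

          log4j.appender.RFA=org.apache.log4j.RollingFileAppender
          log4j.appender.RFA.MaxFileSize=2GB
          log4j.appender.RFA.MaxBackupIndex=4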

      1. Hi Muthu, thanks! I did the same thing and redeployed the stale configuration, but my answer was still marked wrong for this configuration. Am I missing anything here?

        1. Hi Krb,
          I’m planning to take the certification exam this coming Sunday. Any inputs will be greatly appreciated.
          1. I agree with Muthu on how the log file size is calculated: 4 files x 2 GB per file = 8 GB of space on the NameNode, and the same for the SNN. I am not sure why it was marked wrong. Can anyone work out why it went wrong?

          2. I haven’t understood this question completely, but it seems like a sure-bet question in the certification exam. If someone could give a detailed explanation, it would be greatly appreciated.
          As per my understanding, the NN heap memory calculation depends on how much data we have and the block size, right?
          Here is my understanding.
          The NN has 31 GB and the SNN has 31 GB; these are separate, as these name nodes are not HA.
          The memory allocated for the OS is 6.4 GB, which means we still have 31 – 6.4 = 24.6 GB.
          The heap overhead is 1.3, which means we still have (24.6 – 1.3 = 23.3 GB) on the NN and SNN separately.

          Let’s assume we have replication 3 and a 128 MB block size, which requires 384 MB for each file. Suppose we have 10 DNs with 5 TB each = 50 TB of space.
          (50 x 1024 x 1024) / 384 = 136,533 blocks, so 136 MB of maximum heap is enough to store the namespace metadata, right?

          Sorry for the long comment. Please bear with me.

    1. Hi Vaqar,

      These commands won’t be provided in the exam. I’d suggest you practice them on any Linux machine so that you don’t have to memorize them 🙂
