CCA131 – Cloudera Administration Certification Exam Notes and Preparation Guide

In this post, we’ll go through the CCA131 exam blueprint topics and show what tasks each topic covers and how to perform them.

All the embedded links are my exam preparation notes. Go through each link and practice the tasks in your cluster.

Check out my post on Minimum cluster configuration required for practice.

Please practice until you are confident you can complete all the tasks without referring to any documentation or material, and can resolve issues on your own.

Before we start, check out my post on exam feedback: CCA131 – Exam Feedback

If you’re new to Linux, please check out Linux commands – Basics which will help you get familiar with shell commands.

UPDATE: 

After many requests from followers about the exam question format, I created a post containing sample exam questions. Please check it to get a sense of how the problems will appear in the exam.

CCA131 exam questions

 


 

Promotion

Pluralsight is offering a 10-day free trial for their courses.

They have very good courses on building a Cloudera cluster, building a production cluster with external databases, etc., which you may find helpful both for a Hadoop admin role and for the certification.

Please check it out using the link below; you can cancel the subscription at any time.

Start a 10-day free trial at Pluralsight

 


 

Install

Demonstrate an understanding of the installation process for Cloudera Manager, CDH, and the ecosystem projects.

 

Configure

Perform basic and advanced configuration needed to effectively administer a Hadoop cluster

 

Manage

Maintain and modify the cluster to support day-to-day operations in the enterprise

Secure

Enable relevant services and configure the cluster to meet goals defined by security policy; demonstrate knowledge of basic security practices

Test

Benchmark the cluster operational metrics, test system configuration for operation and efficiency

  • Execute file system commands via HTTPFS : HttpFS commands
  • Efficiently copy data within a cluster/between clusters : Distcp
  • Create/restore a snapshot of an HDFS directory :  Snapshots
  • Get/set ACLs for a file or directory structure :  HDFS ACLs
  • Benchmark the cluster (I/O, CPU, network) : Benchmarking
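
A quick, hedged command sketch for the Test topics above (host names, directories, and snapshot names are illustrative and not taken from the linked notes; the examples jar path is the usual CDH parcel location, so adjust it if yours differs):

  curl "http://httpfs-host.example.com:14000/webhdfs/v1/user/cloudera?op=LISTSTATUS&user.name=cloudera"    # HttpFS: list a directory over REST
  hadoop distcp hdfs://nn1.example.com:8020/data /backup/data                                              # copy data within or between clusters
  hdfs dfsadmin -allowSnapshot /data/dir && hdfs dfs -createSnapshot /data/dir snap1                       # enable snapshots and take one
  hdfs dfs -setfacl -m user:alice:rwx /data/dir && hdfs dfs -getfacl /data/dir                             # set and read ACLs
  hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 4 1000        # quick CPU/MapReduce benchmark job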

Troubleshoot

Demonstrate ability to find the root cause of a problem, optimize inefficient execution, and resolve resource contention scenarios

 

Before you appear for the exam, please check my post on how to validate your outputs during the exam: CCA131 – Answers Validation Steps

This will help you re-validate your outputs/solutions/tasks and catch issues you might otherwise miss during the exam.


Wish you all the best for your exam.

Please do share your thoughts/feedback after taking the exam; it will help me update the content accordingly and also help others in the Hadoop community.

 

 

79 thoughts on “CCA131 – Cloudera Administration Certification Exam Notes and Preparation Guide”

  1. Hello Kannan
    Thanks for sharing. If I am thorough with these questions, is that enough to take the exam? Please let me know.

    Thanks
    krish

  2. Dear Krish,
    Thank you so much for the guide. I attempted the exam on 22 November but I failed; only 3 problems were correct.
    There was a question about setting heap size which I didn't understand: they gave us some amount of RAM and asked us to calculate and configure the heap size. Can you help me understand this?
    There was also a problem related to a Kafka configuration which had been configured wrongly before, but I was not able to solve the error.
    Can you guide me on how I should prepare for the exam?
    There was also a question about creating groups for roles.
    Can you help me?
    Thanks in advance

    1. Hi Syed,

      Sorry to hear about that.

      For Kafka configuration, first you have to download and activate the Kafka parcel, then add the Kafka service; the Kafka brokers will then be started. If you encounter any issues starting a Kafka broker, look at the logs to see what the issue is and troubleshoot accordingly.
      One common issue is the heap size of the Kafka broker. Go to Kafka – Configuration – search for 'heap size' and change it to the recommended value.

    2. Hello Syed !!

      Excuse me for the delay in replying to your comment. My suggestion for preparing for the CCA131 exam: make use of the official documentation and read it thoroughly, with no shortcuts, and use this blog for practising the official Cloudera curriculum. If you can recollect the questions you came across in your last attempt, analyse them and work out the solutions. I hope this will help you clear the exam. Practice well. ALL THE BEST!!
      Thanks
      krish

  3. Hi Kannan- You’re doing a great job. Is it possible to give an idea of what kind of questions are asked related to Sqoop and Flume?

    Regards.

  4. Hi Friends,

    Thank you very much for the guide. I attempted the exam recently but unfortunately I failed; 6 questions were correct.

    In my understanding, the exam questions were not high level, but they were all tricky.
    Linux administration skills are very useful, so first get comfortable with basic Linux commands.

    There were questions on changing an HDFS/YARN configuration parameter, resolving an application failure, dynamic resource pool configuration, regular expressions, and snapshots.

    My suggestion for preparing for the CCA131 exam: install Cloudera on a laptop and practice thoroughly as per the syllabus and this blog. I hope this will help you clear the exam. ALL THE BEST!!

    Thanks
    Pandurang Bhadange

      1. This blog covers the whole syllabus; you can use it as a blueprint.

        My suggestions for study:

        1) Refer to the ITVersity video series:
        https://www.youtube.com/watch?v=1c5LZQsHyxk&list=PLf0swTFhTI8q9Uc7g2gs2t_ToFGzi7GEw

        2) Refer to the complete Hadoop engineering video playlist:
        https://www.youtube.com/watch?v=DjdhZlTGukg&list=PLY-V_O-O7h4dUA5A2E-w_Jd4J4ks27_9v

        3) Refer to the expert Hadoop admin book by Sam R. Alapati.

        4) Refer to this blog (Kannan's).

  5. Yo! Pandurang Bhadange, can you also share the 6 questions you got correct, please? I'd appreciate your input. I'd like to know about your experience during the test; I think it could help me a lot. Can you ping me at viczius @ gmail?

    Thanks in Advance!

  6. Hello Kannan,

    Thank you for writing this blog to the point and clear.

    I went through the complete blog and thoroughly prepared each activity/task you mentioned. I am happy to inform you that I am now a Cloudera Certified Administrator of Hadoop… 🙂

    Thank you very much for your indirect support via this blog. It really helped me a lot.

    Thanks
    Sunil

    1. Hi Sunil,

      Congrats for passing the exam, that’s a great accomplishment.
      Glad that I could be of some help, but it’s all due to your efforts and perseverance.
      Wish you all the success in your future projects.

    2. Hello Sunil
      Congrats!! How can I reach you, bro? I need your inputs, tips, and tricks on the process of clearing the exam. Please help, bro; I failed to clear it on my first attempt.
      Thanks in advance
      Vmc

  7. How do you calculate the total allocatable memory on a name node, where the total available memory is 41 GiB, non-DFS reserved is 7.2, and the total usage/consumption multiplier is 1.4?

    1. Hi Vic,

      Can you clarify what "total usage/consumption multiplier is 1.4" means?

      By default, namenode memory + OS memory = node memory. So in this case, the maximum allocatable for the NN is 33.8.

        1. I think that this multiplier is the Java overhead memory consumption.
          The 'real' memory consumption of a Java process is the heap space + meta or perm space + the native memory used by the OS to manage the Java process itself. So, the way to calculate the real memory used by the NN Java process is to apply the multiplier to the heap size.

          Applying all this to the sizing: (41 - 7.2) / 1.4 = 24.1 GB available for the NN's heap.

          1. What would be the correct calculation for: allocate the maximum amount of memory for the HDFS name node in order to store the maximum number of files. The node had 41 GB, the OS took 7.2, keeping in mind an overhead of 1.4, and the namenode and secondary name node should have equal memory where both are running on the same server.
            (31-6.4)/1.3= 24.1 means 12 for each node ..

            Does anyone have the correct answer for this?

              1. The answer was correct.
                The node had 31 GB. The OS took 6.4. Keeping in mind an overhead of 1.3. The namenode and secondary name node should have equal memory where both are running on same server.
                (31-6.4)/1.3= 24.1 means 12 for each node ..

                  1. Did anyone get a question like this and get it right using this logic?
                    (Total Memory - OS Memory) / Overhead / 2 (2 because there is one NN and one Secondary NN on the same server), is that correct?

  8. Hey Kannan,

    I cleared CCA131 a couple of days back. I must thank you for your writings and guidelines; they indeed helped me go further and give it a try. I appreciate your efforts.

    1. Hi Vin,
      That's awesome to hear. Glad that I could be of some help, but full credit goes to your hard work.
      Wish you all the success in your future projects.

  9. Dear Kannan,
    Thank You very much for this website and blog…
    I had a question in the exam:
    Node RAM is 32GB, nonDFS reserved memory is 6.2GB. Heap size 1.3X
    Can't remember the exact question…
    Couldn’t answer the question correctly.

    Would be really helpful if you could explain how this question is to be solved.

    Thanks in advance..

    1. Yeah, I didn't understand this question either! There could be multiple answers, and the 1.3x multiplier thing didn't really make sense.

    2. Hi Plascio,

      The answer depends on whether it is for a worker node or the namenode.

      If it was for the NN, then (32 - 6.2) ~ 25 GB, and the maximum heap should be 19 GB, as 19 x 1.3 ~ 25 GB.

      If it was for a worker node, then it depends on the roles assigned to that host, for example DataNode, NodeManager, or HBase RegionServer.

      It Should be like below:

      Available memory : (32-6.2) ~ 25 GB

      If Datanode heap 2 Gb then ~ (2 x 1.3 ) ~2.6 GB
      If hbase Region server heap 12 GB ~ (12 x 1.3) ~ 15.6 GB

      Then for the YARN NodeManager, the memory left = (25 - 2.6 - 15.6) ~ 6.8 GB

      Thanks,
      Amit Patra

  10. Hello Kannan/Sunil,

    I need your help with the scenarios that you guys faced, as I am also planning to go for CCA131.

    @Kannan, which official documentation did you mention in your comment above?

    For calculating the heap size, what basic understanding should we have?

  11. Hi Kannan,

    I had the below question in my exam; can you please let me know what to do?
    I set Max Log Size to 8 GB and Maximum Log File Backups to 4 for both nodes, but this answer was still marked wrong in the exam.

    The NameNode and Secondary NN should have a max of 4 logs of 8 GB in size each; together the NN and SNN must consume only 16 GB.

    1. I had a question in the exam:
      Node RAM is 32GB, nonDFS reserved memory is 6.2GB. Heap size 1.3X
      Can't remember the exact question…
      Couldn’t answer the question correctly.

      Would be really helpful if you could explain how this question is to be solved.

        1. If possible, could you please try to recall this question? If you can, it will help new exam takers.
        Thanks

    2. Hey Raja

      Please check the official Cloudera documentation; it will help you. I hope you had a successful attempt.

    3. Hi Raja,

      The max log size is per log file, so in this case you will have (4*8) GB occupied by each service, which is not the expected behaviour. I think changing the 8 GB to 2 GB should have done the trick.

      Regards,

    4. I believe I know this one. The NN and Sec NN should each have a 2 GB log file size for their group, where there is one NN in one group and one Sec NN in the other group. The max number of log files should be 4 for each group. So the total will be 2 GB x 4 + 2 GB x 4 = 16 GB.

  12. I think the value should be (32-6.2)/1.3/2

    32-6.2 is the available memory; as the JVM adds overhead, the maximum value has to account for the 1.3 multiplier. In the test, it asked to have NN and DN roles, so these 2 roles share the value.
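
    Working that formula through (just the arithmetic, under this interpretation): 32 - 6.2 = 25.8 GB usable; 25.8 / 1.3 ≈ 19.8 GB of heap once JVM overhead is accounted for; split across the two roles, roughly 9.9 GB each.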

    1. Hi Valery/Others who passed CCA131 exam,

      CAN SOMEONE WHO CLEARED CCA131 EXAM RECENTLY IN PAST 3 MONTHS PLEASE CALL ME ON MY MOBILE NUMBER (9886019555)

      THIS IS VERY URGENT I NEED URGENT HELP TO DISCUSS WITH YOU ON THIS.

      Thanks
      Gautam
      +91 9886019555

  13. @Kannan,
    Probably I am repeating the same question here because I am not confident about the answers. Could you tell me what the correct configurations were for these 2 questions?
    1. Change the logging policies of the name node and the secondary name node. Keep 4 copies, occupying max 8 GB of space. The total capacity can't exceed 16 GB for the namenode and secondary namenode.
    2. Allocate the maximum amount of memory for the HDFS name node in order to store the maximum number of files. The node had 31 GB, the OS took 6.4, and the namenode and secondary namenode should have equal memory, keeping in mind a heap overhead of 1.3. The namenode and secondary namenode are on the same server.

    1. Under HDFS configuration, filter on NameNode: the property "NameNode Max Log Size" should be 2 GB and "NameNode Maximum Log File Backups" should be 4, so it will consume 8 GB. The same can be set for the Secondary NameNode as well.

      1. Hi Muthu, thanks! I did the same thing and dealt with the stale configuration, but my answer was still marked wrong for this configuration. Am I missing anything here?

        1. Hi Krb,
          Planning to take the certification exam this coming Sunday. Any inputs will be greatly appreciated.
          1. I agree with Muthu on how the log file size is calculated: 4 files x 2 GB each = 8 GB of space on the NameNode, and the same for the SNN. I am not sure why it was marked wrong. Can someone work out why it went wrong?

          2. I haven't understood this question completely, but it seems like a sure-bet question in the certification exam. If someone could give a detailed explanation, it would be greatly appreciated.
          As per my understanding, the NN heap memory calculation depends on how much data we have and the block size, right?
          Here is my understanding.
          The NN has 31 GB and the SNN has 31 GB. These are separate, as these name nodes are not HA.
          Allocated memory for the OS is 6.4, which means we still have 31 - 6.4 = 24.6 GB.
          Overhead heap memory is 1.3, which means we still have 24.6 - 1.3 = 23.3 GB on the NN and SNN separately.

          Let's assume replication 3 and a 128 MB block size, which requires 384 MB for each file. Suppose we have 10 DNs of 5 TB each = 50 TB of space.
          (50x1024x1024)/384 = 136533 blocks, so 136 MB of maximum heap is enough to store the namespace metadata, right?

          Sorry for the long comment. Please bear with me.

    1. Hi Vaqar,

      These commands won't be provided in the exam. I'd suggest you practice them on any Linux machine so that you don't have to memorize them 🙂

  14. I also referred to this blog; it was extremely useful for focusing on the exam material, and I was able to pass. I highly appreciate the author's and the other contributors' initiative and contributions.

    1. Thanks for your kind words. I'd say that, even more than me, the people in the comments section are contributing value to everyone. Good to see!

  15. Hi All,

    Thanks for all the inputs

    I noticed that no one has mentioned the time limit.

    For someone with moderate Hadoop experience, will the time generally be enough?

    Also, is it possible to go back to previous questions once we reach the last question?

    It would be great if the people who passed could shed more light on the approach they followed.

  16. Hello Kannan,

    Thank you for this material to help us make it!

    Do you have anything on this one:

    Revise YARN resource assignment based on user feedback

    Greetings from Mexico!!

  17. Hi Kannan,

    Can you please share the step-by-step procedure to set up SSL/TLS (levels 0-3) with a self-signed certificate for enabling TLS encryption on a non-prod Cloudera cluster? I have gone through the Cloudera documentation and got confused; it would be great if you could explain it in an easier way.
    Thanks in advance 🙂

  18. Hello ,

    Did anyone attempt CCA131 recently, in April or May 2020? Please help me with details of how the exam was.

  19. Hi Kannan,
    Do you know how to access the Cloudera documentation during the exam? Can I go directly to Google and find it, or is the PDF/documentation available on the exam screen?

  20. Hi all,

    I recently did the exam and I passed (July 2020).

    Something really different that I was not expecting is that the cluster given is now only a single node (only one machine). Apart from that, this guide is an excellent starting point, but you need to know that it alone is not enough; for instance, I was asked to use Sqoop to import some tables from MySQL into HDFS. Even Cloudera's video example shows how to configure Kafka, something that is not covered here. Again, everything posted here is super useful, but I advise everyone to get your hands dirty: install a cluster and practice everything there until you are able to do it without following instructions, and try to use all the services available.

    Regarding documentation, the same pages and related links you can find on docs.cloudera.com will be available to you via your own browser. I also suggest getting familiar with how to find things there.

    In addition, I just want to say thank you to Kannan; all the practice tasks shown here were very handy.

  21. Here are the questions from CCA131. I got six questions correct and failed.

    Problem 1:
    Create a file in Linux named 'timestamp_<epoch time>'. Upload this file to HDFS and save the latest fsimage to the Linux filesystem in such a way that it contains the file with the epoch time created earlier.
    Ans:
    Issue the command below on the terminal to get the epoch value:
    date +%s
    Suppose we got the epoch value 1234567890:
    touch timestamp_1234567890
    export HADOOP_USER_NAME=hdfs
    hdfs dfs -put timestamp_1234567890 /
    In the Cloudera Manager GUI, save the namespace, then fetch the image:
    hdfs dfsadmin -fetchImage /home/
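
    (If you prefer the CLI over the CM action, a hedged alternative for the "save the namespace" step is to checkpoint from the command line; the NameNode has to be put into safe mode first and taken out of it afterwards:)
    hdfs dfsadmin -safemode enter
    hdfs dfsadmin -saveNamespace
    hdfs dfsadmin -safemode leave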

    Problem 2: (Rack awareness)-
    Add rack /east to master1 and gateway
    Add rack /west to worker1 and worker2

    Problem 3: (Configuration) –
    There are 2 data nodes in the cluster; ensure that the blocks on both nodes are almost equal, and that the threshold for the difference between them is as small as possible.

    Set the balancer bandwidth of the datanodes to 1 GB (1073741824 bytes).
    Add the Balancer role if it is missing.
    Ans: export HADOOP_USER_NAME=hdfs
    hdfs dfsadmin -setBalancerBandwidth 1073741824
    http://www.hadoopandcloud.com/hadoop/rebalance-the-cluster/
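
    (A hedged sketch for the threshold part: in a CM-managed cluster you would normally set the rebalancing threshold in the Balancer configuration and run its Rebalance action, but the equivalent CLI run with a small threshold looks like this:)
    hdfs balancer -threshold 1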

    Problem 4: (Query redaction) –
    Mask the phone numbers of customers. The replacement string is 201-xxx-xxx.

    Problem 5: (Install service) –
    Install Sqoop 1 and configure it. Create a symlink for the MySQL connector. Import data from a MySQL table to HDFS.
    ln -s mysql-connector-java-<version>/mysql-connector-java-<version>-bin.jar /var/lib/sqoop/
    or copy it:
    sudo cp mysql-connector-java-<version>/mysql-connector-java-<version>-bin.jar /var/lib/sqoop/
    Run the Sqoop import command:
    sqoop import --connect jdbc:mysql://localhost/userdb --username root -P --table emp

    Problem 6: (Block size change) –
    The business needs to change the block size of files from 64 MB to 128 MB for files located in the /user/hive/warehouse/customer01 to customer12 directories.
    Ans: hadoop distcp -Ddfs.block.size=134217728 -overwrite /user/hive/warehouse/customer01/ /user/hive/warehouse/customer01/
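
    (Note that distcp needs a destination different from the source; a hedged sketch of one way to do this per directory, using an illustrative temporary path, is to copy with the new block size and then swap the directories:)
    hadoop distcp -Ddfs.blocksize=134217728 /user/hive/warehouse/customer01 /tmp/customer01_128m
    hdfs dfs -rm -r -skipTrash /user/hive/warehouse/customer01
    hdfs dfs -mv /tmp/customer01_128m /user/hive/warehouse/customer01
    Repeat the same steps for customer02 through customer12.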

    Problem 7: Configure the Fair Scheduler to resolve application delays –
    Create a resource pool with given min and max resources.
    Create a pool with given resources and set scheduling policy as DRF

    Problem 8: Configuration –
    Change logging level of Name node and Secondary Name node to WARN.
    Change logging level of Resource manager and NodeManager to DEBUG.

    Problem 9: Troubleshooting –

    A job script is failing to write output; fix it (related to HDFS permissions).

    hdfs dfs -chmod 777 /
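
    (A more targeted fix, as a hedged sketch: the path and owner below are illustrative, since the real ones depend on which directory the failing job writes to; you would normally open up only the job's output directory rather than the filesystem root:)
    hdfs dfs -chown jobuser:jobgroup /path/to/job/output
    hdfs dfs -chmod 775 /path/to/job/output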

    Problem 10: Troubleshooting –
    Go through the NameNode UI, find the parameter that is abnormal, and fix it.

    Problem 11) Download the HDFS and YARN client configuration files to the specified location on the local system.

    wget -O filename.zip

    Problem 12) Change the HDFS trash retention policy.
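
    (No answer was posted for this one; a hedged note on the usual approach: this maps to the Hadoop property fs.trash.interval, which is set in minutes and exposed in Cloudera Manager under the HDFS configuration as the trash interval setting. After changing it and redeploying the client configuration, you can verify the effective value from the shell:)
    hdfs getconf -confKey fs.trash.interval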

    Problem 13) Restore a file from a snapshot while preserving timestamps, permissions, and ACLs. Out of the 3 available snapshots, choose the latest one (based on the epoch value in the snapshot name). (It is best to do this via the CLI so -ptopax can preserve them.)

    hdfs dfs -cp -ptopax //.snapshot/1533976063/ //
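
    (A hedged, spelled-out example of the same idea; the directory, file name, and snapshot name here are illustrative:)
    hdfs dfs -ls /data/dir/.snapshot
    hdfs dfs -cp -ptopax /data/dir/.snapshot/1533976063/file.txt /data/dir/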

    Problem 14) Install Kafka and solve the problem that is preventing the Kafka installation (the Broker Heap Size was the issue; it had to be increased).

    Problem 15) Allocate the maximum amount of memory for the HDFS namenode in order to store the maximum number of files. The node had 31 GB, the OS took 6.4, and the namenode and secondary namenode should have equal memory, keeping in mind a heap overhead of 1.3. The namenode and secondary namenode are on the same server.

    Problem 16) Change the logging policies of the name node and the secondary name node. Keep 4 copies, occupying at most 8 GB of space. The total capacity can't exceed 16 GB for the namenode and secondary namenode.

    17) Create a host template based on the given configuration (DataNode, NodeManager).

    18) Create role groups and assign hosts to them accordingly: create a role group with the datanodes added; the new role group should have default configs like the disk-filling policy (available space, etc.).

    19) Hadoop benchmark tests: teragen, terasort, and teravalidate.
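
    (A hedged command sketch; the output paths are illustrative, and the examples jar path is the usual CDH parcel location, so adjust it if yours differs:)
    hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen 10000000 /user/cloudera/tgen
    hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort /user/cloudera/tgen /user/cloudera/tsort
    hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teravalidate /user/cloudera/tsort /user/cloudera/tvalidate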

    


