Configure a service using Cloudera Manager

Configuring a service is one of the harder tasks in Apache Hadoop, but Cloudera Manager makes it much easier.

For any configuration change, you only have to supply the value for the property; Cloudera Manager takes care of updating the dependent services' configurations, regenerating all the conf files (hdfs-site.xml, yarn-site.xml, core-site.xml, etc.), and deploying the client configuration.

To configure a service, go to CM – Service – Configuration; using the search option, you can find the relevant property.

For example, to configure the datanodes to reserve 5 GB of their total disk capacity for non-DFS usage (MapReduce temp output, log files):

Go to HDFS – Configuration – search 'space', 'reserved', or 'disk'.

As you can see, the property is listed first; set it to 5 GB, save the changes, and deploy the client configuration.
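Under the hood, CM writes this value into the generated hdfs-site.xml. A sketch of the resulting entry (the upstream Hadoop property takes the value in bytes, so 5 GB becomes 5368709120):

```xml
<!-- Sketch of the entry Cloudera Manager generates in hdfs-site.xml -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- 5 GB reserved per volume for non-DFS usage, in bytes -->
  <value>5368709120</value>
</property>
```

Note that this reservation applies per volume, not per datanode as a whole.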

Similarly, to change the job history log location:

Go to YARN – Configuration – search 'log'.
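If the property you end up changing is the JobHistory Server's done directory, the generated mapred-site.xml entry would look roughly like the following (the path shown is illustrative, not the required value):

```xml
<!-- Sketch of the generated mapred-site.xml entry; the path is illustrative -->
<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>/user/history/done</value>
</property>
```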

This is very simple to do; the only catch is that you have to know which service a given configuration belongs to.

You can also search using the search bar in the top-right corner of the CM page to look for the relevant properties.

Problem Scenarios:

· Configure datanodes to tolerate 1 failed volume.

· Change the datanode log dir to /hadoop/logs.

· Configure YARN to retain only the last 5 days of logs.
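For reference, a sketch of the upstream Hadoop properties behind the first and third scenarios (the second one, the datanode log dir, is a role-level environment setting in CM rather than an hdfs-site.xml property):

```xml
<!-- Scenario 1: tolerate one failed volume per datanode (hdfs-site.xml) -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>

<!-- Scenario 3: keep aggregated YARN logs for 5 days = 432000 seconds (yarn-site.xml) -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>432000</value>
</property>
```

In CM you would set these through the service configuration pages as described above, not by editing the files by hand.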

Thus we covered how to configure a service using Cloudera Manager.

Use the comments section below to post your doubts, questions and feedback.

Please follow my blog to get notified of more certification related posts, exam tips, etc.



9 thoughts on “Configure a service using Cloudera Manager”

  1. Hi, when I change datanodes to tolerate 1 failed volume, it asks for a restart, and a few services like HDFS, YARN, and related services won’t start unless I change datanodes back to tolerate 0 failed volumes.

    1. Hi Yogesh,

      Whenever you make any config changes, the service goes into a stale state, which requires a service restart. Since HDFS has many dependents such as YARN, Hive, Spark, Oozie, etc., all of those services will be restarted when you do a stale restart of HDFS.

      Also, when you’re setting failed volumes tolerated to 1 for datanodes, ensure the datanodes have at least two volumes dedicated as data directories.

      i.e., the DataNode Data Directory property should have multiple volumes (filesystems) specified.
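      For example, the underlying hdfs-site.xml entry with two volumes would look like this (the paths are illustrative):

      ```xml
      <!-- Example: two volumes backing the DataNode data directory; paths are illustrative -->
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
      </property>
      ```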

      Hope this helps.

  2. Also, another query: when you try to change the job history log path, it gives you a popup saying ‘There is only one instance of this role in this service. We recommend you make configuration changes on the service configuration page instead.’ So my question is, do we have to make changes at the instance level or at the service level? Will this be clearly specified (service or instance) in the question during the CCA 131 exam?
    If nothing is specified in the question, then which method should we use?

    1. From a Hadoop admin point of view, if you’ll be adding multiple role instances in the future, it’s better to make service-level configurations. If there’s going to be only one role instance, then either instance level or service level is fine.

      But this varies for certain services. Flume and Kafka are data-streaming services, so if you have multiple role instances, each one will have a different configuration depending on the incoming data. For those, it’s better to make changes at the instance level rather than the service level.

      In the exam, watch out for keywords like “multiple instances will be added”, “later they’ll be adding more”, etc. If they are present and the service is Flume or Kafka, choose “cancel” in the popup and configure at the instance level. For any other service, go with the service level.

      Check the sample question given by Cloudera. It’s a Flume configuration with more gateways to be added later, so they went with the instance level.

      For your question, you can select service-level configuration for the job history.
