Fair scheduling is a method of assigning resources to applications such that all apps get, on average, an equal share of resources over time. Using the Fair Scheduler, we can create separate pools (queues) for each team and configure the resources for each pool, which helps overcome application delays. In the exam you may be asked […]
Tag: exam notes
Determine reason for application failure
There are many possible causes of a job/application failure, ranging from code errors, environment, file availability, and permissions to MapReduce/YARN configuration, resource allocation, and even server I/O or network issues. So the first thing you have to do when a job fails is look at the error message and correlate it with your job. If an […]
Benchmark the cluster (I/O, CPU, network)
Benchmarking is the process of stress testing the resources of the cluster. It’s very useful for understanding the performance of your cluster and for checking whether it performs as expected before taking it live. Here we are going to test the speed at which files are read from and written to HDFS, the time taken for mappers/reducers to process […]
Resolve performance problems/errors in cluster operation
This is another scenario-based topic. Some common performance problems are jobs running slowly, services crashing due to out-of-memory errors, etc. Example: when I was adding YARN roles, one of the NodeManagers failed to start. In this case, we have to identify the cause of the failure. Select the Log Files dropdown […]
Resolve errors/warnings in Cloudera Manager
This is a typical scenario-based question, and the solution depends solely on the errors/warnings that appear in the cluster. Some examples: the warnings could be space issues, service health status, low resource allocations, etc. The errors could be full log directories, services that are down, and other critical events. In these scenarios, click on […]
Execute file system commands via HTTPFS
HttpFS is a service that provides HTTP access to HDFS, i.e. we can access HDFS from other filesystems, from browsers, and from programming languages. HttpFS has a REST HTTP API supporting all HDFS filesystem operations (both read and write). Using HttpFS, we can read and write data in HDFS using HTTP utilities (such as […]
Set up alerting for excessive disk fill
Alert Publisher, one of Cloudera’s management services, is used to send alert notifications by email or by SNMP. Service instances of type HDFS or MapReduce can generate alerts if so configured. Alerts can also be configured for the monitoring roles that are part of the Cloudera Management Service. Go to CM – Cloudera Management Service […]
Configure a service using Cloudera Manager
Configuring a service is one of the hardest tasks in Apache Hadoop, but Cloudera Manager has made our lives easier. For any configuration, you only have to give the value for the property; Cloudera Manager will then take care of updating the dependent services’ configuration, updating all the conf files (hdfs-site, yarn-site, core-site, etc.) and deploy […]
Add a new node to an existing cluster
The steps for this task are the same as those involved in installing CDH, but the scenario is different, as we’re adding hosts to an existing cluster. Once the cluster is set up and running, you may need to add new nodes to it. To do that, ensure the initial OS configurations/prechecks are complete. Go […]
Install CDH using Cloudera Manager
After installing Cloudera Manager (SCM server), we can install CDH on our hosts using Cloudera Manager. Step 1: Log in to the CM URL. When you log in to CM for the first time after installing the Cloudera SCM server, the login will redirect you to the following steps. Select the desired edition. Choose the […]