There are three ways to install Cloudera Manager (server and agents), CDH, and services: automated installation by Cloudera Manager, installation using Cloudera Manager parcels/packages, and manual installation using Cloudera Manager tarballs. Automated installation by Cloudera Manager: This is the preferred way to install Cloudera Manager in non-production/test environments. It is not recommended for production deployments. […]
Tag: exam notes
Create encrypted zones in HDFS
Encryption at rest is the process of encrypting the data stored in HDFS. This is an advanced topic, and to create an encryption zone you need to complete the steps below: enable Kerberos, enable TLS/SSL, and add the Java KeyStore KMS service (this will act as KTS as well). In a production environment, you need to create […]
Install and configure Sentry
Before adding Sentry, the general prerequisites below need to be completed. These may be mentioned in the problem description. Please confirm the Hive warehouse directory in the /etc/hive/conf/hive-site.xml file. The Hive warehouse directory (/user/hive/warehouse) must be owned by the Hive user and group and should have 771 permissions. # sudo -u hdfs hadoop fs […]
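As a small illustration of what the 771 mode mentioned above grants, the Python sketch below decodes it with the standard library. It only interprets the octal value locally and does not touch HDFS; treating the path as a directory is an assumption made for the rendering.

```python
import stat

# 771 is an octal mode: owner = 7 (rwx), group = 7 (rwx), others = 1 (--x).
mode = 0o771

# Add the directory file-type bit so filemode() renders a leading "d".
rendered = stat.filemode(stat.S_IFDIR | mode)
print(rendered)  # drwxrwx--x
```

In other words, the Hive user and group get full access, while everyone else can only traverse the directory.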
Add a service using Cloudera Manager
Your running cluster will have only the core services (HDFS, YARN, ZooKeeper) or a handful of services, and your task is to add a specific service to the cluster. To add a service: Go to CM – click the drop-down box near the cluster – select Add Service. You will get a list of services […]
Create/restore a snapshot of an HDFS directory
HDFS snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a directory of the file system or on the entire file system. To enable snapshots on a specific directory, go to CM – HDFS – File Browser. Select the directory in the file browser, then select ‘Enable Snapshots’ in the right […]
Configure Hue user authorization and authentication
When you access Hue after installation, by default the first user who logs in to Hue becomes the first admin user. CM – Hue – Web UI. After you have logged in, go to the admin tab – Manage Users. Now provide the username of the user you want to give access to and add them to the appropriate profile/groups. […]
Install new type of I/O compression library in cluster
File/data compression brings two major benefits: it reduces the space needed to store files and it speeds up data transfer across the network or to or from disk. When dealing with large volumes of data, both of these savings can be significant. Hadoop supports the following compression types and codecs: gzip – org.apache.hadoop.io.compress.GzipCodec bzip2 – […]
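To make the space saving concrete, here is a minimal sketch that compresses the same payload with gzip and bzip2. It uses Python's standard `gzip` and `bz2` modules rather than the Hadoop codec classes, and the sample data is hypothetical; actual ratios depend heavily on the data.

```python
import bz2
import gzip

# Hypothetical sample data: a repetitive, log-like payload.
data = b"INFO block replicated to datanode\n" * 2000

gz = gzip.compress(data)
bz = bz2.compress(data)

# Print original and compressed sizes in bytes.
print(len(data), len(gz), len(bz))

# Both formats are lossless: decompressing returns the original bytes.
assert gzip.decompress(gz) == data
assert bz2.decompress(bz) == data
```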
Commission/decommission a node
When you want to remove a node from the cluster, you shouldn’t just delete the Cloudera agents and installed services, as that will impact the whole cluster. You should decommission the node first. Decommissioning a host decommissions and stops all roles on the host without requiring you to individually decommission the roles on each service. After […]
Rebalance the cluster
In HDFS, the blocks of a file are distributed among the DataNodes according to the replication factor. Whenever you add a new DataNode, the node will start receiving and storing the blocks of new files. Though this sounds all right, the cluster is not balanced from an administrative point of view. HDFS provides a balancer utility […]
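The idea of an "unbalanced" cluster can be sketched as follows. This is an illustration only, assuming the usual balancer notion that a DataNode is out of balance when its utilization differs from the overall cluster utilization by more than a threshold percentage (10 is assumed here). The node names and sizes are hypothetical, and the function is not part of any Hadoop API.

```python
def unbalanced_nodes(nodes, threshold_pct=10.0):
    """Return names of nodes whose utilization deviates from the
    cluster-wide utilization by more than threshold_pct points.

    nodes maps a node name to a (used, capacity) tuple in the same unit.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity
    return sorted(
        name
        for name, (used, cap) in nodes.items()
        if abs(100.0 * used / cap - cluster_pct) > threshold_pct
    )


# Hypothetical cluster: dn4 was added recently and is nearly empty.
nodes = {
    "dn1": (600, 1000),
    "dn2": (550, 1000),
    "dn3": (500, 1000),
    "dn4": (50, 1000),
}
result = unbalanced_nodes(nodes)
print(result)
```

Here the cluster-wide utilization is 42.5%, so the new, almost empty node pulls the average down and the fullest older nodes also fall outside the threshold.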
Configure proxy for Hiveserver2/Impala
A proxy server is a server or application that acts as an intermediary for requests from clients seeking resources from other servers/applications. A client connects to the proxy server, requesting some service or resource available from a different server, and the proxy server evaluates the request and sends it to the intended server/service/application. The server’s […]