Set up a local CDH repository


This post explains how to set up a local YUM/CDH repository for your network.

On a RHEL/CentOS server, yum reads its repository definitions from the .repo files in /etc/yum.repos.d. Every repo defined there has a baseurl value, which contains the link to the repository path.
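To see which baseurl values a server is currently using, something like the sketch below can help. The function name list_baseurls is made up for this post, and the sketch assumes each baseurl sits on its own line in the .repo file.

```shell
#!/bin/sh
# Print "<repo file>: <url>" for every baseurl line found in a directory
# of .repo files. The directory defaults to /etc/yum.repos.d.
# list_baseurls is a hypothetical helper name, not a yum command.
list_baseurls() {
    dir="${1:-/etc/yum.repos.d}"
    for f in "$dir"/*.repo; do
        # Skip the unexpanded pattern when the directory has no .repo files.
        [ -f "$f" ] || continue
        awk -v name="$(basename "$f")" '
            /^baseurl[ \t]*=/ {
                sub(/^baseurl[ \t]*=[ \t]*/, "")
                print name ": " $0
            }' "$f"
    done
}
```

Calling list_baseurls with no argument reads /etc/yum.repos.d; pass another directory to inspect repo files stored elsewhere.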

When you run “yum install packagename”, yum goes through each repo and contacts its baseurl over the internet to check whether the package you asked for is available. If there is no internet connectivity, the baseurl can’t be reached and the command fails. Many organizations prohibit downloading packages from external sites or repositories, so they create a repo satellite, put all the necessary packages (RPMs) on it, and have their servers download packages from there.

In this task, we are going to download the CDH repo to our server and create a local repository on it, so that the other servers in our network can contact this local repo instead of Cloudera when installing CDH packages.

You need an internet connection on this server the first time, to download the packages and set up the repository.


Step 1: Download the repo to your machine

RHEL / CentOS 6:

# wget

RHEL / CentOS 7:

# wget

After wget, move cloudera-cdh5.repo to /etc/yum.repos.d so that yum (and reposync in step 4) recognizes the cloudera-cdh5 repo.


Step 2: Install webserver

We need a web server installed on this server so that the other servers can access the RPMs over HTTP.

# yum install httpd -y

This will create a /var/www/html directory. Whatever files you place under this directory can be accessed via http.

# service httpd start


Step 3: Install yum-utils and createrepo

The yum-utils package includes the reposync command, which downloads the packages for the local yum repository. The createrepo command generates the repository metadata.

# yum install yum-utils createrepo -y


Step 4: Fetch the rpms of CDH5 repo to your server

# reposync -r cloudera-cdh5

This command will download all the available rpms in cloudera-cdh5 repo (wget’d in step 1) to your server.

Copy the RPMs from the downloaded directory to the /var/www/html/cdh/5/rpms/ folder.
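The copy can be scripted roughly as follows. This is only a sketch: publish_rpms is a made-up helper name, and it assumes reposync was run in the current directory, so the packages landed in ./cloudera-cdh5 (possibly in subdirectories, which is why it uses find).

```shell
#!/bin/sh
# Copy every .rpm found under SRC (at any depth) into DEST, creating DEST
# first. publish_rpms is a hypothetical helper name used for illustration.
publish_rpms() {
    src="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    find "$src" -type f -name '*.rpm' -exec cp {} "$dest"/ \;
}

# Assumed layout: reposync was run in the current directory.
# Uncomment to run on the repository server (as root):
# publish_rpms ./cloudera-cdh5 /var/www/html/cdh/5/rpms
```

Flattening everything into one rpms folder matches the layout used in this post; if you prefer to keep the subdirectories, createrepo in step 5 still finds the packages.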

Now you should be able to access the RPMs in a browser via the URL “http://servername/cdh/5/rpms”.


Step 5: Create a repo file

Inside the /var/www/html/cdh/5/ folder, run the command below.

# createrepo .

This creates or updates the metadata that yum needs in order to recognize the directory as a repository. The command creates a new directory called repodata.
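A quick way to confirm that the metadata was written is to look for repodata/repomd.xml, the index file that createrepo generates. The helper name has_repo_metadata below is made up for this post.

```shell
#!/bin/sh
# Succeed only if DIR contains the repomd.xml index written by createrepo.
# has_repo_metadata is a hypothetical helper name used for illustration.
has_repo_metadata() {
    [ -f "$1/repodata/repomd.xml" ]
}

# Example use on the repository server:
# has_repo_metadata /var/www/html/cdh/5 && echo "repository metadata found"
```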

Edit the repo file you downloaded in step 1 and change the line starting with baseurl to baseurl=http://servername/cdh/5/. This is the URL from step 4 without the trailing rpms, because /cdh/5/ is the directory where createrepo was run. Save the file back to /etc/yum.repos.d/.
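If you would rather script the edit than open the file by hand, a sed-based sketch could look like this. set_baseurl is a made-up helper name, and the sketch assumes the new URL contains no "|" or "&" characters, since those are special in the sed expression.

```shell
#!/bin/sh
# Rewrite every line starting with "baseurl=" in FILE so it points at URL.
# set_baseurl is a hypothetical helper name used for illustration.
# Assumption: URL contains no "|" or "&" (both are special to sed here).
set_baseurl() {
    file="$1"
    url="$2"
    tmp="$(mktemp)" || return 1
    # Write to a temporary file first so a failed sed never truncates FILE,
    # then copy the content back to keep FILE's owner and permissions.
    sed "s|^baseurl=.*|baseurl=${url}|" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example use, with servername replaced by your repository server:
# set_baseurl /etc/yum.repos.d/cloudera-cdh5.repo http://servername/cdh/5/
```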


Step 6: Local CDH repository created

Distribute the /etc/yum.repos.d/cloudera-cdh5.repo file to all of your servers. They can now download the RPMs from this machine without needing an internet connection.
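One way to distribute the file is scp. The sketch below only prints the commands so you can review them first. The host names, the root login and the helper name distribute_cmds are all assumptions for illustration.

```shell
#!/bin/sh
# Path of the edited repo file, as saved at the end of step 5.
repo_file=/etc/yum.repos.d/cloudera-cdh5.repo

# Print one scp command per host given as an argument. Nothing is copied
# until the output is piped to sh. distribute_cmds is a hypothetical helper
# name, and logging in as root is an assumption about your environment.
distribute_cmds() {
    for host in "$@"; do
        printf 'scp %s root@%s:%s\n' "$repo_file" "$host" "$repo_file"
    done
}

# Example use with made-up host names:
# distribute_cmds node1 node2        # review the commands
# distribute_cmds node1 node2 | sh   # run them
```

After the file is in place, running “yum clean all” on each server makes yum drop any cached metadata from the old baseurl.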
