Installing Oracle’s Big Data Connectors for HDFS and SQL*Loader

Published: 4/27/2012 Last Revised: 5/10/2012

I’m trying to learn all that I can about Oracle’s Big Data offerings. Installing Oracle’s Big Data Connectors looks like a good place to start. Here’s how I installed them.

The Oracle Direct Connector for Hadoop Distributed File System (ODC HDFS) and the Oracle Loader for Hadoop (OLH) seemed like good choices for getting started with Big Data.

Download the two connector files from the Oracle Big Data Connectors download page. On second thought, get them all if you plan to do more with Oracle’s Big Data Connectors in the future.

Before we go too much further, I’m assuming you have a machine with Oracle to play with and another with Cloudera’s Apache Hadoop. If not, and you’re like me (not hardware blessed), you should consider getting Oracle VirtualBox on your desktop. I have 8 GB of RAM on a Windows 7 box. I created two Linux VMs.

VM1 – RDBMS on OEL6u2

  • Oracle Enterprise Linux 6 Update 2 64-bit (OEL6U2_64)
  • Oracle’s RDBMS 11g Enterprise

The VM Settings on VM1:

  • Base Memory: 3072 MB
  • 4 Processors (1 will work fine for learning)
  • Hard Disk: 30 GB – dynamically allocated (saves space; fixed-size would be faster)
  • Network: Host-Only Adapter
  • No Firewall and No Security (SELINUX=disabled)
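The SELinux change in the last bullet can be sketched like this. The sketch works on a scratch copy of the config file; on the real VM you would edit /etc/selinux/config itself and reboot:

```shell
# Demo of the SELINUX=disabled edit, run against a scratch copy of the file.
# On a real system, target /etc/selinux/config (as root) and reboot afterward.
cfg=$(mktemp)
echo 'SELINUX=enforcing' > "$cfg"               # the shipped default
sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg" # the one-line change
cat "$cfg"
```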

For the 2nd VM I used Cloudera’s pre-built instance, which you can download from Cloudera’s site. There are also VMware and KVM virtual machine downloads in addition to VirtualBox.

VM2 – Cloudera Hadoop 

  • Cloudera’s VM Instance

VM Settings on VM2:

  • OS: Linux – Red Hat (64-bit)
  • Base Memory: 2048 MB
  • Network: Host-Only Adapter

I used the oracle-rdbms-server-11gR2-preinstall package announced on Oracle’s Linux blog, and it made the install go quickly.

Make sure both machines can communicate with each other and with your host machine. Let’s give them names here for communication purposes: the RDBMS server can be “odbs” for Oracle Database Server, and the Cloudera VM can be “cloud-alocal”. If you stick with the host-only network and are on Windows, I think it will be a 192.168.56.x network. Assign a name to each VM and to the host in each machine’s /etc/hosts file. If you have any trouble with host-only networking, try the bridged adapter network, making sure you pick the same adapter for both machines and use DHCP or static addresses. I like host-only since it isolates my VMs from the rest of the network.

I’m also assuming you’ve done all the basic database checks: sqlplus logins work, your listener is up, la, la, la… If you’ve lived in a cave for a while, it’s a good time to brush up on the 11g install.

If you haven’t looked at it already, you may want to look at the Oracle Support Master Note 1416116.1, “Master Note for Big Data Appliance Integrated Software.” It has good info for the Oracle Big Data Connectors.

On odbs we need to install only the basic Hadoop software. In preparation for installing Cloudera Hadoop 0.20.2, we need to install Java. I used jdk-6u31-linux-x64-rpm.bin, downloaded from Oracle at the Java download page, changed the permissions (chmod o+x), and ran the binary. When it finished, the JDK was installed under /usr/java/jdk1.6.0_31. I used the path /usr/java/latest, which is a symlink to the latest install, in all my paths later in this installation so I won’t have to go back after upgrading the JDK at some point.
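The value of pointing at /usr/java/latest can be shown with a scratch directory: the JDK RPM maintains a “latest” symlink, so a JAVA_HOME that references it survives an upgrade. The directory names below mimic the real layout but are created in a temp dir for the demo:

```shell
# Demo: why /usr/java/latest is a durable JAVA_HOME. Scratch dir stands in
# for /usr/java; the version directory names mirror the real RPM layout.
demo=$(mktemp -d)
mkdir "$demo/jdk1.6.0_31"
ln -s "$demo/jdk1.6.0_31" "$demo/latest"
export JAVA_HOME="$demo/latest"      # set once, against the symlink
# A later JDK upgrade just moves the symlink; JAVA_HOME never changes:
mkdir "$demo/jdk1.6.0_32"
ln -sfn "$demo/jdk1.6.0_32" "$demo/latest"
readlink "$JAVA_HOME"
```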

Still on odbs, now with Oracle Java installed, we are ready to install Cloudera Hadoop.

As root, create a user hdfs:

useradd  -d /home/hadoop -m hdfs

/home/hdfs works too, if you so choose.  If you name the user hadoop, the install will switch to the user hdfs.  Add hdfs to the wheel group and make sure the wheel group line is uncommented in /etc/sudoers.
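The wheel-group setup can be sketched as follows. The sudoers edit is demonstrated against a scratch copy, since on a real system you should go through visudo rather than editing /etc/sudoers directly:

```shell
# Add hdfs to the wheel group (run as root on the real system):
#   usermod -aG wheel hdfs
# Uncommenting the %wheel line, demonstrated on a scratch copy of sudoers;
# use `visudo` for the real file.
tmp=$(mktemp)
printf '# %%wheel ALL=(ALL) ALL\n' > "$tmp"   # the commented-out default
sed -i 's/^# *%wheel/%wheel/' "$tmp"          # uncomment it
grep '^%wheel' "$tmp"
```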
Now log in as user hdfs (still on odbs).

On a side note, Cloudera has free training videos and some good documentation on their website; you might take a look around while you’re there. Below is one way to install Cloudera Hadoop; see the CDH3 Installation Guide for others if this doesn’t work for you.

1) Download the cdh3-repository RPM for your release from Cloudera’s archive:

  • Red Hat / CentOS 5
  • Red Hat / CentOS 6

2) Install it with yum:

sudo yum --nogpgcheck localinstall cdh3-repository-1.0-1.noarch.rpm

Get the GPG Key:



Check your yum repository now:

$ yum search hadoop

then install

$ sudo yum install hadoop-0.20

When it is finished check the hadoop install:

$ hadoop version

It should print a version number.

You may want to set up alternatives for the hadoop configuration. There’s a good document on this at Cloudera’s Site. /etc/alternatives shows all the alternatives set up. Cloudera sets up one called hadoop-0.20 for this version of the install.

$ ls /etc/alternatives

to see the full list.

$ sudo cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.my_cluster
sudo chmod -R 755 /etc/hadoop-0.20/conf.my_cluster

Enter the password.

You might want to add a profile if you haven’t already. Add /usr/sbin to the PATH for the next step.
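A minimal profile addition would look like this (the /usr/sbin location of the `alternatives` command is the Red Hat / OEL default):

```shell
# Append to ~/.bash_profile so `alternatives` (in /usr/sbin) is on the PATH.
export PATH=$PATH:/usr/sbin
echo "$PATH"
```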

sudo alternatives --display hadoop-0.20-conf
[sudo] password for hdfs:
hadoop-0.20-conf - status is auto.
link currently points to /etc/hadoop-0.20/conf.empty
/etc/hadoop-0.20/conf.empty - priority 10
Current `best' version is /etc/hadoop-0.20/conf.empty.

You can see that /etc/hadoop-0.20/conf.empty is the currently selected alternative. Let’s change it to the new one you created.

sudo alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.my_cluster 50

sudo alternatives --config hadoop-0.20-conf

There are 2 programs which provide ‘hadoop-0.20-conf’.

Selection Command
1 /etc/hadoop-0.20/conf.empty
*+ 2 /etc/hadoop-0.20/conf.my_cluster

Select your new alternative if it is not already selected.

sudo to root

sudo su -

Edit the file /etc/hadoop/conf/core-site.xml so the default file system points at the Hadoop node. The default port is 8020, and cloud-alocal is the instructional VM we created (or your target Hadoop node or cluster).
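The XML for this edit did not survive formatting; since Hadoop 0.20 names the default file system with the fs.default.name property, the edit presumably looked like this (host and port taken from the text above):

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://cloud-alocal:8020</value>
  </property>
</configuration>
```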

While you’re sudo’d to root, make the changes to the following files as well.

Edit /etc/hadoop/conf/hadoop-env.sh as shown below; the changed lines are the JAVA_HOME, ORACLE_HOME, and HADOOP_CLASSPATH exports:


# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use.  Required.
export JAVA_HOME=/usr/java/latest
# Extra Java CLASSPATH elements.  Optional.
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
export HADOOP_CLASSPATH="$ORACLE_HOME/jdbc/lib/ojdbc6.jar:$ORACLE_HOME/oraloader-*:$HADOOP_CLASSPATH"

Note: If you have multiple ORACLE_HOMEs, you might prefer to use ORACLE_BASE or /etc as the install location for the ODC HDFS or OLH installs. I prefer a standard place where I never have to involve a system admin; they’re busy enough.

Exit from root and log in as oracle on odbs.
Let’s add a few variables to .bash_profile, somewhere after ORACLE_HOME gets set:

export HADOOP_HOME=/usr/bin
export HADOOP_CLASSPATH="$ORACLE_HOME/jdbc/lib/ojdbc6.jar:$ORACLE_HOME/jlib/oraclepki.jar"

Log out and log back in, or keep re-executing .bash_profile every time you create a new session (easier to log out and in).

Check the Hadoop connectivity to cloud-alocal:

$ hadoop fs -ls /user

Found 2 items
drwxr-xr-x   - cloudera supergroup          0 2012-04-05 14:00 /user/cloudera
drwxr-xr-x   - hue      supergroup          0 2012-03-14 20:22 /user/hive

Try a few commands:

$ touch test_src.lst
$ hadoop fs -copyFromLocal test_src.lst /user/scott/data/test.lst

Try hadoop fs to get the usage:

$ hadoop fs

Open a browser, go to http://cloud-alocal:50070, click “Browse the file system”, and navigate to /user/scott/data to see your file:



So now you have a file in the Hadoop file system; if you search for that full directory path and file name on the OS file system, you won’t find it.

Let’s put the connector code in place:

Go to the download directory for the zipped connector files and unzip each archive into $ORACLE_HOME:

unzip <connector zip file> -d $ORACLE_HOME

Modify hdfs_stream under the $ORACLE_HOME/orahdfs- version directory; the changed lines are:
export HADOOP_HOME=/usr
export HADOOP_CONF_DIR=/etc/hadoop/conf
# set DIRECTHDFS_HOME to install directory for DirectHDFS
export ORAHDFS_JAR=${DIRECTHDFS_HOME}/jlib/orahdfs.jar

A correct run should show the usage info:

$ ./hdfs_stream
Oracle Direct HDFS Release - Production
Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.
Usage: $HADOOP_HOME/bin/hadoop jar orahdfs.jar oracle.hadoop.hdfs.exttab.HdfsStream <locationPath>

Otherwise, check $DIRECTHDFS_HOME/log for the log files and debug your changes.

Create the external directory on odbs and set its permissions (the SQL grants follow below):

mkdir $ORACLE_BASE/external_dir
chmod 770 $ORACLE_BASE/external_dir

From sqlplus execute the following:

sqlplus /nolog
connect / as sysdba

create or replace directory external_dir as '/u01/app/oracle/external_dir';
create or replace directory hdfs_bin_path as '/u01/app/oracle/product/11.2.0/dbhome_1/orahdfs-';
create user hdfsuser identified by hdfsuser;
grant create session to hdfsuser;
grant create table to hdfsuser;
grant execute on sys.utl_file to hdfsuser;
grant read, write on directory external_dir to hdfsuser;
grant read, execute on directory hdfs_bin_path to hdfsuser;

Make sure all statements executed correctly.

Now you’re all done.

Guess you need some examples to make this worth the effort. Here’s an external table example.
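For orientation, an ODC HDFS external table has roughly the following shape. The table name, column, and location file below are hypothetical, and in a real setup the LOCATION entries are location files published by the connector rather than raw HDFS paths; treat this as a sketch, not a working example:

```sql
-- Rough shape only: ODC HDFS streams HDFS data into an external table
-- through the hdfs_stream preprocessor. All names here are hypothetical.
CREATE TABLE hdfsuser.test_ext (
  line VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY external_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR hdfs_bin_path:'hdfs_stream'
  )
  LOCATION ('test.lst')
);
```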

Revised: 5/8/2012 – Added notes about alternative config
Revised: 5/10/2012 – added permissions for conf.my_cluster

2 thoughts on “Installing Oracle’s Big Data Connectors for HDFS and SQL*Loader”

  1. Will the Oracle Big Data Connectors work with an Oracle database on Red Hat Linux and an existing Apache Hadoop cluster, which we have in our organization? Or are they specific to Oracle Linux and Cloudera Hadoop?

    • I haven’t tried it, but I think it should work fine. I’m guessing, but I’ll bet you could load the Cloudera software on the database node and connect to your existing Hadoop cluster. Let me know how it works out!
