Creating an External Table with the Oracle Big Data Connector for Hadoop Data File Storage

This is a follow-up to the installation blog post. It assumes you've completed everything described there for the installation.

Here's a complete example of creating an external table over a CSV file stored in the Hadoop Distributed File System (HDFS).

First, let's log in as hdfsuser and create the external table:

drop table hdfsuser.emp_ext_tab1;

CREATE TABLE hdfsuser.emp_ext_tab1
( empno number(4,0),
ename VARCHAR2(10),
job VARCHAR2(9),
mgr number(4,0),
HIREDATE date,
sal number(7,2),
comm number(7,2),
deptno varchar2(4)

)
ORGANIZATION EXTERNAL
( TYPE oracle_loader
DEFAULT DIRECTORY "EXTERNAL_DIR"
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE
PREPROCESSOR HDFS_BIN_PATH:hdfs_stream
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"' LDRTRIM
MISSING FIELD VALUES ARE NULL
(empno integer external(4) ,
ename char(10) ,
job char(9) ,
mgr integer external(4) ,
HIREDATE char(11) date_format date mask "DD-MON-YYYY" ,
sal decimal external(7) ,
comm decimal external(7),
deptno integer external(4)
)
)
LOCATION ( 'hdfsuser_emp')
);

OK, let's go through this:
DEFAULT DIRECTORY "EXTERNAL_DIR" is the directory object where the location file, log files, and bad files will be written.
PREPROCESSOR HDFS_BIN_PATH:hdfs_stream gives the directory object and name of the hdfs_stream preprocessor script we added when installing the HDFS connector.
LOCATION ('hdfsuser_emp') names the location file that will be written when we run the Hadoop external table publish command below.
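
Both directory objects have to exist in the database and be accessible to hdfsuser before this DDL will work. If you didn't create them during the install, a minimal sketch looks like this (the filesystem paths are assumptions; substitute your own external directory and DIRECTHDFS bin locations):

-- Run as a DBA user; adjust the paths to your environment.
CREATE OR REPLACE DIRECTORY external_dir AS '/u01/app/oracle/external_dir';
CREATE OR REPLACE DIRECTORY hdfs_bin_path AS '/opt/orahdfs-1.0.0.0.0/bin';
GRANT READ, WRITE ON DIRECTORY external_dir TO hdfsuser;
GRANT READ, EXECUTE ON DIRECTORY hdfs_bin_path TO hdfsuser;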

I added the drop table, because I don’t use many external tables and usually have to correct something, so I’ll drop and recreate the table as I tweak the code.

Next, create a CSV file from scott.emp.
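
One easy way to generate it is to spool the rows out of SQL*Plus. A minimal sketch, assuming you can connect to a schema that can select from scott.emp:

set heading off feedback off pagesize 0 linesize 200 trimspool on
spool emp_table.csv
select empno||',"'||ename||'","'||job||'",'||mgr||',"'||
       to_char(hiredate,'DD-Mon-YYYY')||'",'||sal||','||comm||','||deptno
  from scott.emp;
spool off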

Example data from scott.emp as it appears in the emp_table.csv file:
7369,"SMITH","CLERK",7902,"17-Dec-1980",800,,20
7499,"ALLEN","SALESMAN",7698,"20-Feb-1981",1600,300,30
7521,"WARD","SALESMAN",7698,"22-Feb-1981",1250,500,30
7566,"JONES","MANAGER",7839,"02-Apr-1981",2975,,20
7654,"MARTIN","SALESMAN",7698,"28-Sep-1981",1250,1400,30
7698,"BLAKE","MANAGER",7839,"01-May-1981",2850,,30
7782,"CLARK","MANAGER",7839,"09-Jun-1981",2450,,10
7788,"SCOTT","ANALYST",7566,"19-Apr-1987",3000,,20
7839,"KING","PRESIDENT",,"17-Nov-1981",5000,,10
7844,"TURNER","SALESMAN",7698,"08-Sep-1981",1500,0,30
7876,"ADAMS","CLERK",7788,"23-May-1987",1100,,20
7900,"JAMES","CLERK",7698,"03-Dec-1981",950,,30
7902,"FORD","ANALYST",7566,"03-Dec-1981",3000,,20
7934,"MILLER","CLERK",7782,"23-Jan-1982",1300,,10

Copy the file to HDFS:
hadoop fs -copyFromLocal emp_table.csv /user/hdfsuser/data/emp_table.csv

Hadoop will create the full destination path if it does not already exist.

Use http://hadoopnamenode:8020 (or the NameNode web UI, which typically listens on port 50070) to browse the file system and find the emp_table.csv file; hadoopnamenode is cloud-alocal in the installation example.
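
If you prefer the command line to the web UI, the same check can be done with the hadoop client:

hadoop fs -ls /user/hdfsuser/data
hadoop fs -cat /user/hdfsuser/data/emp_table.csv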

Publish the HDFS Data Paths:

You have to use the long command found in the Oracle Big Data Connectors User’s Guide. There may be a few things to watch out for:

Beware of the hdfs.exttab.connection.url property; a good reference for what this should look like is http://www.orafaq.com/wiki/JDBC#Thin_driver
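
For a single-instance database, the thin URL generally takes one of these forms (host, port, SID, and service name below are placeholders for your own values):

jdbc:oracle:thin:@host:port:SID        (SID form, e.g. jdbc:oracle:thin:@odbs:1521:orcl)
jdbc:oracle:thin:@//host:port/service  (service name form)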

$ $HADOOP_HOME/hadoop jar \
> $DIRECTHDFS_HOME/jlib/orahdfs.jar oracle.hadoop.hdfs.exttab.ExternalTable \
> -D oracle.hadoop.hdfs.exttab.tableName=EMP_EXT_TAB1 \
> -D oracle.hadoop.hdfs.exttab.connection.url=jdbc:oracle:thin:@odbs:1521:orcl \
> -D oracle.hadoop.hdfs.exttab.connection.user=hdfsuser \
> -D oracle.hadoop.hdfs.exttab.datasetPaths=hdfs:/user/hdfsuser/data/emp_table.csv -publish
[Enter Database Password:]
Oracle Direct HDFS Release 1.0.0.0.0 - Production

Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.

EMP_EXT_TAB1 publish operation succeeded

Details:

EXTERNAL_DIR:hdfsuser_emp file created with the following paths
hdfs://cloud-alocal.enkitec.com/user/hdfsuser/data/emp_table.csv

Note that if you cut and paste the command, the ">" continuation prompts need to be removed. Now if you look in $ORACLE_BASE/external_dir, you'll find the hdfsuser_emp location file specified in the external table DDL.

[oracle@odbs external_dir]$ more hdfsuser_emp
CompressionCodec=
hdfs://cloud-alocal.enkitec.com/user/hdfsuser/data/emp_table.csv

If you run a select against the external table now, you should get the same results as "select * from scott.emp":

SQL> select * from emp_ext_tab1;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM DEPT
---------- ---------- --------- ---------- --------- ---------- ---------- ----
      7369 SMITH      CLERK           7902 17-DEC-80        800            20
      7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300 30
      7521 WARD       SALESMAN        7698 22-FEB-81       1250        500 30
      7566 JONES      MANAGER         7839 02-APR-81       2975            20
      7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400 30
      7698 BLAKE      MANAGER         7839 01-MAY-81       2850            30
      7782 CLARK      MANAGER         7839 09-JUN-81       2450            10
      7788 SCOTT      ANALYST         7566 19-APR-87       3000            20
      7839 KING       PRESIDENT            17-NOV-81       5000            10
      7844 TURNER     SALESMAN        7698 08-SEP-81       1500          0 30
      7876 ADAMS      CLERK           7788 23-MAY-87       1100            20

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM DEPT
---------- ---------- --------- ---------- --------- ---------- ---------- ----
      7900 JAMES      CLERK           7698 03-DEC-81        950            30
      7902 FORD       ANALYST         7566 03-DEC-81       3000            20
      7934 MILLER     CLERK           7782 23-JAN-82       1300            10

14 rows selected.
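
If the select throws an error instead (ORA-29913 is the usual symptom), the log and bad files written to EXTERNAL_DIR are the first place to look. A quick check might look like this (the exact log file names will vary):

ls -ltr $ORACLE_BASE/external_dir
more $ORACLE_BASE/external_dir/EMP_EXT_TAB1*.log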

All Done!
