I am going to stick with the Cloudera Quickstart VM, which saves me a lot of time on messy, buggy configuration. Now I will try to bridge the gap between Hive and unstructured data via JDBC. Forget Python; Java is the native language of Hadoop.
1. install hive jdbc driver
$ sudo yum install hive-jdbc # Red Hat / CentOS
Add /usr/lib/hive/lib/*.jar and /usr/lib/hadoop/*.jar to the classpath.
2. write the Java code
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class etl {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException ex) {
            System.out.println("Error: unable to load driver class!");
            System.exit(1);
        }
        // get connection; user and password are ignored in non-secured mode
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "cloudera", "cloudera");
        Statement stmt = con.createStatement();
        // execute statement
        ResultSet res = stmt.executeQuery("SELECT * FROM employee");
        System.out.println("Result:");
        System.out.println(" ID \t Name \t Salary \t Designation ");
        while (res.next()) {
            System.out.println(res.getInt(1) + " " + res.getString(2) + " "
                    + res.getDouble(3) + " " + res.getString(4));
        }
        con.close();
    }
}
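If Class.forName fails, it can help to check which driver class is actually visible on the classpath. Below is a hypothetical helper (not part of the original program) that probes both the HiveServer2 name used above and the longer legacy HiveServer1 name; the legacy class name is from memory, so verify it against your Hive jars.

```java
// Hypothetical helper: probe which Hive JDBC driver class is on the classpath.
// "org.apache.hive.jdbc.HiveDriver" is the HiveServer2 name used above; the
// longer legacy HiveServer1 name below is an assumption to verify locally.
public class DriverProbe {
    static boolean onClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "org.apache.hive.jdbc.HiveDriver",        // HiveServer2
            "org.apache.hadoop.hive.jdbc.HiveDriver"  // legacy HiveServer1
        };
        for (String name : candidates) {
            System.out.println(name + " -> "
                    + (onClasspath(name) ? "found" : "not on classpath"));
        }
    }
}
```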
notes:
- driverName is for HiveServer2; the earlier HiveServer1 driver had a longer class name.
- table employee is prepared as in my previous blog or https://www.tutorialspoint.com/hive/
- the connection string is "jdbc:hive2://localhost:10000/default"; the username and password can be empty strings in non-secured mode.
- the Java class ResultSet is quite different from a Python cursor. It provides typed getter methods such as getInt, getString and getDouble, each taking a column index numbered from 1, which reflects strict data-type enforcement. The next() method moves the cursor to the next row, so a while loop can iterate through the result set.
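To make the cursor semantics concrete without a live Hive connection, here is a hypothetical sketch that fakes a two-row ResultSet with a java.lang.reflect.Proxy, purely to show the next()/getInt loop shape: the cursor starts before the first row, next() advances it and returns false when the rows run out, and column indexes start at 1. Only next and getInt are faked; everything else throws.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;

// Demo only: a dynamic-proxy fake of ResultSet, to illustrate cursor movement
// and 1-based typed getters without connecting to Hive.
public class ResultSetLoop {
    static ResultSet fakeRows(int[][] rows) {
        final int[] pos = {-1}; // cursor starts *before* the first row
        return (ResultSet) Proxy.newProxyInstance(
            ResultSet.class.getClassLoader(),
            new Class<?>[]{ResultSet.class},
            (proxy, method, args) -> {
                switch (method.getName()) {
                    case "next":   // advance; false once rows are exhausted
                        return ++pos[0] < rows.length;
                    case "getInt": // column indexes are 1-based
                        return rows[pos[0]][(Integer) args[0] - 1];
                    default:
                        throw new SQLException("not faked: " + method.getName());
                }
            });
    }

    public static void main(String[] args) throws SQLException {
        ResultSet res = fakeRows(new int[][]{{1, 100}, {2, 200}});
        while (res.next()) { // same loop shape as against a real Hive table
            System.out.println(res.getInt(1) + " " + res.getInt(2));
        }
    }
}
```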
3. compile and run
This is the most bug-prone part. On StackOverflow, someone suggested running it as:
javac -cp . etl.java
java -cp . etl
But no matter what I tried (changing the configuration here and there, tinkering with the code here and there), I always got a ClassNotFoundException. I guess the Java compiler or JVM doesn't read the classpath as expected.
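For the record, the command-line route can work when the classpath actually contains the Hive and Hadoop JARs. A sketch, assuming the Cloudera Quickstart VM default paths (adjust for your layout), and noting that javac takes the .java source file, not the compiled .class:

```shell
# join every Hive/Hadoop JAR into one colon-separated classpath
# (paths are the Cloudera Quickstart VM defaults; adjust for your install)
HIVE_CP=$(echo /usr/lib/hive/lib/*.jar /usr/lib/hadoop/*.jar | tr ' ' ':')

# javac compiles the .java source; java then runs the resulting class
javac -cp ".:$HIVE_CP" etl.java
java -cp ".:$HIVE_CP" etl
```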
Fortunately, I have a friend doing a PhD on Hadoop. He simply uses the IDE to add the classpath instead of writing it in .bashrc. To be more specific:
- Eclipse -> new Java project -> Java build path -> add external JARs
- locate /usr/lib/hive/lib/ and add all the JARs
- run as Java application
MySQL
This is another topic; I will keep the notes here in case I need them in the future.
brew install mysql
We've installed your MySQL database without a root password. To secure it run:
mysql_secure_installation
MySQL is configured to only allow connections from localhost by default
To connect run:
mysql -uroot
To have launchd start mysql now and restart at login:
brew services start mysql
Or, if you don't want/need a background service you can just run:
mysql.server start
==> Summary
🍺 /usr/local/Cellar/mysql/5.7.18_1: 321 files, 234.5MB
comments:
Nice summary. To compile the program I think you meant
javac etl.java
Also you can specify the classpath with
export CLASSPATH=/usr/lib/hive/lib/hive-jdbc-standalone.jar:.:
then run the program as
java etl