I am going to stick with the Cloudera Quickstart VM, which saves me a lot of time on messy, buggy configuration. Now I will try to bridge the gap between Hive and unstructured data via JDBC. Forget Python; Java is the native language of Hadoop.
1. install hive jdbc driver
$ sudo yum install hive-jdbc # Red Hat / CentOS
Add /usr/lib/hive/lib/*.jar and /usr/lib/hadoop/*.jar to the classpath.
2. write the Java code
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class etl {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException ex) {
            System.out.println("Error: unable to load driver class!");
            System.exit(1);
        }
        // get connection; user and password are ignored in non-secured mode
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "cloudera", "cloudera");
        Statement stmt = con.createStatement();
        // execute statement
        ResultSet res = stmt.executeQuery("SELECT * FROM employee");
        System.out.println("Result:");
        System.out.println(" ID \t Name \t Salary \t Designation ");
        while (res.next()) {
            System.out.println(res.getInt(1) + " " + res.getString(2) + " "
                    + res.getDouble(3) + " " + res.getString(4));
        }
        con.close();
    }
}
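If Class.forName fails, it can help to check which driver class is actually visible on the classpath. Below is a hypothetical helper (not part of the original program) that probes both the HiveServer2 name used above and the longer legacy HiveServer1 name; the legacy class name is from memory, so verify it against your Hive jars.

```java
// Hypothetical helper: probe which Hive JDBC driver class is on the classpath.
// "org.apache.hive.jdbc.HiveDriver" is the HiveServer2 name used above; the
// longer legacy HiveServer1 name below is an assumption to verify locally.
public class DriverProbe {
    static boolean onClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "org.apache.hive.jdbc.HiveDriver",        // HiveServer2
            "org.apache.hadoop.hive.jdbc.HiveDriver"  // legacy HiveServer1
        };
        for (String name : candidates) {
            System.out.println(name + " -> "
                    + (onClasspath(name) ? "found" : "not on classpath"));
        }
    }
}
```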
notes:
- driverName is for HiveServer2; the earlier HiveServer1 driver had a longer class name.
- table employee is prepared as in my previous blog or https://www.tutorialspoint.com/hive/
- the connection string is "jdbc:hive2://localhost:10000/default"; the username and password can be empty strings in non-secured mode.
- the Java class ResultSet is quite different from a Python cursor. It provides typed getter methods such as getInt, getString and getDouble, each taking a column index numbered from 1, which reflects strict data-type enforcement. The next() method moves the cursor to the next row, so a while loop can iterate through the result set.
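To make the cursor semantics concrete without a live Hive connection, here is a hypothetical sketch that fakes a two-row ResultSet with a java.lang.reflect.Proxy, purely to show the next()/getInt loop shape: the cursor starts before the first row, next() advances it and returns false when the rows run out, and column indexes start at 1. Only next and getInt are faked; everything else throws.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;

// Demo only: a dynamic-proxy fake of ResultSet, to illustrate cursor movement
// and 1-based typed getters without connecting to Hive.
public class ResultSetLoop {
    static ResultSet fakeRows(int[][] rows) {
        final int[] pos = {-1}; // cursor starts *before* the first row
        return (ResultSet) Proxy.newProxyInstance(
            ResultSet.class.getClassLoader(),
            new Class<?>[]{ResultSet.class},
            (proxy, method, args) -> {
                switch (method.getName()) {
                    case "next":   // advance; false once rows are exhausted
                        return ++pos[0] < rows.length;
                    case "getInt": // column indexes are 1-based
                        return rows[pos[0]][(Integer) args[0] - 1];
                    default:
                        throw new SQLException("not faked: " + method.getName());
                }
            });
    }

    public static void main(String[] args) throws SQLException {
        ResultSet res = fakeRows(new int[][]{{1, 100}, {2, 200}});
        while (res.next()) { // same loop shape as against a real Hive table
            System.out.println(res.getInt(1) + " " + res.getInt(2));
        }
    }
}
```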
3. compile and run
This is the most bug-prone part. On StackOverflow, someone suggested running it as:
javac -cp . etl.java
java -cp . etl
But no matter what I tried (changing the configuration here and there, tinkering with the code here and there), I always got a ClassNotFoundException. I guess the Java compiler or JVM doesn't read the classpath as expected.
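For the record, the command-line route can work when the classpath actually contains the Hive and Hadoop JARs. A sketch, assuming the Cloudera Quickstart VM default paths (adjust for your layout), and noting that javac takes the .java source file, not the compiled .class:

```shell
# join every Hive/Hadoop JAR into one colon-separated classpath
# (paths are the Cloudera Quickstart VM defaults; adjust for your install)
HIVE_CP=$(echo /usr/lib/hive/lib/*.jar /usr/lib/hadoop/*.jar | tr ' ' ':')

# javac compiles the .java source; java then runs the resulting class
javac -cp ".:$HIVE_CP" etl.java
java -cp ".:$HIVE_CP" etl
```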
Fortunately, I have a friend doing a PhD on Hadoop. He simply uses the IDE to add the classpath instead of writing it in .bashrc. To be more specific:
- Eclipse -> new Java project -> Java build path -> add external JARs
- locate /usr/lib/hive/lib/ and add all the JARs
- run as Java application
MySQL
This is another topic; I will keep the notes here in case I need them in the future.
brew install mysql
We've installed your MySQL database without a root password. To secure it run:
mysql_secure_installation
MySQL is configured to only allow connections from localhost by default
To connect run:
mysql -uroot
To have launchd start mysql now and restart at login:
brew services start mysql
Or, if you don't want/need a background service you can just run:
mysql.server start
==> Summary
🍺 /usr/local/Cellar/mysql/5.7.18_1: 321 files, 234.5MB
comments:
Nice summary. To compile the program I think you meant
javac etl.java
Also you can specify the classpath with
export CLASSPATH=/usr/lib/hive/lib/hive-jdbc-standalone.jar:.:
then run the program as
java etl