Saturday, May 20, 2017

Hadoop Hive, local setup

I have spent many hours trying to get Hive running locally on macOS but couldn't make it work. Last time I got to the very end of this tutorial except the last step. This time I proceeded a little further, but bugs keep popping up:
$ hdfs dfs -mkdir /user
Cannot create directory /user. Name node is in safe mode.
$ hdfs dfsadmin -safemode leave
Safe mode is OFF
$ hdfs dfsadmin -safemode get
Safe mode is ON
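The flip-flop above (OFF then ON again) can happen because the name node re-enters safe mode while it waits for data-node block reports. One hedged workaround is to poll until it actually reports OFF; the `-safemode get` subcommand is from the hdfs dfsadmin docs, while the retry count and sleep interval here are arbitrary:

```shell
# poll until the name node reports that safe mode is OFF
# (retry count and sleep are arbitrary; adjust to taste)
for i in 1 2 3 4 5; do
  state=$(hdfs dfsadmin -safemode get)
  case "$state" in
    *OFF*) echo "left safe mode"; break ;;
    *)     sleep 10 ;;
  esac
done
```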
Anyway, I will try to record every step of my journey.
According to Quora, the minimum requirement for a local machine is 500 GB. This may be the reason I failed.

Download tar files from respective official sites:
  1. Oracle Java SE
  2. Hadoop
  3. Hive
  4. Derby
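A minimal sketch of unpacking those tarballs into /usr/local; the first two file names are assumed to match the versions used later in this post, and the Derby one is a placeholder for whatever version you downloaded:

```shell
# unpack the downloaded archives into /usr/local
# (file names assumed; match them to what you actually downloaded)
cd ~/Downloads
sudo tar -xzf hadoop-2.8.0.tar.gz -C /usr/local
sudo tar -xzf apache-hive-2.1.1-bin.tar.gz -C /usr/local
sudo tar -xzf db-derby-<version>-bin.tar.gz -C /usr/local
```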

bash command line refresh

export varname=value  # export a variable to the environment
env  # display all environment variables; note that different shells have different default env variables
cat .bash_profile  # print a file in the command window
less .bash_profile  # another way to view it, less overwhelming
echo $varname  # display a variable's value, note the dollar sign
eval $fun  # evaluate the contents of a variable as a command
history  # display command history
hash     # display cached command paths (not history)
pwd  # equal to echo $PWD, where PWD is a built-in variable
let arg1=2  # assign a value; spaces around = are forbidden
let arg2=$arg1**3
echo $arg2
printf "result=%d\n" $arg2
Most compilers and commands are stored in /usr/local/bin .
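One point worth illustrating from the list above: `export` is what makes a variable visible to child processes, while a plain assignment stays local to the current shell. A quick sketch (the variable names are made up):

```shell
# exported variables reach child processes; plain assignments do not
LOCAL_ONLY=hello          # shell-local, not exported
export SHARED=world       # exported to the environment
bash -c 'echo "child sees: ${LOCAL_ONLY:-<unset>} and $SHARED"'
# the child prints <unset> for LOCAL_ONLY but does see SHARED
```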

path setup

# setup environment for hadoop
export HADOOP_HOME=/usr/local/hadoop-2.8.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# setup environment for hive
export HIVE_HOME=/usr/local/apache-hive-2.1.1-bin
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib/*:.
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/*:.

# setup environment for Derby
export DERBY_HOME=/usr/local/db-derby-
export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar
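Since a typo in any of these version strings fails silently at export time, a quick sanity check can save debugging later; this is just a sketch over the variables defined above:

```shell
# verify that each *_HOME actually points at a real directory
for d in "$HADOOP_HOME" "$HIVE_HOME" "$DERBY_HOME"; do
  if [ -d "$d" ]; then
    echo "ok: $d"
  else
    echo "missing: $d" >&2
  fi
done
```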

hadoop initialize and commands

cd /usr/local/hadoop-2.8.0/
hdfs namenode -format
sbin/start-dfs.sh   # start the Hadoop file system daemons
# open http://localhost:50070/  (NameNode web UI)
# open http://localhost:8088/   (YARN ResourceManager UI; needs sbin/start-yarn.sh)
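Before moving on, it is worth confirming that the NameNode actually answers on that port. A hedged check using standard curl flags (-s silent, -o discard the body, -w print the HTTP status code):

```shell
# expect 200 once the NameNode web UI is ready
status=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:50070/)
echo "NameNode UI HTTP status: $status"
```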

hadoop fs -mkdir /tmp 
hadoop fs -mkdir -p ~/hive/warehouse  # -p also makes parent dirs
hadoop fs -chmod 777 /user  # change permission of file or folder
hdfs dfs -mkdir /user/hadoop  # make folder
hdfs dfs -put a.csv /user/hadoop/a.csv # move from local to HDFS
hdfs dfs -ls /user/hadoop  # list content of a folder
hdfs dfs -du  /user/hadoop/  # display utilization (size)
hdfs dfs -get /user/hadoop/ /home/ # get from HDFS to local
hdfs dfs -cp /user/hadoop/folderA /user/hadoop/folderB # copy
hdfs dfs -rm -r <directory>  # remove recursively
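Putting a few of those commands together, a round trip (local file to HDFS and back) is a quick way to confirm the file system works end to end; the file names here are just examples:

```shell
# round-trip a small file through HDFS
echo "1,alice" > /tmp/a.csv
hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -put -f /tmp/a.csv /user/hadoop/a.csv   # -f overwrites if present
hdfs dfs -get /user/hadoop/a.csv /tmp/a_copy.csv
diff /tmp/a.csv /tmp/a_copy.csv && echo "round trip ok"
```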

Hive metastore_db initialize

schematool -initSchema -dbType derby  # may fail if an old metastore_db exists
mv metastore_db metastore_db.tmp  # move the stale metastore out of the way
schematool -initSchema -dbType derby  # rerun
hive  # enter the Hive shell, then:
show tables;
create table myGod (name string);
hive metastore configuration
Add the following to hive-site.xml:
<value>/usr/local/apache-hive-2.1.1-bin/iotmp</value>
Hive uses the Derby database by default. You may change it to a MySQL database by following the above link.
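That <value> line on its own is only a fragment; in hive-site.xml it lives inside a full <property> element. A sketch of what the edit usually looks like, assuming the scratch-dir properties this change normally targets (hive.exec.local.scratchdir and hive.downloaded.resources.dir are stock Hive property names, but verify against your own hive-default template):

```xml
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/usr/local/apache-hive-2.1.1-bin/iotmp</value>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/usr/local/apache-hive-2.1.1-bin/iotmp</value>
</property>
```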