Hadoop is one of the most popular frameworks for Big Data today. People often say that setting up Hadoop is difficult, but it isn't. First, check whether Java is installed:
$ java -version
It should show something like:
java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-466.1-11M4716)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-466.1, mixed mode)
If you don't see output like this, you need to install Java first. You will also need SSH and rsync:
$ sudo apt-get install default-jre
$ sudo apt-get install openjdk-7-jdk
$ sudo apt-get install openssh-server
$ sudo apt-get install rsync
Next, we need to configure passwordless SSH, since Hadoop's scripts use SSH to start and stop the daemons. Generate a key pair with an empty passphrase:
$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa): (Press Enter)
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu
The key's randomart image is: [IMAGE]
Then append the public key to the authorized keys file:
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
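One caveat with the plain `cat >>` above: running it twice appends the key twice. A slightly safer variant, as a sketch, only appends the key if it is not already present (`add_authorized_key` is a hypothetical helper name, not part of any standard tool):

```shell
# Append a public key to authorized_keys only if it is not already there,
# so re-running the setup does not create duplicate entries.
add_authorized_key() {
  pub="$1"   # path to the public key file, e.g. ~/.ssh/id_rsa.pub
  auth="$2"  # path to authorized_keys, e.g. ~/.ssh/authorized_keys
  mkdir -p "$(dirname "$auth")"
  touch "$auth"
  # -x: whole-line match, -F: fixed strings, -f: patterns from the key file
  grep -qxF -f "$pub" "$auth" 2>/dev/null || cat "$pub" >> "$auth"
  chmod 600 "$auth"
}

# e.g. add_authorized_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```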
Now test the connection; if it asks for "yes/no", enter "yes":
$ ssh localhost
Next, disable IPv6 by appending a few lines to the sysctl.conf file:
$ sudo gedit /etc/sysctl.conf
Append the following lines:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Restart the machine and run the following command:
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
It should return 1. Now download the Hadoop 2.6.0 tarball from the following link: http://apache.bytenet.in/hadoop/common/. Then go to the directory where you downloaded it and extract it:
$ tar -xvf hadoop-2.6.0.tar.gz
Rename the extracted folder from hadoop-2.6.0 to hadoop, then move it to /usr/local:
$ mv hadoop-2.6.0 hadoop
$ sudo mv hadoop /usr/local/
Now we need to modify the ~/.bashrc file:
$ sudo gedit ~/.bashrc
Append the following lines:
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
And run the following command:
$ source ~/.bashrc
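A quick sanity check, as a sketch, that the variables from ~/.bashrc are visible in your current shell (`check_var` is a hypothetical helper name):

```shell
# Print a variable's value, or a reminder if it is not set in this shell.
check_var() {
  eval "val=\${$1:-}"
  if [ -n "$val" ]; then
    echo "$1=$val"
  else
    echo "$1 is not set - run 'source ~/.bashrc' again"
  fi
}

# e.g.:
# check_var HADOOP_HOME
# check_var JAVA_HOME
```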
Go to the Hadoop directory:
$ cd $HADOOP_HOME
Make three directories: one for the NameNode (hdfs-site), one for the DataNode (hdfs-data), and one for temporary files (tmp). These names must match the paths used in the XML files below.
$ mkdir -p hdfs-data
$ mkdir -p hdfs-site
$ mkdir -p tmp
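The three mkdir commands above can also be wrapped in a small helper, sketched here for clarity (`make_hdfs_dirs` is a hypothetical name; it assumes the same three directory names used in this guide):

```shell
# Create the three working directories Hadoop needs under a given
# install root (e.g. /usr/local/hadoop).
make_hdfs_dirs() {
  root="$1"
  for d in hdfs-site hdfs-data tmp; do
    mkdir -p "$root/$d"
  done
}

# e.g. make_hdfs_dirs "$HADOOP_HOME"
```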
Now we need to configure several files, all of which live in the etc/hadoop folder of the Hadoop directory:
$ cd $HADOOP_HOME/etc/hadoop
$ sudo gedit hadoop-env.sh
Replace
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/j2sdk1.5-sun
with
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
For Mac users:
export JAVA_HOME=`/usr/libexec/java_home`
Then edit the core-site.xml file:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000/</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
Next, hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/hdfs-site</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/hdfs-data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
And finally, yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
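One optional addition not covered in the steps above: Hadoop 2.6.0 ships only a mapred-site.xml.template, and if you want MapReduce jobs to actually run on YARN, a commonly added config is mapreduce.framework.name=yarn. A sketch that writes such a file (`write_mapred_site` is a hypothetical helper name):

```shell
# Write a minimal mapred-site.xml that tells MapReduce to use YARN.
# The target directory would normally be $HADOOP_HOME/etc/hadoop.
write_mapred_site() {
  cat > "$1/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
}

# e.g. write_mapred_site "$HADOOP_HOME/etc/hadoop"
```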
That's it. To start Hadoop:
$ cd $HADOOP_HOME
$ ./bin/hdfs namenode -format
$ cd sbin
$ ./start-all.sh
To check that everything is running:
$ jps
2475 NodeManager
2164 DataNode
2386 ResourceManager
2609 Jps
2267 SecondaryNameNode
2078 NameNode
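If some of those daemons are missing, it helps to check systematically rather than eyeball the list. A sketch that verifies all five expected daemons appear in jps output (`check_daemons` is a hypothetical helper name):

```shell
# Given the output of `jps`, report any of the five expected Hadoop
# daemons that did not start. Returns non-zero if anything is missing.
check_daemons() {
  missing=0
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # jps prints "PID Name", so match " Name" at end of line; this also
    # keeps "NameNode" from matching inside "SecondaryNameNode".
    echo "$1" | grep -q " $d$" || { echo "missing: $d"; missing=1; }
  done
  return $missing
}

# e.g. check_daemons "$(jps)"
```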
P.S.: while executing start-all.sh for the first time, it will ask "yes/no" to accept the RSA key. Enter "yes".
If you have followed the above setup, you don't need to format the NameNode every time you restart your system.
If you want to use Hadoop as another user, e.g. "hduser", create a user named "hduser", give it admin privileges, and give it ownership of the Hadoop folder with the command "chown -R hduser:hadoop hadoop".
You can also set more environment and PATH variables in order to use Hadoop from anywhere in the system. For that, append some more lines to ~/.bashrc:
$ sudo gedit ~/.bashrc
Append these lines:
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
If you have any queries, you can comment here. I will try my best to solve your problem.