Hadoop 2.6.0 Single Node Setup (pseudo-distributed mode)

Hadoop is one of the most popular frameworks for Big Data today. People often say that setting up Hadoop is difficult, but it isn't. First, check whether Java is installed:


$ java -version

It should show something like:

java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-466.1-11M4716)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-466.1, mixed mode)

If you don't see output like this, install Java first. While you are at it, also install the SSH server and rsync, which Hadoop needs:


$ sudo apt-get install default-jre
$ sudo apt-get install openjdk-7-jdk
$ sudo apt-get install openssh-server
$ sudo apt-get install rsync

Next we need to configure passwordless SSH, since Hadoop's scripts log in over SSH to start and stop the daemons:


$ ssh-keygen -t rsa -P ""

Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa): (Press Enter)
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu
The key's randomart image is:
[image]

Then append the public key to the authorized keys file:


$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Now test it. If it asks "yes/no" for the host key, enter "yes":


$ ssh localhost
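If the key setup worked, this logs you in without a password (type exit to leave the session again). You can also test it non-interactively:

$ ssh localhost 'echo SSH OK'
SSH OK

If it still prompts for a password, check that ~/.ssh has mode 700 and ~/.ssh/authorized_keys has mode 600.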

Next, disable IPv6 by appending some lines to the /etc/sysctl.conf file; Hadoop is known to misbehave on IPv6-enabled Ubuntu systems:


$ sudo gedit /etc/sysctl.conf

Append the following lines:


net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1 
net.ipv6.conf.lo.disable_ipv6 = 1
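If you would rather not reboot, the same settings can usually be applied immediately; this is plain sysctl behaviour, nothing Hadoop-specific:

$ sudo sysctl -p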

Reboot (or reload as shown above) and run the following command:


$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6

It should return 1.

Now download the Hadoop 2.6.0 binary tarball (hadoop-2.6.0.tar.gz) from an Apache mirror such as http://apache.bytenet.in/hadoop/common/.
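If you prefer the command line, something like the following should fetch it; note that the exact path on the mirror (hadoop-2.6.0/hadoop-2.6.0.tar.gz) is an assumption based on the standard Apache mirror layout:

$ wget http://apache.bytenet.in/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz

Either way, go to the directory containing the download and extract it: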

$ tar -xvf hadoop-2.6.0.tar.gz

Rename the extracted folder from hadoop-2.6.0 to hadoop and move it to /usr/local:

$ mv hadoop-2.6.0 hadoop
$ sudo mv hadoop /usr/local/

Now we need to modify the .bashrc file (it is your own file, so no sudo is needed):

$ gedit ~/.bashrc

and append the following lines:

export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH 

Then run the following command to reload it:

$ source ~/.bashrc
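A quick, optional sanity check that the new variables took effect; the first line of the version output should be "Hadoop 2.6.0" (if the command complains about JAVA_HOME, finish the hadoop-env.sh step below first):

$ echo $HADOOP_HOME
/usr/local/hadoop
$ $HADOOP_HOME/bin/hadoop version
Hadoop 2.6.0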

Go to the Hadoop directory:

$ cd $HADOOP_HOME

Make three directories: hdfs-data for the DataNode, hdfs-site for the NameNode, and tmp for temporary files. The names match the paths used in the XML files below:


$ mkdir -p hdfs-data
$ mkdir -p hdfs-site
$ mkdir -p tmp

Now we need to configure several XML files. They all live in the etc/hadoop folder of the Hadoop directory, so go there first and start with hadoop-env.sh:

$ cd $HADOOP_HOME/etc/hadoop
$ sudo gedit hadoop-env.sh

Replace


# The java implementation to use. Required. 

 export JAVA_HOME=/usr/lib/j2sdk1.5-sun

with


# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

For Mac users:


export JAVA_HOME=`/usr/libexec/java_home`
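Whichever value you use, you can confirm it points at a working JDK (the exact version output will differ on your machine):

$ $JAVA_HOME/bin/java -version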

Then edit core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000/</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>

Then hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/hdfs-site</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/hdfs-data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

And finally yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
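With all three files saved, a quick way to confirm Hadoop is actually reading them is the hdfs getconf command that ships with Hadoop 2.x; for example, it should echo back the replication factor we just set:

$ $HADOOP_HOME/bin/hdfs getconf -confKey dfs.replication
1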

That's it. To start Hadoop, format the NameNode once and run the start script:

$ cd $HADOOP_HOME
$ ./bin/hdfs namenode -format
$ cd sbin
$ ./start-all.sh
Verify that everything came up with jps:

$ jps
2475 NodeManager
2164 DataNode
2386 ResourceManager
2609 Jps
2267 SecondaryNameNode
2078 NameNode
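If all five daemons (plus jps itself) are listed, the web interfaces should be reachable too; in Hadoop 2.6 the NameNode UI defaults to port 50070 and the ResourceManager UI to port 8088. Open them in a browser, or probe them with curl (install it if missing):

$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070
200
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088
200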


P.S.: when executing start-all.sh for the first time, it will ask "yes/no" for the RSA host key. Enter "yes".

If you have followed the setup above, you don't need to format the NameNode every time you restart your system.
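To shut everything down cleanly before a restart, use the matching stop script. (start-all.sh and stop-all.sh print a deprecation warning in 2.x, which prefers the start-dfs.sh/start-yarn.sh pair, but all of these scripts ship with 2.6.0.)

$ cd $HADOOP_HOME/sbin
$ ./stop-all.sh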

If you want to run Hadoop as another user, e.g. "hduser", you need to create a user named "hduser", give it admin privileges, and give it ownership of the Hadoop folder with "chown -R hduser:hadoop hadoop"; a minimal sketch of these steps follows below.
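On Ubuntu, something like this should do it; the group name "hadoop" matches the chown command above, and membership in the "sudo" group is what grants hduser admin rights:

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
$ sudo adduser hduser sudo
$ sudo chown -R hduser:hadoop /usr/local/hadoop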

You can also set more environment and PATH variables in order to use Hadoop from anywhere in the system. For that, append some more lines to ~/.bashrc:


$ gedit ~/.bashrc

Append these lines:



export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
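After sourcing ~/.bashrc once more, the hadoop command should resolve from any directory:

$ source ~/.bashrc
$ cd ~
$ hadoop version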

If you have any queries, you can comment here. I will try my best to solve your problem.

 
