This tutorial provides a step-by-step guide to installing multi-node Hadoop on Ubuntu 10.04.
1. Install single-node Hadoop on all nodes before starting this tutorial.
2. If you followed my single-node tutorial and installed Ubuntu 10.04 on VMware Workstation, you can clone your single-node machine using the method shown in the figure below.
3. Name the cloned virtual machine slave. This can be changed from System > Administration > Users and Groups.
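The hostname itself can also be changed from the terminal (a minimal sketch; the hostname command applies the change immediately, while editing /etc/hostname makes it permanent):
- sudo hostname slave
- sudo nano /etc/hostname # replace the old name with: slave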
1. Networking of Master and Slave
1.1 All nodes should be accessible from each other on the network.
1.2 Add the IP addresses of both the master and slave machines to the /etc/hosts file on every machine using the following commands.
- ifconfig # run on each machine to find its IP address
- sudo nano /etc/hosts
Add the following lines to the file (replace the IP addresses with those of your machines):
- 192.168.216.135 master
- 192.168.216.136 slave
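To confirm that name resolution works, ping each node by name from the other (a quick check, assuming the IP addresses above match your network):
- ping -c 3 master
- ping -c 3 slave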
2. SSH Access
SSH access must be enabled from the master to the slave so that jobs can be transferred between the nodes.
2.1 Add the master's RSA public key to the slave's authorized keys by running the following commands on the master.
- su - hduser
- ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave
Now verify that SSH works with the following commands.
- ssh master
- ssh slave
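If $HOME/.ssh/id_rsa.pub does not yet exist on the master, first generate a password-less key pair as in the single-node tutorial (a minimal sketch, assuming the same hduser account):
- su - hduser
- ssh-keygen -t rsa -P ""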
3. Configuration of Hadoop
3.1 Add the names of the master and slave nodes to the conf/masters and conf/slaves files on the master node only, using the following commands.
- su - hduser
- cd /home/hadoop/conf
- nano masters
Add the following line:
- master
- nano slaves
Add the following lines:
- master
- slave
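After saving, you can quickly verify the file contents (expected contents shown as comments):
- cat masters # should contain only: master
- cat slaves # should contain: master and slave, one per line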
3.2 Change the following Hadoop configuration files on all machines, just as was done in the single-node installation.
In conf/core-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://master:54310</value> <!-- the only line that requires a change -->
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
In conf/mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value> <!-- the only line that requires a change -->
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
In conf/hdfs-site.xml:
Set the replication value to 2, which was 1 earlier.
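For reference, the relevant property block looks like this (a sketch mirroring the blocks above; only the value changes from 1 to 2):
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.</description>
</property>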
3.3 Format the NameNode
- /home/hadoop/bin/hadoop namenode -format
3.4 If the NameNode does not get formatted, delete Hadoop's temp folders and run the format command again:
- cd /app/hadoop/tmp
- rm -R dfs
- rm -R mapred
4. Starting the Multi-node Hadoop Cluster
4.1 Run the command
bin/start-dfs.sh
on the machine you want the (primary) NameNode to run on. This will bring up HDFS with the NameNode running on the machine you ran the previous command on, and DataNodes on the machines listed in the conf/slaves file.
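You can check which Java daemons came up on each node with the jps command (a quick check; the exact list depends on whether master is also listed in conf/slaves, as it is in this tutorial):
- jps # on master: NameNode, SecondaryNameNode, and DataNode (since master is also a slave)
- jps # on slave: DataNode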
4.2 Run the command
bin/start-mapred.sh
on the machine you want the JobTracker to run on. This will bring up the MapReduce cluster with the JobTracker running on the machine you ran the previous command on, and TaskTrackers on the machines listed in the conf/slaves file.
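A similar jps check confirms the MapReduce daemons:
- jps # on master: JobTracker, plus TaskTracker (since master is also a slave)
- jps # on slave: TaskTracker
To shut the cluster down later, stop the layers in reverse order: run bin/stop-mapred.sh first, then bin/stop-dfs.sh.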