In this tutorial I will give you step-by-step instructions on how to set up Hadoop on a single node. We will install Hadoop on a single node first and move on to a multi-node cluster in the coming tutorials.
Prerequisites:
1. Java
* Add the Java PPA, update the package lists, install Oracle Java 7, and verify the installation with the following commands:
- sudo add-apt-repository ppa:webupd8team/java
- sudo apt-get update
- sudo apt-get install oracle-java7-installer
- java -version
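If the installation succeeded, the JDK directory under /usr/lib/jvm is what we will point JAVA_HOME at later. As a quick sanity check (assuming the installer created the usual java-7-oracle directory; adjust the path if yours differs), you can confirm it:
- ls /usr/lib/jvm/
- /usr/lib/jvm/java-7-oracle/bin/java -version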
2. Python environment.
Set up the Python environment using the following command:
- sudo apt-get install python-software-properties
3. Create a new user for Hadoop.
Create a dedicated user for Hadoop. Running Hadoop under its own user keeps its permissions separate from the rest of the system, which makes the installation more secure:
- sudo addgroup hadoop
- sudo adduser --ingroup hadoop hduser
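To confirm that the new account was created and belongs to the hadoop group, you can check its group membership (a quick sanity check, not part of the original steps):
- id hduser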
4. Configure SSH
SSH is required for Hadoop to communicate between the master and slave nodes. Configure it using the following commands:
- sudo apt-get install openssh-server
- su - hduser
- ssh-keygen -t rsa -P ""
- cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
- ssh localhost
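If ssh localhost still prompts for a password, the usual culprit is overly permissive permissions on the .ssh directory; tightening them (a common fix, not covered in the original steps) typically resolves it:
- chmod 700 $HOME/.ssh
- chmod 600 $HOME/.ssh/authorized_keys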
5. Disable IPv6
Sometimes, due to the IPv6 configuration, the Hadoop master node is not able to communicate with the slaves. It is better to disable IPv6 in the system control file as follows.
* Open the system control file in the nano editor.
- sudo nano /etc/sysctl.conf
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
* Restart the system for the settings to take effect. Check whether IPv6 is disabled with the following command; the output should be 1 when IPv6 is disabled.
- cat /proc/sys/net/ipv6/conf/all/disable_ipv6
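If you would rather not reboot immediately, the kernel parameters from /etc/sysctl.conf can usually be reloaded in place (then re-run the check above):
- sudo sysctl -p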
All prerequisites are now complete.
We start with the installation of Hadoop on a single-node cluster.
1. Download Hadoop from the Apache Hadoop releases page.
* Extract the tar file, rename the extracted folder to hadoop, and save it on the Desktop.
* Move it to the /home folder so that it can also be used by the slave nodes.
- cd Desktop
- sudo mv hadoop /home
- cd /home
- sudo chown -R hduser:hadoop /home/hadoop
- su - hduser
- nano $HOME/.bashrc
# Set Hadoop-related environment variables
export HADOOP_HOME=/home/hadoop
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
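The new variables and aliases only take effect in new shells; to use them in the current session, reload the file:
- source $HOME/.bashrc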
2. Configure the Hadoop script file
* The only environment variable we have to configure for Hadoop in this tutorial is JAVA_HOME. Run the following commands:
- su - hduser
- cd /home/hadoop/conf
- nano hadoop-env.sh
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
3. Create a temporary directory
A temp folder is created so that Hadoop can store its temporary files in it:
- exit
- sudo mkdir -p /app/hadoop/tmp
- sudo chown hduser:hadoop /app/hadoop/tmp
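Optionally, you can tighten the permissions on this directory so that only the owner and the hadoop group can access it (a common hardening step; the exact mode is a matter of preference):
- sudo chmod 750 /app/hadoop/tmp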
4. Set the configuration files
All Hadoop configuration files are set in this section. Add the following snippets between the <configuration> ... </configuration> tags in the respective configuration XML file.
In file conf/core-site.xml:
- su - hduser
- cd /home/hadoop/conf
- nano core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
In file conf/mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
In file conf/hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
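Before formatting the NameNode, it is worth confirming that the edited XML files are still well formed. If xmllint is available on your system (it is not part of Hadoop, so treat this as an optional check), it can be run as:
- xmllint --noout /home/hadoop/conf/core-site.xml
- xmllint --noout /home/hadoop/conf/mapred-site.xml
- xmllint --noout /home/hadoop/conf/hdfs-site.xml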
5. Format the HDFS filesystem via the NameNode
- /home/hadoop/bin/hadoop namenode -format
6. Start the single-node cluster
- /home/hadoop/bin/start-all.sh
Hadoop is now running on this machine. Check that it is working using the following commands:
- cd /home/hadoop
- jps
The jps output lists the Hadoop daemons that are running (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker) along with their process IDs.
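In a Hadoop 1.x setup, the web interfaces give another quick check: by default the NameNode UI listens on port 50070 and the JobTracker UI on port 50030 (the stock defaults; adjust if you have changed them). When you are done, the cluster can be stopped with the companion script:
- /home/hadoop/bin/stop-all.sh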