In this tutorial I am going to explain the steps required to install and run Hadoop on a MacBook Pro.
In this demo I am using Mac OS X version 10.6.8.
STEP 1 : Preparing Environment
The first step in installing Hadoop is to set up the environment: Java, ssh connectivity, and so on. We will go through each required setup step in detail.
Hadoop is written in Java and requires Java 1.6 or higher. Macs come with Java preinstalled, but you need to make sure that the required version is installed.
Open the terminal and type the following.
Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ java -version
java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-10M4609)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode)
If Java is not installed, or the version is below 1.6, then you need to download and install Java. Please see the note below.
For Java versions 6 and below, Apple supplies their own version of Java. For Mac OS X 10.6 and below, use the Software Update feature (available on the Apple menu) to check that you have the most up-to-date version of Java 6 for your Mac. For issues related to Apple Java 6 on Mac, contact Apple Support. Oracle and Java.com only support Java 7 and later, and only on 64 bit systems.
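As a quick sanity check, a small shell helper can compare a version string against the 1.6 minimum. This is my own sketch, not part of any Hadoop or Apple tooling; `java_ok` is a name I made up for illustration.

```shell
# java_ok: succeed if the given java version string meets the 1.6 minimum.
# Covers the old "1.x" scheme and the newer plain-number scheme (9, 11, ...).
java_ok() {
  case "$1" in
    1.[6-9]*|1.1[0-9]*|[2-9]*|[1-9][0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}
java_ok "1.6.0_65" && echo "1.6.0_65 is new enough"
# Against the live installation you would feed it the real string, e.g.:
#   java_ok "$(java -version 2>&1 | awk -F '"' '/version/ {print $2}')"
```

The commented-out last line shows how you might wire it to the actual `java -version` output, which is printed on stderr.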
Make sure that ssh is installed on your machine. By default the Mac has ssh installed. Verify this by running the commands below.
Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ which ssh
/usr/bin/ssh
Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ which ssh-keygen
/usr/bin/ssh-keygen
Now make sure that ssh actually works by running the command below.
Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ ssh localhost
Last login: Tue Nov 18 10:29:01 2014
There is a good chance that it will not work on the first attempt. If so, try one or more of the following to resolve the issue.
Go to System Preferences->Sharing and enable Remote Login and try again.
If the above doesn't work, then you need to generate ssh keys as described below.
Run the command below to create a public/private key pair. Make sure that you DON'T enter a passphrase when prompted.
Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ ssh-keygen -t rsa
Now the public key will be available at ~/.ssh/id_rsa.pub and private key will be available at ~/.ssh/id_rsa
Copy the public key to the authorized_keys file by running the command below.
Jinesh-Mathews-MacBook-Pro:.ssh jineshmathew$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
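The key-generation steps above can also be combined into one idempotent script. This is a sketch for convenience, and SSH_DIR is a helper variable I introduce here (not something Hadoop reads), so review it before running.

```shell
# Set up passwordless ssh to localhost in one pass. Safe to re-run:
# keys are only generated once, and the pub key is appended only once.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"
# Generate a key pair only if one does not exist; -N "" means no passphrase.
[ -f "$SSH_DIR/id_rsa" ] || ssh-keygen -q -t rsa -N "" -f "$SSH_DIR/id_rsa"
# Append the public key to authorized_keys only if it is not already there.
touch "$SSH_DIR/authorized_keys"
grep -qxF "$(cat "$SSH_DIR/id_rsa.pub")" "$SSH_DIR/authorized_keys" \
  || cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
```

The chmod calls matter: sshd refuses keys when ~/.ssh or authorized_keys is group- or world-writable.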
Now try ssh to localhost.
Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ ssh localhost
Last login: Tue Nov 18 10:29:01 2014
STEP 2 : Installing Hadoop
Download the latest stable Hadoop release from the Apache Hadoop downloads page.
At the time of this tutorial I downloaded version hadoop-2.5.1.tar.gz
Run the following command to extract Hadoop, assuming hadoop-2.5.1.tar.gz is in your home directory.
Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ tar -xzvf hadoop-2.5.1.tar.gz
A directory named hadoop-2.5.1 is created. Run the commands below to verify.
Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ cd hadoop-2.5.1
Jinesh-Mathews-MacBook-Pro:hadoop-2.5.1 jineshmathew$ ls -ltr
total 48
drwxr-xr-x@  4 jineshmathew staff   136 Sep  5 18:30 share
drwxr-xr-x@  3 jineshmathew staff   102 Sep  5 18:30 lib
drwxr-xr-x@  3 jineshmathew staff   102 Sep  5 18:30 etc
-rw-r--r--@  1 jineshmathew staff  1366 Sep  5 18:30 README.txt
-rw-r--r--@  1 jineshmathew staff   101 Sep  5 18:30 NOTICE.txt
-rw-r--r--@  1 jineshmathew staff 15458 Sep  5 18:30 LICENSE.txt
drwxr-xr-x@ 29 jineshmathew staff   986 Sep  5 18:30 sbin
drwxr-xr-x@ 11 jineshmathew staff   374 Sep  5 18:30 libexec
drwxr-xr-x@  7 jineshmathew staff   238 Sep  5 18:30 include
drwxr-xr-x@ 13 jineshmathew staff   442 Sep  5 18:30 bin
drwxr-xr-x   9 jineshmathew staff   306 Nov 18 10:37 logs
For ease of use I am going to export three environment variables, and I will add them to ~/.bash_profile so that they are set every time a new shell starts.
Jinesh-Mathews-MacBook-Pro:hadoop-2.5.1 jineshmathew$ vi ~/.bash_profile
export JAVA_HOME=/Library/Java/Home
export HADOOP_HOME=/Users/jineshmathew/hadoop-2.5.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
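To confirm the exports took effect, you can set them inline and inspect PATH. This standalone sketch mirrors the profile lines rather than sourcing ~/.bash_profile, so it works in any shell; adjust the paths to your setup.

```shell
# Mirror the ~/.bash_profile exports inline (normally you would just run
# `source ~/.bash_profile`), then check that PATH picked up the entries.
export HADOOP_HOME="$HOME/hadoop-2.5.1"
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
echo "HADOOP_HOME=$HADOOP_HOME"
echo "$PATH" | tr ':' '\n' | grep 'hadoop-2.5.1'
```

If the final grep prints the bin and sbin entries, new terminals will be able to find the hadoop and start/stop scripts without an explicit cd.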
Now try running hadoop.
Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ cd $HADOOP_HOME
Jinesh-Mathews-MacBook-Pro:hadoop-2.5.1 jineshmathew$ bin/hadoop version
Hadoop 2.5.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 2e18d179e4a8065b6a9f29cf2de9451891265cce
Compiled by jenkins on 2014-09-05T23:11Z
Compiled with protoc 2.5.0
From source with checksum 6424fcab95bfff8337780a181ad7c78
This command was run using /Users/jineshmathew/hadoop-2.5.1/share/hadoop/common/hadoop-common-2.5.1.jar
STEP 3 : Hadoop Configuration
Now we need to configure Hadoop to run in one of its three supported modes.
- Standalone mode
- Pseudo-distributed mode
- Fully distributed mode
For learning I prefer at least pseudo-distributed mode, which simulates a Hadoop cluster with one NameNode and one DataNode. A SecondaryNameNode also runs alongside them, all on a single machine, which here is our Mac.
Edit etc/hadoop/core-site.xml to have the following.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Edit etc/hadoop/hdfs-site.xml to have the following.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
STEP 4 : Running Hadoop
- Now we need to format HDFS, the Hadoop file system, by running the following command.
$ bin/hdfs namenode -format
- Start the NameNode and DataNode daemons.
$ sbin/start-dfs.sh
- Now create the following directories inside HDFS.
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
$ bin/hdfs dfs -mkdir /user/<username>/input
- Check the directory by running the following command
$ bin/hdfs dfs -ls /user/jineshmathew/input
- Now copy some files from the local file system to HDFS.
$ bin/hdfs dfs -put etc/hadoop /user/jineshmathew/input
- Now it is time to run some of the examples provided by Hadoop.
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar grep input/hadoop output 'dfs[a-z.]+'
This Hadoop job counts matches of the regular expression 'dfs[a-z.]+' across all files in the input/hadoop directory.
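To get a feel for what the job does, here is a rough local-filesystem equivalent using plain grep on a small sample file. This is only an illustration of the counting logic, not the real MapReduce job, and /tmp/sample.txt is a throwaway name chosen for the example.

```shell
# Build a tiny sample input, then count occurrences of dfs[a-z.]+ per
# distinct match, most frequent first -- roughly what the grep example does.
printf 'dfs.replication\ndfs.replication\ndfs.namenode.name.dir\n' > /tmp/sample.txt
grep -oE 'dfs[a-z.]+' /tmp/sample.txt | sort | uniq -c | sort -rn
```

The output pairs each distinct match with its count (here, 2 for dfs.replication and 1 for dfs.namenode.name.dir), which is the same shape as the HDFS output shown below.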
Once the job has finished, you can check the output by running the following command.
$ bin/hdfs dfs -cat /user/jineshmathew/output/*
14/11/18 10:54:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
6 dfs.audit.logger
4 dfs.class
3 dfs.server.namenode.
2 dfs.period
2 dfs.audit.log.maxfilesize
2 dfs.audit.log.maxbackupindex
1 dfsmetrics.log
1 dfsadmin
1 dfs.servers
1 dfs.replication
1 dfs.file
- You can also browse the NameNode web interface at http://localhost:50070/
- At any time you can check the running Hadoop daemons with the jps command.
$ jps
592 SecondaryNameNode
424 NameNode
495 DataNode
1798 Jps
- Once you are done, stop Hadoop by running the following command.
$ sbin/stop-dfs.sh
Hope you had a smooth Hadoop installation and picked up some concepts along the way. More tutorials will be uploaded soon. Please don't forget to post your comments and questions.