
Hadoop Tutorial - How to install a pseudo-distributed Hadoop cluster on a MacBook Pro

In this tutorial I am going to explain the steps required to install and run Hadoop on a MacBook Pro. In this demo I am using Mac OS X version 10.6.8.

STEP 1 : Preparing Environment

The first step in the Hadoop installation is to prepare the environment: Java, SSH connectivity, and so on. We will go through each of the required setup steps in detail.

Hadoop is written in Java and requires Java 1.6 or higher. Mac OS X comes with Java, but you need to make sure that the required version is installed.

Open the terminal and type the command below.


Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ java -version
java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-10M4609)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode)

If Java is not installed or the version is below 1.6, then you need to download and install Java. Please see the note below.

For Java versions 6 and below, Apple supplies its own version of Java. For Mac OS X 10.6 and below, use the Software Update feature (available in the Apple menu) to check that you have the most up-to-date version of Java 6 for your Mac. For issues related to Apple Java 6 on Mac, contact Apple Support. Oracle and Java.com only support Java 7 and later, and only on 64-bit systems.
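As a side note, if you are ever unsure where Java lives on your Mac, OS X ships a small helper that prints the active Java home; this comes in handy later when we export JAVA_HOME. The command below exists on Mac OS X 10.5 and later (the path it prints will vary with your installation), and you can pass -v 1.6 to ask for a specific version.

Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ /usr/libexec/java_home
Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ /usr/libexec/java_home -v 1.6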

Make sure that SSH is installed on your machine. By default, Mac has SSH installed; verify this by running the commands below.


Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ which ssh
/usr/bin/ssh
Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ which ssh-keygen
/usr/bin/ssh-keygen

Now make sure that SSH actually works by running the command below.

Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ ssh localhost
Last login: Tue Nov 18 10:29:01 2014

There is a good chance that it will not work on the first attempt. If so, try one or more of the following to solve the issue.

Go to System Preferences -> Sharing, enable Remote Login, and try again.

If the above doesn't work, then you need to generate SSH keys as described below.

Run the command below to create the public and private keys. Make sure that you DON'T enter a passphrase when prompted.
       
Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ ssh-keygen -t rsa

The public key will now be available at ~/.ssh/id_rsa.pub and the private key at ~/.ssh/id_rsa.

Copy the public key to the authorized_keys file by running the command below.
  
Jinesh-Mathews-MacBook-Pro:.ssh jineshmathew$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
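If ssh localhost still prompts for a password after this, the usual culprit is file permissions: sshd ignores authorized_keys when the file or the ~/.ssh directory is writable by others. The standard fix (generic OpenSSH behavior, nothing Hadoop-specific) is shown below.

Jinesh-Mathews-MacBook-Pro:.ssh jineshmathew$ chmod 700 ~/.ssh
Jinesh-Mathews-MacBook-Pro:.ssh jineshmathew$ chmod 600 ~/.ssh/authorized_keys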

Now try SSH to localhost again.

Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ ssh localhost
Last login: Tue Nov 18 10:29:01 2014

STEP 2 : Installing Hadoop

Get the latest version of Hadoop from the Apache Hadoop download page.

At the time of writing this tutorial, I downloaded hadoop-2.5.1.tar.gz.
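If you prefer to download from the terminal, curl works too. This is just a sketch: the URL below points at the Apache archive, which keeps older releases such as 2.5.1; the download page may offer you a closer mirror.

Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ curl -O https://archive.apache.org/dist/hadoop/common/hadoop-2.5.1/hadoop-2.5.1.tar.gz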

Run the following command to extract Hadoop, assuming hadoop-2.5.1.tar.gz is in your home directory.
Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ tar -xzvf hadoop-2.5.1.tar.gz

A directory hadoop-2.5.1 has been created. Run the commands below to verify.
Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ cd hadoop-2.5.1
Jinesh-Mathews-MacBook-Pro:hadoop-2.5.1 jineshmathew$ ls -ltr
total 48
drwxr-xr-x@  4 jineshmathew  staff    136 Sep  5 18:30 share
drwxr-xr-x@  3 jineshmathew  staff    102 Sep  5 18:30 lib
drwxr-xr-x@  3 jineshmathew  staff    102 Sep  5 18:30 etc
-rw-r--r--@  1 jineshmathew  staff   1366 Sep  5 18:30 README.txt
-rw-r--r--@  1 jineshmathew  staff    101 Sep  5 18:30 NOTICE.txt
-rw-r--r--@  1 jineshmathew  staff  15458 Sep  5 18:30 LICENSE.txt
drwxr-xr-x@ 29 jineshmathew  staff    986 Sep  5 18:30 sbin
drwxr-xr-x@ 11 jineshmathew  staff    374 Sep  5 18:30 libexec
drwxr-xr-x@  7 jineshmathew  staff    238 Sep  5 18:30 include
drwxr-xr-x@ 13 jineshmathew  staff    442 Sep  5 18:30 bin
drwxr-xr-x   9 jineshmathew  staff    306 Nov 18 10:37 logs

For ease of use I am going to export three environment variables and add them to ~/.bash_profile so that they are set every time a shell starts.

Jinesh-Mathews-MacBook-Pro:hadoop-2.5.1 jineshmathew$ vi ~/.bash_profile

export JAVA_HOME=/Library/Java/Home
export HADOOP_HOME=/Users/jineshmathew/hadoop-2.5.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
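The new variables take effect only in new shells. To pick them up in the current session and confirm they are set correctly, run:

Jinesh-Mathews-MacBook-Pro:hadoop-2.5.1 jineshmathew$ source ~/.bash_profile
Jinesh-Mathews-MacBook-Pro:hadoop-2.5.1 jineshmathew$ echo $HADOOP_HOME
/Users/jineshmathew/hadoop-2.5.1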


Now try running Hadoop:
Jinesh-Mathews-MacBook-Pro:~ jineshmathew$ cd $HADOOP_HOME
Jinesh-Mathews-MacBook-Pro:hadoop-2.5.1 jineshmathew$ bin/hadoop version
Hadoop 2.5.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 2e18d179e4a8065b6a9f29cf2de9451891265cce
Compiled by jenkins on 2014-09-05T23:11Z
Compiled with protoc 2.5.0
From source with checksum 6424fcab95bfff8337780a181ad7c78
This command was run using /Users/jineshmathew/hadoop-2.5.1/share/hadoop/common/hadoop-common-2.5.1.jar

STEP 3 : Hadoop Configuration

Now we need to configure Hadoop to run in one of the three supported modes:
    • Standalone mode
    • Pseudo-distributed mode
    • Fully distributed mode
For learning, I prefer at least pseudo-distributed mode, which simulates a Hadoop cluster with just one NameNode and one DataNode. It also runs a SecondaryNameNode, all on one machine, which is our Mac.
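Before editing the XML files below, it is worth making sure the Hadoop scripts themselves can find Java. The daemons started by start-dfs.sh run over SSH and do not always inherit the JAVA_HOME you exported in ~/.bash_profile; a common fix, if they later complain that JAVA_HOME is not set, is to hardcode it in etc/hadoop/hadoop-env.sh, reusing the same path we exported earlier:

# In etc/hadoop/hadoop-env.sh, replace the default JAVA_HOME line with:
export JAVA_HOME=/Library/Java/Home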

Edit etc/hadoop/core-site.xml to have the following.
      <configuration>
        <property>
              <name>fs.defaultFS</name>
              <value>hdfs://localhost:9000</value>
        </property>
     </configuration>
Edit etc/hadoop/hdfs-site.xml to have the following.
      <configuration>
        <property>
              <name>dfs.replication</name>
              <value>1</value>
        </property>
     </configuration>
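One optional addition while we are editing config files: by default, HDFS keeps its data under /tmp (the hadoop.tmp.dir property defaults to /tmp/hadoop-${user.name}), which the OS may clean out. Below is a sketch of an extra property for etc/hadoop/core-site.xml that moves it somewhere durable; the path shown is just an example, so use any directory you own.

        <property>
              <name>hadoop.tmp.dir</name>
              <value>/Users/jineshmathew/hadoop-data</value>
        </property>

Note that if you change this after formatting HDFS (Step 4), you will have to format again, which erases any existing HDFS data.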

STEP 4 : Running Hadoop
  • Now we need to format HDFS, the file system for Hadoop, by running the following command.
          $ bin/hdfs namenode -format
  • Start the NameNode and DataNode daemons.
           $ sbin/start-dfs.sh          
  • Now create the following directories inside HDFS.
           $ bin/hdfs dfs -mkdir /user
           $ bin/hdfs dfs -mkdir /user/<username>
           $ bin/hdfs dfs -mkdir /user/<username>/input
  • Check the directory by running the following command 
          $ bin/hdfs dfs -ls /user/jineshmathew/input
  • Now copy some files from the local file system to HDFS.
          $ bin/hdfs dfs -put etc/hadoop /user/jineshmathew/input
  • Now it is time to run one of the examples provided by Hadoop. (If you re-run it later, see the cleanup note at the end of this step.)
          $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar grep input/hadoop output 'dfs[a-z.]+'

This Hadoop job greps all files in the input/hadoop directory for the pattern 'dfs[a-z.]+' and counts the matches.

Once the job is finished, you can check the output by running the following command.

         $ bin/hdfs dfs -cat  /user/jineshmathew/output/*

           14/11/18 10:54:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
          6 dfs.audit.logger
          4 dfs.class
          3 dfs.server.namenode.
          2 dfs.period
          2 dfs.audit.log.maxfilesize
          2 dfs.audit.log.maxbackupindex
          1 dfsmetrics.log
          1 dfsadmin
          1 dfs.servers
          1 dfs.replication
          1 dfs.file
  • You can also browse the NameNode web interface at http://localhost:50070/
  • At any time, check all running Hadoop daemons with the jps command.
         $ jps
         592 SecondaryNameNode
         424 NameNode
         495 DataNode
         1798 Jps
  • Once you are done, stop Hadoop by running the following command.
          $ sbin/stop-dfs.sh
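A note on re-running the example job: MapReduce will refuse to start if the output directory already exists, so a second run of the grep job above fails until you remove it:

          $ bin/hdfs dfs -rm -r /user/jineshmathew/output

You can also copy the results out of HDFS to the local file system with bin/hdfs dfs -get /user/jineshmathew/output output if you prefer to inspect them with regular tools.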

I hope you had a smooth Hadoop installation and picked up some concepts along the way. More tutorials will be uploaded soon. Please don't forget to post your comments and questions.
