
How to write MapReduce program using eclipse

Make sure that your Hadoop installation is working on your machine. For this tutorial I installed Hadoop on my MacBook Pro using the guidelines below.

Hadoop Installation guidelines

Install the Eclipse IDE on the machine. I used Eclipse Luna Service Release 1 (4.4.1).

Create a new Java project using File->New->Project

Set the Java build path using Project->Properties->Java Build Path->Libraries->Add External JARs

Assuming Hadoop is installed under /Users/jineshmathew/hadoop-2.5.1, select all JARs in the folders below (the Hadoop JARs that sit directly under share/hadoop/common/, share/hadoop/yarn/, share/hadoop/mapreduce/ and share/hadoop/hdfs/ are needed as well):

/Users/jineshmathew/hadoop-2.5.1/share/hadoop/common/lib/
/Users/jineshmathew/hadoop-2.5.1/share/hadoop/yarn/lib/
/Users/jineshmathew/hadoop-2.5.1/share/hadoop/mapreduce/lib/
/Users/jineshmathew/hadoop-2.5.1/share/hadoop/hdfs/lib/

Now we need to create the input data. I created a file emp.dat with the employee data below (name, job title, state, salary).

emp.dat
-------------------------------
Tom,Developer,IL,80000
Jose,Architect,OH,100000
Bill,Director,OH,130000
Bill,Director,OH,140000
Matt,Architect,IL,110000
Tom,Developer,IL,90000

Copy the input data to HDFS using the below command.

bin/hdfs dfs -put emp.dat /user/jineshmathew/input

The MapReduce program we are about to write computes the average salary for each job title.

Now we need to create three classes: the Mapper, the Reducer, and the Driver.


AverageDriver.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AverageDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "EmpJob");
        job.setJarByClass(AverageDriver.class);
        job.setMapperClass(AvgMapper.class);
        job.setReducerClass(AvgReducer.class);
        // Both the map and reduce outputs are (Text, Text) pairs
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Input and output are HDFS directories (not files); relative paths
        // resolve under /user/<username>/
        FileInputFormat.setInputPaths(job, new Path("input"));
        FileOutputFormat.setOutputPath(job, new Path("output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}


AvgMapper.java

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class AvgMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable ikey, Text ivalue, Context context)
            throws IOException, InterruptedException {
        // Each input line has the format: name,title,state,salary
        String[] fields = ivalue.toString().split(",");
        if (fields.length >= 4) {
            // Emit (job title, salary)
            context.write(new Text(fields[1]), new Text(fields[3]));
        }
    }
}
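To see what the mapper emits, the same split-and-select logic can be exercised in plain Java, without Hadoop. This is only an illustration of the parsing step (the `Text` wrappers and `Context` are omitted); the input line is taken from emp.dat:

```java
public class MapperLogicDemo {
    public static void main(String[] args) {
        // Same parsing the mapper performs on each input line
        String line = "Tom,Developer,IL,80000";
        String[] fields = line.split(",");
        if (fields.length >= 4) {
            // Key = job title (fields[1]), value = salary (fields[3])
            System.out.println(fields[1] + "\t" + fields[3]);
            // prints: Developer	80000
        }
    }
}
```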


AvgReducer.java

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class AvgReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text _key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Sum all salaries for this job title and count them
        double sum = 0.0;
        int count = 0;
        for (Text val : values) {
            sum += Double.parseDouble(val.toString());
            count++;
        }
        // Emit (job title, average salary)
        context.write(_key, new Text(Double.toString(sum / count)));
    }
}
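The averaging loop in the reducer can likewise be checked in isolation. A small sketch using the two Developer salaries from emp.dat, with plain strings standing in for the `Iterable<Text>` values:

```java
public class ReducerLogicDemo {
    public static void main(String[] args) {
        // Salaries grouped under the "Developer" key
        String[] values = {"80000", "90000"};
        double sum = 0.0;
        int count = 0;
        for (String val : values) {
            sum += Double.parseDouble(val);
            count++;
        }
        // Average salary for the job title
        System.out.println("Developer\t" + (sum / count));
        // prints: Developer	85000.0
    }
}
```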

Now create a JAR by right-clicking the project->Export->Java->JAR file

Make sure there are no compile errors and the JAR is created. I named the JAR average1.jar.

Once the JAR is ready, we can run the MapReduce job with the below command from HADOOP_HOME.

bin/hadoop jar average1.jar AverageDriver

Make sure no errors or exceptions are written to the console. Once the job is done, check the output in the HDFS output directory using the below command.

bin/hdfs dfs -cat /user/jineshmathew/output/*

The emp.dat file we created generates the following output.

Architect 105000.0
Developer 85000.0
Director 135000.0
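As a sanity check, the same averages can be reproduced in plain Java, without Hadoop, by grouping the emp.dat lines by job title and averaging each group, mirroring what the map and reduce phases compute (a `TreeMap` stands in for Hadoop's sorted key grouping):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AverageCheck {
    public static void main(String[] args) {
        String[] lines = {
            "Tom,Developer,IL,80000",
            "Jose,Architect,OH,100000",
            "Bill,Director,OH,130000",
            "Bill,Director,OH,140000",
            "Matt,Architect,IL,110000",
            "Tom,Developer,IL,90000"
        };
        // "Map" phase: group salaries by job title (TreeMap sorts the keys)
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            grouped.computeIfAbsent(f[1], k -> new ArrayList<>())
                   .add(Double.parseDouble(f[3]));
        }
        // "Reduce" phase: average each group
        for (Map.Entry<String, List<Double>> e : grouped.entrySet()) {
            double sum = 0.0;
            for (double s : e.getValue()) sum += s;
            System.out.println(e.getKey() + " " + (sum / e.getValue().size()));
        }
        // prints:
        // Architect 105000.0
        // Developer 85000.0
        // Director 135000.0
    }
}
```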

Now you are done with the MapReduce program :). Please let me know if you have any questions or issues.

