
How to write MapReduce program using eclipse

Make sure that your Hadoop installation is working on your machine. For this tutorial I installed Hadoop on my MacBook Pro using the guidelines below.

Hadoop Installation guidelines

Install the Eclipse IDE on the machine. I used Eclipse Luna Service Release 1 (4.4.1).

Create a new Java project using File->New->Project

Set the Java build path using Project->Properties->Java Build Path->Libraries->Add External JARs

Assuming Hadoop is installed under /Users/jineshmathew/hadoop-2.5.1, select all JARs in the folders below (the Hadoop JARs that sit directly under share/hadoop/common/, share/hadoop/yarn/, share/hadoop/mapreduce/ and share/hadoop/hdfs/ are needed as well):

/Users/jineshmathew/hadoop-2.5.1/share/hadoop/common/lib/
/Users/jineshmathew/hadoop-2.5.1/share/hadoop/yarn/lib/
/Users/jineshmathew/hadoop-2.5.1/share/hadoop/mapreduce/lib/
/Users/jineshmathew/hadoop-2.5.1/share/hadoop/hdfs/lib/

Now we need to create the input data. I created a file emp.dat with the employee data below (name, job title, state, salary).

emp.dat
-------------------------------
Tom,Developer,IL,80000
Jose,Architect,OH,100000
Bill,Director,OH,130000
Bill,Director,OH,140000
Matt,Architect,IL,110000
Tom,Developer,IL,90000

Copy the input data to HDFS using the below command.

bin/hdfs dfs -put emp.dat /user/jineshmathew/input

The MapReduce program we are about to write computes the average salary for each job title.

Now we need to create three classes: the Mapper, the Reducer, and the Driver.


AverageDriver.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AverageDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "EmpJob");
        job.setJarByClass(AverageDriver.class);
        job.setMapperClass(AvgMapper.class);
        job.setReducerClass(AvgReducer.class);
        // Both the map and reduce outputs are (Text, Text) pairs
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Input and output are HDFS directories (not files); relative paths
        // resolve under /user/<username>/
        FileInputFormat.setInputPaths(job, new Path("input"));
        FileOutputFormat.setOutputPath(job, new Path("output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}


AvgMapper.java

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class AvgMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable ikey, Text ivalue, Context context)
            throws IOException, InterruptedException {
        // Each input line has the format: name,title,state,salary
        String[] fields = ivalue.toString().split(",");
        if (fields.length >= 4) {
            // Emit (job title, salary)
            context.write(new Text(fields[1]), new Text(fields[3]));
        }
    }
}
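To see what the mapper emits, the same split-and-select logic can be exercised in plain Java, without Hadoop. This is only an illustration of the parsing step (the `Text` wrappers and `Context` are omitted); the input line is taken from emp.dat:

```java
public class MapperLogicDemo {
    public static void main(String[] args) {
        // Same parsing the mapper performs on each input line
        String line = "Tom,Developer,IL,80000";
        String[] fields = line.split(",");
        if (fields.length >= 4) {
            // Key = job title (fields[1]), value = salary (fields[3])
            System.out.println(fields[1] + "\t" + fields[3]);
            // prints: Developer	80000
        }
    }
}
```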


AvgReducer.java

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class AvgReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text _key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Sum all salaries for this job title and count them
        double sum = 0.0;
        int count = 0;
        for (Text val : values) {
            sum += Double.parseDouble(val.toString());
            count++;
        }
        // Emit (job title, average salary)
        context.write(_key, new Text(Double.toString(sum / count)));
    }
}
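The averaging loop in the reducer can likewise be checked in isolation. A small sketch using the two Developer salaries from emp.dat, with plain strings standing in for the `Iterable<Text>` values:

```java
public class ReducerLogicDemo {
    public static void main(String[] args) {
        // Salaries grouped under the "Developer" key
        String[] values = {"80000", "90000"};
        double sum = 0.0;
        int count = 0;
        for (String val : values) {
            sum += Double.parseDouble(val);
            count++;
        }
        // Average salary for the job title
        System.out.println("Developer\t" + (sum / count));
        // prints: Developer	85000.0
    }
}
```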

Now create a JAR by right-clicking the project->Export->Java->JAR file

Make sure there are no compile errors and the JAR is created. I named the JAR average1.jar.

Once the JAR is ready, we can run the MapReduce job with the below command from HADOOP_HOME.

bin/hadoop jar average1.jar AverageDriver

Make sure no errors or exceptions are written to the console. Once the job is done, check the output in the HDFS output directory using the below command.

bin/hdfs dfs -cat /user/jineshmathew/output/*

The emp.dat file we created generates the following output.

Architect 105000.0
Developer 85000.0
Director 135000.0
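As a sanity check, the same averages can be reproduced in plain Java, without Hadoop, by grouping the emp.dat lines by job title and averaging each group, mirroring what the map and reduce phases compute (a `TreeMap` stands in for Hadoop's sorted key grouping):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AverageCheck {
    public static void main(String[] args) {
        String[] lines = {
            "Tom,Developer,IL,80000",
            "Jose,Architect,OH,100000",
            "Bill,Director,OH,130000",
            "Bill,Director,OH,140000",
            "Matt,Architect,IL,110000",
            "Tom,Developer,IL,90000"
        };
        // "Map" phase: group salaries by job title (TreeMap sorts the keys)
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            grouped.computeIfAbsent(f[1], k -> new ArrayList<>())
                   .add(Double.parseDouble(f[3]));
        }
        // "Reduce" phase: average each group
        for (Map.Entry<String, List<Double>> e : grouped.entrySet()) {
            double sum = 0.0;
            for (double s : e.getValue()) sum += s;
            System.out.println(e.getKey() + " " + (sum / e.getValue().size()));
        }
        // prints:
        // Architect 105000.0
        // Developer 85000.0
        // Director 135000.0
    }
}
```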

Now you are done with the MapReduce program :). Please let me know if you have any questions or issues.

