Make sure that your Hadoop installation is working on your machine. For this tutorial I have installed Hadoop on my MacBook Pro using the guidelines below.
The MapReduce program we are about to write will find the average salary for each job title.
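Before turning to Hadoop, the same group-by-and-average logic can be sketched in plain Java (a local sketch, no cluster needed) to confirm the numbers we expect the job to produce:

```java
import java.util.Map;
import java.util.TreeMap;

public class LocalAverage {
    public static void main(String[] args) {
        String[] lines = {
            "Tom,Developer,IL,80000",
            "Jose,Architect,OH,100000",
            "Bill,Director,OH,130000",
            "Bill,Director,OH,140000",
            "Matt,Architect,IL,110000",
            "Tom,Developer,IL,90000"
        };
        // Accumulate [sum, count] per job title (field 1 = title, field 3 = salary)
        Map<String, double[]> stats = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            double[] s = stats.computeIfAbsent(f[1], k -> new double[2]);
            s[0] += Double.parseDouble(f[3]); // running sum of salaries
            s[1]++;                           // count of records for this title
        }
        for (Map.Entry<String, double[]> e : stats.entrySet())
            System.out.println(e.getKey() + "\t" + e.getValue()[0] / e.getValue()[1]);
        // prints: Architect 105000.0, Developer 85000.0, Director 135000.0
    }
}
```

These are exactly the averages the MapReduce job should reproduce.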
Hadoop Installation guidelines
Install Eclipse IDE on the machine. I have installed Eclipse Version: Luna Service Release 1 (4.4.1).
Create a new project using File->New->Project.
Set the Java build path using Project->Properties->Java Build Path->Libraries->Add External JARs.
Assuming Hadoop is installed in /Users/jineshmathew/hadoop-2.5.1, select all JARs in the folders below (you will also need the Hadoop JARs directly under share/hadoop/common/ and share/hadoop/mapreduce/, since the lib/ folders hold only third-party dependencies):
/Users/jineshmathew/hadoop-2.5.1/share/hadoop/common/lib/
/Users/jineshmathew/hadoop-2.5.1/share/hadoop/yarn/lib/
/Users/jineshmathew/hadoop-2.5.1/share/hadoop/mapreduce/lib/
/Users/jineshmathew/hadoop-2.5.1/share/hadoop/hdfs/lib/
Now we need to create the input data. I have created a file emp.dat with the employee data below.
emp.dat
-------------------------------
Tom,Developer,IL,80000
Jose,Architect,OH,100000
Bill,Director,OH,130000
Bill,Director,OH,140000
Matt,Architect,IL,110000
Tom,Developer,IL,90000
Copy the input data to HDFS using the command below (if the target directory does not exist yet, create it first with bin/hdfs dfs -mkdir -p /user/jineshmathew/input).
bin/hdfs dfs -put emp.dat /user/jineshmathew/input
Now we need to create three class files: the Mapper, the Reducer, and the Driver.
AverageDriver.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AverageDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "EmpJob");
        job.setJarByClass(AverageDriver.class);
        job.setMapperClass(AvgMapper.class);
        job.setReducerClass(AvgReducer.class);
        //job.setCombinerClass(AvgCombiner.class);
        // Both map and reduce emit (Text, Text) pairs
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Input and output are DIRECTORIES, resolved relative to the
        // user's HDFS home directory (/user/jineshmathew)
        FileInputFormat.setInputPaths(job, new Path("input"));
        FileOutputFormat.setOutputPath(job, new Path("output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
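A note on the commented-out setCombinerClass line: this reducer cannot simply be reused as a combiner, because an average of partial averages is not, in general, the overall average. A quick plain-Java check, using a hypothetical split of three salaries across two map tasks:

```java
public class CombinerPitfall {
    public static void main(String[] args) {
        // Suppose one map task's combiner saw {80000} and another saw {90000, 100000}.
        double avg1 = 80000.0;                  // average of the first partition
        double avg2 = (90000.0 + 100000.0) / 2; // average of the second partition = 95000.0
        double avgOfAvgs = (avg1 + avg2) / 2;   // 87500.0 -- what a naive combiner would produce
        double trueAvg = (80000.0 + 90000.0 + 100000.0) / 3; // 90000.0 -- the correct answer
        System.out.println(avgOfAvgs + " vs " + trueAvg); // prints "87500.0 vs 90000.0"
    }
}
```

A correct combiner for this job would have to emit (sum, count) pairs and let the reducer do the final division.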
AvgMapper.java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class AvgMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable ikey, Text ivalue, Context context)
            throws IOException, InterruptedException {
        // Each input line is: name,title,state,salary -> emit (title, salary)
        String[] fields = ivalue.toString().split(",");
        if (fields.length >= 4)
            context.write(new Text(fields[1]), new Text(fields[3]));
    }
}
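To see what the mapper emits, here is the same split logic run on one record from emp.dat (a standalone sketch, no Hadoop classes involved):

```java
public class MapperDemo {
    public static void main(String[] args) {
        String record = "Jose,Architect,OH,100000"; // one input line from emp.dat
        String[] fields = record.split(",");
        // fields[1] is the job title (the key), fields[3] is the salary (the value)
        System.out.println(fields[1] + " -> " + fields[3]); // prints "Architect -> 100000"
    }
}
```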
AvgReducer.java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class AvgReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text _key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Sum all salaries for this job title, then divide by the count
        double sum = 0.0;
        int count = 0;
        for (Text val : values) {
            sum += Double.parseDouble(val.toString());
            count++;
        }
        context.write(_key, new Text(String.valueOf(sum / count)));
    }
}
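The reducer's arithmetic for one key can be simulated in plain Java on the two Developer salaries (a sketch of the same sum-and-divide loop):

```java
public class ReducerDemo {
    public static void main(String[] args) {
        String[] values = {"80000", "90000"}; // the two Developer salaries
        double sum = 0.0;
        int count = 0;
        for (String val : values) {
            sum += Double.parseDouble(val); // accumulate salaries
            count++;                        // count values for this key
        }
        System.out.println("Developer\t" + (sum / count)); // prints "Developer	85000.0"
    }
}
```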
Now create a JAR by right-clicking the project and choosing Export.
Make sure there are no errors and that the JAR is created. I have named the JAR average1.jar.
Once the JAR is ready, we can run the MapReduce job with the command below from HADOOP_HOME.
bin/hadoop jar average1.jar AverageDriver
Make sure that no errors or exceptions are written to the console. Once the job is done, we can check the output in the HDFS output directory using the command below.
bin/hdfs dfs -cat /user/jineshmathew/output/*
The emp.dat file we created will generate the following output.
Architect 105000.0
Developer 85000.0
Director 135000.0
Now you are done with the MapReduce program :). Please let me know if you have any questions or issues.