I am writing a MapReduce program for matrix addition. Since it requires two input files, I am using MultipleInputs. I have the following classes:
MatAddMapper1.java
package mapred;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MatAddMapper1 extends Mapper<LongWritable, Text, Text, IntWritable>
{
    //private static final int MISSING = 9999;
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException
    {
        String line = value.toString();
        String[] content = line.split(" ");
        String key1 = content[0] + " " + content[1];
        int val = Integer.parseInt(content[2]);
        // Key is (i,j)
        context.write(new Text(key1), new IntWritable(val));
    }
}
MatAddMapper2.java is similar.
MatAddReducer.java
package mapred;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MatAddReducer
        extends Reducer<Text, IntWritable, Text, IntWritable>
{
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException
    {
        int val = 0;
        for (IntWritable value : values)
        {
            val = val + value.get();
        }
        context.write(key, new IntWritable(val));
    }
}
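To make the intended behavior concrete, here is a minimal plain-Java sketch (no Hadoop) of what the two mappers and the reducer are meant to compute together. It assumes every input line has the form `i j value` separated by single spaces, as parsed in MatAddMapper1, and that MatAddMapper2 parses its file the same way. The class name `MatAddSketch` and the method `addMatrices` are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; not part of the actual job.
public class MatAddSketch {

    // Mimics the map + reduce steps: key each entry by "i j", then sum values per key.
    static Map<String, Integer> addMatrices(List<String> linesM, List<String> linesN) {
        Map<String, Integer> sums = new TreeMap<>();
        for (List<String> lines : Arrays.asList(linesM, linesN)) {
            for (String line : lines) {
                // "Map" step: same parsing as MatAddMapper1
                String[] content = line.split(" ");
                String key = content[0] + " " + content[1];
                int val = Integer.parseInt(content[2]);
                // "Reduce" step: accumulate all values that share a key
                sums.merge(key, val, Integer::sum);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        // Made-up sample data, assumed format "i j value"
        List<String> m = Arrays.asList("0 0 1", "0 1 2");
        List<String> n = Arrays.asList("0 0 10", "0 1 20");
        for (Map.Entry<String, Integer> e : addMatrices(m, n).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
        // prints:
        // 0 0 11
        // 0 1 22
    }
}
```

An entry that appears in only one of the two files is passed through unchanged, which matches what the reducer does when it receives a single value for a key.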
MatAddApp.java (Main class)
package mapred;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class MatAddApp extends Configured implements Tool
{
    @Override
    public int run(String[] args) throws Exception
    {
        // Use the configuration supplied by ToolRunner so generic options (-D, -conf, ...) are honored
        Configuration conf = getConf();
        // Job.getInstance replaces the deprecated Job constructor
        Job job = Job.getInstance(conf, "Matrix Addition");
        job.setJarByClass(MatAddApp.class);
        // One mapper per input matrix
        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, MatAddMapper1.class);
        MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, MatAddMapper2.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        job.setReducerClass(MatAddReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        return (job.waitForCompletion(true) ? 0 : 1);
    }

    public static void main(String[] args) throws Exception
    {
        int ecode = ToolRunner.run(new MatAddApp(), args);
        System.exit(ecode);
    }
}
I built a jar file, MatAddition.jar, in Eclipse. M.txt and N.txt are the input matrices. When I run the program on my Hadoop cluster, I get the following error:
Exception in thread "main" java.lang.ClassNotFoundException: MatAddApp
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:278)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)