I have a tab-separated input file from which I am reading two columns in MapReduce: one column is the key and the other the value. My requirement is: if the value is blank (i.e. it contains only a space, a tab, or similar whitespace), the key should not be passed to the reducer either. In other words, the mapper should discard that record entirely and move on to the next record that has a value. I have written the following code, but it does not work: it emits every record and filters nothing.
public static class Map extends Mapper<LongWritable, Text, Text, Text>
{
    private Text vis = new Text();
    private Text eValue = new Text();

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        String line = value.toString();
        String[] arr = line.split("\t");
        vis.set(arr[8]);
        eValue.set(arr[287]);
        if (!eValue.equals("\t") || eValue.equals(" "))
        {
            context.write(vis, eValue);
        }
    }
}
Any help is appreciated. Thanks in advance.
PS: I am using Hadoop 2.6.0.
Your design is right; the problem is the if condition, which does not do what you expect. First check what value actually ends up in eValue when the field is blank. Once you have split the line on '\t', a tab character can no longer appear inside any of the individual fields, so comparing a field against "\t" will never match. Think about what a blank value really looks like after the split and rewrite the condition accordingly.
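For what it's worth, here is a rough sketch of how the mapper could look, assuming that "blank" means the field is empty or whitespace-only. The trim()/isEmpty() check, the split limit of -1 (which keeps trailing empty columns in the array), and the guard against lines with too few columns are my own additions, not part of your original code. Note also that eValue is a Text, so eValue.equals("\t") compares a Text against a String and can never be true; compare plain String values instead.

public static class Map extends Mapper<LongWritable, Text, Text, Text>
{
    private Text vis = new Text();
    private Text eValue = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        // split with limit -1 so trailing empty columns are kept in the array
        String[] arr = value.toString().split("\t", -1);

        // guard against short/malformed lines (extra assumption, not in the original code)
        if (arr.length <= 287)
        {
            return;
        }

        // treat "blank" as empty or whitespace-only after trimming
        String field = arr[287].trim();

        // emit only when the value column is non-blank; blank records are dropped here
        if (!field.isEmpty())
        {
            vis.set(arr[8]);
            eValue.set(field);
            context.write(vis, eValue);
        }
    }
}

Because nothing is written for a blank field, the corresponding key never reaches the reducer, which is exactly the filtering you described.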