I'm writing a MapReduce job to find common friends on Facebook.
This is the input for my mapper:
100, 200 300 400 500 600
200, 100 300 400
300, 100 200 400 500
400, 100 200 300
500, 100 300
600, 100
And this is part of my mapper code:
map {
    String line = value.toString();
    String[] LineSplits = line.split(",");
    String[] friends = LineSplits[1].trim().split(" ");
    for (int i = 0; i < friends.length; i++) {
        int friend2 = Integer.parseInt(friends[i]);
        System.out.println(friend2);
    }
    int friend1 = Integer.parseInt(LineSplits[0]);
    System.out.println(friend1);
}
When I execute this, I get correct values in friend2 (Integer.parseInt works fine there). The variable friend1 is supposed to get the value 100, but Integer.parseInt fails on it and I get an error like this:
java.lang.Exception: java.lang.NumberFormatException: For input string: "100"
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NumberFormatException: For input string: "100"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at com.hadoop.CFMapper.map(CFMapper.java:29)
at com.hadoop.CFMapper.map(CFMapper.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
So I'm stuck here. Why am I getting a NumberFormatException for this, and how can I fix it?
Your file contains a stray Unicode character, 'ZERO WIDTH NO-BREAK SPACE' (U+FEFF), most likely at the very start of the file, just before the 100. Integer.parseInt does not accept it, so you need to get rid of it.
The character is not visible, so it's understandable that you didn't notice it. You probably pasted it in by mistake; check where you copied your data from.
I should mention that trimming the string in code won't get rid of that character. You really need to go into your input file and fix it there.
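To see why trimming doesn't help, here is a minimal standalone sketch (not part of your mapper). The class name BomDemo and the helper stripBom are made up for illustration, and it assumes the stray character sits at the very start of the field. String.trim() only removes characters up to U+0020, so U+FEFF survives it; removing it explicitly is what makes the parse succeed.

```java
public class BomDemo {
    // Hypothetical helper (not from the original post):
    // drops a single leading U+FEFF if one is present.
    public static String stripBom(String s) {
        if (s != null && !s.isEmpty() && s.charAt(0) == '\uFEFF') {
            return s.substring(1);
        }
        return s;
    }

    public static void main(String[] args) {
        // Assumed shape of the first field when the file starts with U+FEFF.
        String raw = "\uFEFF100";

        // trim() leaves the string untouched: it still has 4 characters.
        System.out.println(raw.trim().length());

        try {
            Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            System.out.println("still fails after trim()");
        }

        // With the character removed explicitly, parsing works.
        System.out.println(Integer.parseInt(stripBom(raw)));
    }
}
```

Calling something like stripBom on the first field in the mapper would only hide the symptom; cleaning the input file is still the better fix.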
You will find suggestions on how to get rid of the character in this thread.
Otherwise, if your file is not too big, why not start a fresh file and type in your values manually, to be safe? :)
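If you'd rather clean the existing file than retype it, something along these lines might do. This is only a sketch under assumptions: the class name RemoveBom is made up, the file is assumed to be UTF-8 encoded (where U+FEFF is stored as the bytes EF BB BF), small enough to read into memory, and to have the character only at the very beginning.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class RemoveBom {
    // Rewrites the file in place without a leading UTF-8 BOM (bytes EF BB BF).
    // Returns true if a BOM was found and removed, false if the file was left alone.
    public static boolean removeBom(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        boolean hasBom = bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
        if (hasBom) {
            // Keep everything after the first three bytes.
            Files.write(file, Arrays.copyOfRange(bytes, 3, bytes.length));
        }
        return hasBom;
    }

    public static void main(String[] args) throws IOException {
        // The input path is passed as the first command-line argument.
        Path input = Paths.get(args[0]);
        System.out.println(removeBom(input) ? "BOM removed" : "no BOM found");
    }
}
```

Run it once on a copy of the input file, then feed the cleaned file to the job.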