java.lang.NumberFormatException: For input string: "100" while executing MapReduce


I'm writing a MapReduce job for finding common friends on Facebook.

This is the input for my mapper:

100, 200 300 400 500 600
200, 100 300 400
300, 100 200 400 500
400, 100 200 300
500, 100 300
600, 100

And this is part of my mapper code:

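map {
        // value holds one input line, e.g. "100, 200 300 400 500 600"
        String line = value.toString();
        String[] LineSplits = line.split(",");

        // Friend list after the comma: these parse correctly
        String[] friends = LineSplits[1].trim().split(" ");
        for (int i = 0; i < friends.length; i++) {
            int friend2 = Integer.parseInt(friends[i]);
            System.out.println(friend2);
        }

        // User id before the comma: this call throws the exception
        int friend1 = Integer.parseInt(LineSplits[0]);
        System.out.println(friend1);
}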

When I execute this, I get the correct values in friend2 (Integer.parseInt works fine there). The variable friend1 is supposed to get the value 100, but Integer.parseInt fails on it and I get this error:

java.lang.Exception: java.lang.NumberFormatException: For input string: "100"
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NumberFormatException: For input string: "100"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at com.hadoop.CFMapper.map(CFMapper.java:29)
    at com.hadoop.CFMapper.map(CFMapper.java:1)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

So I'm stuck here. Why am I getting a NumberFormatException for this, and how can I fix it?


There are 2 answers

sstan (best answer)

Your file contains the Unicode character 'ZERO WIDTH NO-BREAK SPACE' (U+FEFF), which Integer.parseInt cannot handle. You need to get rid of it.

The character is invisible, so it's understandable that you didn't notice it. You probably pasted it in by mistake, so check where you copied your data from.

I should mention that trimming your string in code won't get rid of that character: String.trim() only strips characters up to U+0020. You really need to go into your input file and fix it there.
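One way to confirm that a hidden character is present is to print the code points of the suspect token. The sketch below is only an illustration: the class name CodePointDump and the helper describe are made up for this example, and the token is hard-coded on the assumption that U+FEFF sits in front of the digits.

```java
final class CodePointDump {
    // Hypothetical helper: lists every code point of s as U+XXXX, space-separated.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed content of LineSplits[0] for the first input line.
        String token = "\uFEFF100";
        System.out.println(token.length());   // prints 4, not 3
        System.out.println(describe(token));  // prints U+FEFF U+0031 U+0030 U+0030
    }
}
```

Calling describe(LineSplits[0]) inside the mapper would show the same kind of listing for the real data.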
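The point about trimming can be checked with a small sketch. TrimDemo and parses are hypothetical names used only here, and the input string assumes the U+FEFF character comes right before the digits.

```java
final class TrimDemo {
    // Hypothetical helper: reports whether Integer.parseInt accepts the string.
    static boolean parses(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String raw = "\uFEFF100";
        String trimmed = raw.trim();

        // trim() leaves U+FEFF in place, so the length is still 4.
        System.out.println(trimmed.length());  // prints 4
        System.out.println(parses(trimmed));   // prints false
        System.out.println(parses("100"));     // prints true
    }
}
```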

You will find suggestions on how to get rid of the character in this thread.

Otherwise, if your file is not too big, why not start a fresh file and type in your values manually to be safe? :)

Ankur Anand

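Edit: As you mentioned in the comments, the length you are getting is 4, one more than the three visible characters of "100", so the string contains a hidden character.

Maybe you can try this:

LineSplits[0].replace("\uFEFF", "") and then parse the result and see what happens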
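As a minimal sketch of that suggestion, assuming the hidden character is U+FEFF: the class BomStrip and the helper parseId are hypothetical names, and the input line is hard-coded rather than read from the Hadoop Text value. Because Java strings are immutable, the result of replace has to be used; the original string is left unchanged.

```java
final class BomStrip {
    // Hypothetical helper: removes every U+FEFF, trims ASCII whitespace, then parses.
    static int parseId(String token) {
        String cleaned = token.replace("\uFEFF", "").trim();
        return Integer.parseInt(cleaned);
    }

    public static void main(String[] args) {
        // First input line, with the invisible character assumed at the start.
        String line = "\uFEFF100, 200 300 400 500 600";
        String[] lineSplits = line.split(",");

        int friend1 = parseId(lineSplits[0]);
        System.out.println(friend1);  // prints 100
    }
}
```

This works around the problem in code; cleaning the input file itself is still the more thorough fix.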


If you look at the Javadoc for Integer.parseInt(), it says:

Throws:

NumberFormatException - if the string does not contain a parsable integer.

So what if the string "100" is followed by the end of the line? It can then carry a line terminator such as \r\n or \n, or the line can be null if the end of the stream has been reached, and neither is a parsable integer. So you need to check for these before parsing.
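A defensive check along those lines might look like the sketch below. SafeParse and parseOrNull are hypothetical names, and returning null for bad input is just one possible design; a mapper could equally skip the record or log it.

```java
final class SafeParse {
    // Hypothetical helper: returns null instead of throwing when the token
    // is missing or is not a plain integer after trimming.
    static Integer parseOrNull(String token) {
        if (token == null) {
            return null;
        }
        // trim() removes spaces, tabs, \r and \n, but not U+FEFF.
        String cleaned = token.trim();
        try {
            return Integer.valueOf(cleaned);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("100\r\n"));  // prints 100
        System.out.println(parseOrNull(null));       // prints null
    }
}
```

Note that this check alone does not rescue a token containing U+FEFF, since trim() leaves that character in place; such a token comes back as null here.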