HashMap does not behave as expected for Chinese characters

Question


Asked by Kaushik Lele on 23 December 2016 at 07:03

```
China-中国,CN
Angola-安哥拉,AO
Afghanistan-阿富汗,AF
Albania-阿尔巴尼亚,AL
Algeria-阿尔及利亚,DZ
Andorra-安道尔共和国,AD
Anguilla-安圭拉岛,AI
```

In Java, I read the text above from a file and build a map whose keys are the part before the comma and whose values are the region codes after it.

Here is the code:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        Map<String, String> mymap = new HashMap<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new FileInputStream("C:/Users/IBM_ADMIN/Desktop/region_code_abbreviations_Chinese.csv"),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                // "China-中国,CN" -> key "China-中国", value "CN"
                String[] arr = line.split(",");
                mymap.put(arr[0], arr[1]);
            }
        } catch (IOException e) {
            System.out.println("Failed to read users file.");
        }

        // Print every key and stop as soon as "China-中国" matches
        for (String s : mymap.keySet()) {
            System.out.println(s);
            if (s.equals("China-中国")) {
                System.out.println("Got it");
                break;
            }
        }

        System.out.println("----------------");
        System.out.println("Returned from map  " + mymap.get("China-中国"));

        // A key inserted from a source-code literal is found as expected
        mymap = new HashMap<>();
        mymap.put("China-中国", "Explicitly Put");
        System.out.println(mymap.get("China-中国"));
        System.out.println("done");
    }
}
```

The output:

```
:
:
Egypt-埃及
Guyana-圭亚那
New Zealand-新西兰
China-中国
Indonesia-印度尼西亚
Laos-老挝
Chad-乍得
Korea-韩国
:
:
Returned from map  null
Explicitly Put
done
```

The map is loaded correctly, and China-中国 shows up among the printed keys, but the equals() check never prints "Got it" and get("China-中国") returns null.

If I explicitly put "China-中国" into a new map, get() does return the value. Why is this happening?

Original Q&A

There are 3 answers

**Patrick Parker** · 23 December 2016 at 07:13

Since you are having a problem with the first value, I would check to see if the file starts with a BOM (Byte Order Mark).
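A minimal sketch of that check, assuming Java 11+ (for `InputStream.readNBytes(int)` and `Path.of`): it reads the first three bytes of the file and compares them with `EF BB BF`, the UTF-8 encoding of U+FEFF. The class name `BomCheck` is hypothetical, and UTF-16 BOMs (`FE FF`, `FF FE`) are deliberately not detected here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class BomCheck {
    // The UTF-8 encoding of U+FEFF, the byte order mark
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns true if the file's first three bytes are the UTF-8 BOM
    static boolean startsWithUtf8Bom(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = in.readNBytes(3); // Java 11+; fewer bytes for short files
            return Arrays.equals(head, UTF8_BOM);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java BomCheck <file>");
            return;
        }
        System.out.println("UTF-8 BOM present: " + startsWithUtf8Bom(Path.of(args[0])));
    }
}
```

Running it against the CSV from the question would show whether the editor saved the file as "UTF-8 with BOM".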

If so, try stripping the BOM before processing.
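One way to strip it, sketched under the assumption that the file is read line by line as in the question: `InputStreamReader` decodes a UTF-8 BOM into the character U+FEFF, so checking the first character of the first line is enough. `RegionLoader` and `load` are hypothetical names, and unlike the question's code, lines that do not split into exactly two fields are skipped rather than causing an exception.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

class RegionLoader {
    // U+FEFF: what a UTF-8 BOM becomes once decoded by InputStreamReader
    private static final char BOM = '\uFEFF';

    // Reads "name,code" lines into a map, dropping a leading BOM if present
    static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> map = new HashMap<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            boolean first = true;
            while ((line = br.readLine()) != null) {
                if (first && !line.isEmpty() && line.charAt(0) == BOM) {
                    line = line.substring(1); // strip the invisible BOM character
                }
                first = false;
                String[] arr = line.split(",");
                if (arr.length == 2) {
                    map.put(arr[0], arr[1]);
                }
            }
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        // Simulates a file saved as "UTF-8 with BOM": the first key carries U+FEFF
        String csv = "\uFEFFChina-\u4E2D\u56FD,CN\nAngola-\u5B89\u54E5\u62C9,AO\n";
        Map<String, String> map = load(new StringReader(csv));
        System.out.println(map.get("China-\u4E2D\u56FD")); // CN
    }
}
```

Taking a `Reader` rather than a file path keeps the sketch testable with a `StringReader`; for the real file you would pass the `InputStreamReader` from the question.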

See: Byte order mark screws up file reading in Java

**Tom Grylls** · 23 December 2016 at 07:58

You can use org.apache.commons.io.input.BOMInputStream.

```java
// BOMInputStream comes from Apache Commons IO, which must be on the classpath
BufferedReader br = new BufferedReader(new InputStreamReader(new BOMInputStream(new FileInputStream("filepath")), "UTF-8"));
```
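If adding Commons IO is not an option, a rough stdlib-only equivalent for the UTF-8 case can be built on `java.io.PushbackInputStream`. This is a sketch, not a reimplementation of `BOMInputStream` (which also handles other BOM types); `SkipBom` and `skipUtf8Bom` are hypothetical names, and `readNBytes(byte[], int, int)` needs Java 9+.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

class SkipBom {
    // Wraps a stream and consumes a leading UTF-8 BOM (EF BB BF) if present;
    // any other leading bytes are pushed back untouched
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pin.readNBytes(head, 0, 3); // up to 3 bytes, fewer at end of stream
        boolean bom = n == 3
                && head[0] == (byte) 0xEF
                && head[1] == (byte) 0xBB
                && head[2] == (byte) 0xBF;
        if (!bom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: put the bytes back
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "China-\u4E2D\u56FD,CN".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[body.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(body, 0, withBom, 3, body.length);

        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                skipUtf8Bom(new ByteArrayInputStream(withBom)), StandardCharsets.UTF_8))) {
            System.out.println(br.readLine().startsWith("China")); // true
        }
    }
}
```

In the question's code, the `FileInputStream` would be wrapped as `skipUtf8Bom(new FileInputStream(...))` before it is handed to `InputStreamReader`.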

**wumpz** · Accepted Answer · 2016-12-23T07:32:49+00:00

Check whether your resource file is saved as plain UTF-8 or as UTF-8 with BOM (byte order mark) bytes at the start. A BOM would only interfere with the first value, so change the test to look up a value from the middle of the file: if that lookup also fails, the BOM is not the problem.
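To run that middle-of-the-file test and also see why the first key fails, it helps to print each key's code points, because U+FEFF is invisible in ordinary console output. A hypothetical diagnostic (`KeyDump` is an illustrative name) that simulates a BOM-prefixed first key:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

class KeyDump {
    // Renders a string as space-separated U+XXXX code points, exposing
    // invisible characters such as U+FEFF that println() does not show
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("\uFEFFChina-\u4E2D\u56FD", "CN"); // first key as read from a BOM file
        map.put("Angola-\u5B89\u54E5\u62C9", "AO");
        for (String key : map.keySet()) {
            System.out.println(key + " -> " + codePoints(key));
        }
        // A key from the middle of the file is found; the BOM-prefixed one is not
        System.out.println(map.get("Angola-\u5B89\u54E5\u62C9")); // AO
        System.out.println(map.get("China-\u4E2D\u56FD"));        // null
    }
}
```

If the dump shows `U+FEFF` at the start of the first key, the BOM explanation applies; if the middle lookup also returns null, look at the source encoding instead.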

A second possibility is that your source code file is not compiled as UTF-8. In that case the literal "China-中国" in your source code decodes to different characters than the same text read from your resource file, so equals() and get() find no match. When you put the value in explicitly, the key and the lookup both come from the same source-code literal, so it is found.
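The effect can be simulated without touching compiler settings. This sketch assumes a hypothetical mismatch in which UTF-8 source bytes are decoded as ISO-8859-1 (the actual wrong charset depends on the platform default): each byte of the two Chinese characters turns into a separate `char`. Writing the literal with Unicode escapes, as in `"China-\u4E2D\u56FD"`, keeps it independent of the source encoding; compiling with `javac -encoding UTF-8` is the other common fix. `SourceEncodingDemo` and `misdecode` are illustrative names.

```java
import java.nio.charset.StandardCharsets;

class SourceEncodingDemo {
    // Hypothetical mismatch: text saved as UTF-8 but decoded as ISO-8859-1,
    // so every UTF-8 byte becomes a separate char
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this literal identical whatever -encoding javac uses
        String expected = "China-\u4E2D\u56FD";
        String misread = misdecode(expected);

        System.out.println(expected.length());        // 8
        System.out.println(misread.length());         // 12
        System.out.println(misread.equals(expected)); // false
    }
}
```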

In fact this is not a problem with HashMap but with character or file encoding.

TechQA.