A wired thing in collide of HashMap in Java

100 views Asked by At

When I use HashMap to get the common key between the left_table and the right_table(also I'm testing Bloom Filter algorithm to compare with HashMap, so I add tag Bloom Filter to get attention, HashMap may have this problem), I declared two HashMap hm1 and hm2, When I put the key from right_table into the hm2(value default by 1), the key always occur an collide. I realized the hash value of the key may be the same, but why it always occur in the same place. When I rearranged the declaration of hm1 and hm2, the collision remains still!

I have test that hm1.size is always equals n which is right and it can store more than 2000000 uuids. If HashMap tool in Java reliable?

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Random;
    import java.util.UUID;

    public class HashMapBugTest {
        public static void main(String[] argv) {
            int n = 100;
            int real = 10;
            List<String> Uuids_in_left_table = new ArrayList<String>();

            // init left table
            Long startInsertTime1 = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
            String Uuid = UUID.randomUUID().toString();
            Uuids_in_left_table.add(Uuid);
        }
        Long endInsertTime1 = System.currentTimeMillis();
        System.out.println("The length of Uuids_in_left_table is:" + Uuids_in_left_table.size());
        System.out.println("The time use for insert the uuid into the left table used " + (endInsertTime1 - startInsertTime1) + "ms.");

        // init right table
        List<String> Uuids_in_right_table = new ArrayList<String>();
        Random r = new Random(n);
        Long startInsertTime2 = System.currentTimeMillis();
        for (int i = 0; i < n - real; i++) {
            String Uuid = UUID.randomUUID().toString();
            Uuids_in_right_table.add(Uuid);
        }
        for (int i = 0; i < real; i++) {
            String Uuid = Uuids_in_left_table.get(r.nextInt(n));
            Uuids_in_right_table.add(Uuid);
        }
        Long endInsertTime2 = System.currentTimeMillis();
        System.out.println("The length of Uuids_in_left_table is: " + Uuids_in_left_table.size());
        System.out.println("The time use for insert the uuid into the right table used " + (endInsertTime2 - startInsertTime2) + "ms.");

        // build hashmap
        HashMap<String, Object> hm2 = new HashMap<String, Object>();
        HashMap<String, Object> hm1 = new HashMap<String, Object>();
        for (int i = 0; i < n; i++) {
            int ind = hm2.size();
            if (ind == 97)
                System.out.println(hm2.containsKey(Uuids_in_right_table.get(ind)));
            hm2.put(Uuids_in_right_table.get(i), 1);
            if (ind == hm2.size())
                System.out.println("a"+i+"---"+Uuids_in_right_table.get(i));
        }
        for (int i = 0; i < n; i++) {
            hm1.put(Uuids_in_left_table.get(i), 1);
        }

        int cnt = 0;
        System.out.println("length of hm1 is:" + hm1.size());
        System.out.println("length of hm2 is:" + hm2.size());
        Long startHashMapTime = System.currentTimeMillis();
        for (String str:hm1.keySet()) {
            if (hm2.containsKey(str))
                cnt += 1;
        }
        Long endHashMapTime = System.currentTimeMillis();
        System.out.println("The time used for check the uuid common in the left table and right table used " + (endHashMapTime - startHashMapTime) + "ms.");
        System.out.println("The number of common uuid is:" + cnt);
    }
}

The output of the previous code is:

The length of Uuids_in_left_table is:100
The time use for insert the uuid into the left table used 20ms.
The length of Uuids_in_left_table is: 100
The time use for insert the uuid into the right table used 3ms.
true
a97---c3b4f281-d82e-42c5-9a6d-b4de19032689
true
length of hm1 is:100
length of hm2 is:99
The time used for check the uuid common in the left table and right table used 0ms.
The number of common uuid is:9
1

There are 1 answers

0
Erwin Bolwidt On BEST ANSWER

The reason that your problem is so reproducible is this line:

Random r = new Random(n);

It doesn't do what you think it does.

What it does: it creates a random generator with initial seed n. And since n is always 10 in your program, that means you always get the same sequence of random numbers.

Always leading to exactly the same 10 uuids picked from your left list into your right list, and index 88 is used twice.

Fix:

Random r = new Random();

This will create a random generator with an initial seed based on the current time in milliseconds, so you're very likely to get a different list of 10 numbers each time that you run the program.

Your code does not highlight any issues in HashMap. If you insert the same key (the uuid from index 88 in your left list) twice into your hm2, then the second time you insert it will overwrite the first time you insert it. And rather than 100 elements as you expected, it will only contain 99 elements.