Exception in thread "main" java.lang.OutOfMemoryError: Java heap space while trying to verify millions of records


I have a flat file that contains data line by line, and my task is to verify which of that data is not present in the DB. Using Java, I first insert the flat-file data into HashSet1 and the DB data into another HashSet2, and then check the containment of one set in the other so I can identify which data is not present in the DB.

Given below is dummy code: assume hashset1 (which has some data missing) stands for the file-reader data and hashset2 stands for the full data from the DB.

But, as mentioned, I have 30 million records to verify. I am able to verify 1 million records this way, but not the 30 million my task requires. Is there a better way to do this? Please suggest one, ideally with some code; I would be very thankful.

import java.util.HashSet;
import java.util.Set;

public class App
{

    public static void sampleMethod() {
        // hashset1 stands in for the flat-file data (some entries are missing)
        Set<Integer> hashset1 = new HashSet<Integer>();
        // hashset2 collects the entries that are not found in hashset1
        Set<Integer> hashset2 = new HashSet<Integer>();
        for (int i = 0; i < 30000000; i++) {
            if (i % 50000 != 0) {   // skip every 50,000th value to simulate missing data
                hashset1.add(i);
            }
        }
        int count = 0;
        for (int j = 0; j < 30000000; j++) {
            if (hashset1.contains(j)) {
                count++;
            } else {
                System.out.println(j + " Is Not Present");
                hashset2.add(j);
            }
        }
        System.out.println("Contain Value Count " + count);
    }

    public static void main(String[] args)
    {
        sampleMethod();
    }
}

Error Stack Trace :

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.HashMap.resize(HashMap.java:703)
    at java.util.HashMap.putVal(HashMap.java:662)
    at java.util.HashMap.put(HashMap.java:611)
    at java.util.HashSet.add(HashSet.java:219)
    at com.java.anz.BankingPro.App.sampleMethod(App.java:20)
    at com.java.anz.BankingPro.App.main(App.java:38)
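For context, hashset1 in the real task is filled from the flat file rather than from a loop. A minimal sketch of that loading step, assuming one integer value per line and a hypothetical file name data.txt (both assumptions are mine, not from the question), could look like this:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class FileLoader {

    // Reads one value per line from the flat file into a HashSet,
    // mirroring what hashset1 stands for in the dummy code above.
    public static Set<Integer> loadFileData(String path) throws IOException {
        Set<Integer> fileData = new HashSet<Integer>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    fileData.add(Integer.parseInt(line)); // assumes integer values, one per line
                }
            }
        }
        return fileData;
    }

    public static void main(String[] args) throws IOException {
        Set<Integer> hashset1 = loadFileData("data.txt"); // hypothetical file name
        System.out.println("Loaded " + hashset1.size() + " values from the file");
    }
}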

1 Answer

Answer by Daniel S.:

For comparing two sets of data, it is enough to load only the smaller of the two into a hash set (1.), then detect the differences between the sets (2.), and only then modify the data according to the differences that were found (3.). Let's call the small set simply smallHashSet in the following pseudo code (a Java sketch of these steps is given after the list):

  1. Load the smaller set of data into smallHashSet.

  2. Iterate (loop) over the entries of the bigger set of data one by one - do not load it all at once, just read one entry after another and process each in turn:

    2.1. Let's say bigSetEntry is such an entry from the bigger set; then
    if (smallHashSet.contains(bigSetEntry)) smallHashSet.remove(bigSetEntry).

  3. When you are done, smallHashSet contains only the entries that are in the small set but missing from the big set, and you never needed to load the big set all at once. You can now do something with these differing entries, e.g. add them to the big data file.
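A minimal Java sketch of these three steps, assuming the flat file is the smaller set and the DB rows are streamed through JDBC; the table and column names (the_table, the_value), the file path, and the use of String values are placeholders of mine, not part of the answer:

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashSet;
import java.util.Set;

public class DiffChecker {

    // Step 1: load the smaller data set (the flat file) into smallHashSet.
    static Set<String> loadSmallSet(String path) throws Exception {
        Set<String> smallHashSet = new HashSet<String>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                smallHashSet.add(line.trim());
            }
        }
        return smallHashSet;
    }

    // Steps 2 and 3: stream the bigger set (the DB) row by row and remove
    // every value that is found; whatever remains is missing from the DB.
    static Set<String> findMissing(Set<String> smallHashSet, Connection conn) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(10_000); // hint to the driver to stream rows instead of loading them all
            try (ResultSet rs = stmt.executeQuery("SELECT the_value FROM the_table")) {
                while (rs.next()) {
                    String bigSetEntry = rs.getString(1);
                    smallHashSet.remove(bigSetEntry); // no-op if the entry is not in the set
                }
            }
        }
        return smallHashSet; // now holds only the entries not present in the DB
    }
}

If the flat file turns out to be the bigger of the two sets, swap the roles: load the DB values into smallHashSet and stream the file line by line instead.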