Out of memory [divide and conquer algorithm]

120 views Asked by At

So I have a foo table, which is huge and whenever I try to read all data from that table Node.JS gives me out of memory error! However still you can get chuncks of data by having offset and limit; but again I cannot merge all of the chuncks and have them in memory, because I run into out of memory again! In my algorithm I have lots of ids and need to check whether each id exists in foo table or not; what is the best solution (in terms of algorithm complexity) when I cannot have all of the data in memory to see if id exists in foo table or not?

PS: The naive solution is to get chuncks of data and see chunck by chunck for each id; but the complexity is n squared; there should be a better way I believe...

3

There are 3 answers

0
OrenD On BEST ANSWER

Under the constraints you specified, you could create a hash table containing the ID's you are looking for as keys, with all values initialized to false.

Then, read the table chunk by chunk and for each item in the table, look it up in the hash table. If found, mark the hash table entry with true.

After going all the chunks, your hash table will hold values of true for ID's found in the table.

Providing that the hash table lookup has a fixed time complexity, then this algorithm has a time complexity of O(N).

1
Ilya Kobelevskiy On

You can sort your ids, and break them to chunks. Then you can keep in memory range of values in each chunk - (lowestId,highestId) for that chunk.

You can quickly find chunk (if any) id may be contained in using in memory binary search, and then load that specific chunk in memory and to binary search on it.

Complexity should be LogN for both. In general, read about binary search algorithm.

0
shapiro yaacov On

"PS: The naive solution is to get chuncks of data and see chunck by chunck for each id; but the complexity is n squared; there should be a better way I believe..."

Let's say you could load the whole table into your memory. In any case you'll need to check each ID whether or not it is in the DB. You can't do it any better than comparing.

Having said that, a hash table comes to mind. Lets say the IDs are integers, and they are randomly picked. you could hash the IDs you need to check by the last two digits (or the first two for that matter). Then checking the items you have in your memory will be quicker.