How can I perform search on a lookup table without loading it in memory?

470 views Asked by At

Now I have a file recording the entries of a lookup table. If the number of entries is small, I can simply load this file into an STL map and perform search in my code. But what if there are many many entries? If I do it in the way above, it may cause error such as out of memory. I'm here to listen to your advice...

P.S. I just want to perform search without loading all entries into memory.

Can Key-value database solve this problem?

2

There are 2 answers

0
047 On

You'll have to load the data from hard drive eventually but sure if a table is huge it won't fit into memory to do a linear search through it, so:

  1. think if you can split the data into a set of files
  2. make an index table of what file contains what entries (say the first 100 entries are in "file1_100", second hundred is in "file101_201" an so on)
  3. using index table from step 2 locate the file to load
  4. load the file and do a linear search

That is a really simplified scheme for a typical database management system so you may want to use one like MySQL, PostgreSQL, MsSQL, Oracle or any one of them. If that's a study project then after you're done with the search problem, consider optimizing linear operations (by switching to something like binary search) and tables (real databases use balanced tree structures, hash tables and like).

1
Thomas Matthews On

One method would be to reorganize the data in the file into groups.

For example, let's consider a full language dictionary. Usually, dictionaries are too huge to read completely into memory. So one idea is to group the words by first letter.

In this example, you would first read in the appropriate group based on the letter. So if the word you are searching for begins with "m", you would load the "m" group into memory.

There are other methods of grouping such as word (key) length. There can also be subgroups too. In this example, you could divide the "m" group by word lengths or by second letter.

After grouping, you may want to write the data back to another file so you don't have to modify the data anymore.

There are many ways to store groups on the file, such as using a "section" marker. These would be for another question though.

The ideas here, including from @047, are to structure the data for the most efficient search, giving your memory constraints.