How to model data for in-memory processing


I have a lot of static data (i.e. read-only data, not transactional) which gets updated only once every few days.

I have to support searches on that data (via API calls, not SQL). So I am thinking I will just load it into memory and refresh the in-memory data once in a while. RAM should not be an issue since we are on 64-bit... the data can be in the 2 GB to 50 GB range.

I am hoping I can process searches on the in-memory data much faster than querying a database (even one with indexed tables).

Is there a certain "approach" I can take to design this in-memory data?

UPDATE:

My question isn't about which RDBMS / NoSQL DB to use. I want to know how to structure the data in memory when I am no longer bound by a storage mechanism.


There are 5 answers

Praneeth Reddy G (5 votes, BEST ANSWER)

It totally depends on what kind of data you are working with and what kind of searches you want to perform on it.

For example, with hash-based structures you cannot support partial-word (prefix or substring) searches.

You could go for an in-memory relational DB if your data is genuinely relational (lots of columns and relationships), and you can index all the searchable columns. But an RDBMS is of no use if your data is just a bunch of key-value pairs or free-form text.

A specific data structure cannot be suggested here without knowing your requirements.

I suggest you explore data structures (search trees, tries, hash tables), in-memory databases (like Redis), and search engines (like Solr or Lucene) to find out which suits your needs best.
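
To make the trie point concrete, here is a minimal sketch (in Java, picked only because the question names no language; the class and method names are mine) showing the kind of prefix search a plain hash map cannot give you without scanning every key:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal trie: supports exact inserts and prefix searches,
// which a plain hash map cannot do without scanning every key.
class Trie {
    private static class Node {
        Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // Returns every stored word starting with the given prefix.
    List<String> startsWith(String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return new ArrayList<>();
        }
        List<String> results = new ArrayList<>();
        collect(node, new StringBuilder(prefix), results);
        return results;
    }

    private void collect(Node node, StringBuilder path, List<String> results) {
        if (node.isWord) results.add(path.toString());
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, results);
            path.deleteCharAt(path.length() - 1);
        }
    }
}
```

Calling insert("redis") and then startsWith("re") would return the stored word, which is exactly the partial-word case a hash-based structure alone does not cover.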

ingrid.e (3 votes)

I have used Redis (http://redis.io/) before, and it is a very fast in-memory store. As an approach, designing good keys for your data helps optimize any search, and Redis supports that.
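
As a rough sketch of that idea using the Jedis client (the key names and the localhost connection are just assumptions for illustration, not anything from the question):

```java
import redis.clients.jedis.Jedis;

public class ProductCache {
    public static void main(String[] args) {
        // Assumes a Redis server on localhost:6379; key names are illustrative.
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Store each record under a predictable key...
            jedis.hset("product:42", "name", "widget");
            jedis.hset("product:42", "category", "tools");

            // ...and maintain your own secondary index as a set,
            // so category lookups don't require scanning all keys.
            jedis.sadd("category:tools", "product:42");

            for (String key : jedis.smembers("category:tools")) {
                System.out.println(jedis.hgetAll(key));
            }
        }
    }
}
```

The point is that Redis does not index your values for you; a well-chosen key scheme plus your own secondary sets is what keeps the searches fast.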

If you also need heavier data processing, you could maybe look at Hadoop / HBase.

Alp (0 votes)

Years ago, I used Prevayler for a non-database web application. It was incredibly fast! It works with plain POJOs and is easy to understand and implement.

The data structure was very simple. Think of it as a tree with a root node: Prevayler knows the root of the tree, and you attach your data to it. You can take snapshots of the tree as a backup mechanism; you can even use XML snapshots.
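
Stripped of the library, the pattern looks roughly like the sketch below (plain Java serialization, with made-up class names; Prevayler itself adds transaction logging on top of this, so treat it as an illustration of the idea rather than its API):

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: the whole dataset hangs off one serializable root object,
// and a "snapshot" is just that root written to disk.
class DataRoot implements Serializable {
    private static final long serialVersionUID = 1L;
    final Map<String, String> records = new HashMap<>();
}

class SnapshotStore {
    static void snapshot(DataRoot root, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(root);
        }
    }

    static DataRoot load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (DataRoot) in.readObject();
        }
    }
}
```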

Prevayler was one of the very first of its kind, and I am sure there are other libraries now. Do a bit of research before deciding which one to go with.

Cheers.

Rob Conklin (0 votes)

Are you trying to learn how to build a hashtable?

Read up on binary search trees, read a few books on algorithm design, and probably read The Art of Computer Programming.

Or just use whatever hash table implementation your particular language provides.
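
In Java, for example, that could be as little as the sketch below (the record type and keys are placeholders):

```java
import java.util.HashMap;
import java.util.Map;

// The simplest in-memory model: the language's own hash table.
public class SimpleLookup {
    record Product(String id, String name) {}

    public static void main(String[] args) {
        Map<String, Product> byId = new HashMap<>();
        byId.put("42", new Product("42", "widget"));

        // O(1) exact-key lookup; anything fancier (ranges, prefixes,
        // joins) is where an in-memory database starts to pay off.
        System.out.println(byId.get("42"));
    }
}
```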

Many people are recommending databases simply because, unless your data fits a simple hash table model, you will end up needing a database (writing your own indexing and joining algorithms is silly). Here is a list of in-memory databases which might help you choose a path. A lot of what you choose depends on your platform and whether you are willing to spend money.

Aidin (0 votes)

It mostly depends on your access patterns and how you want to deal with your data.

For example, if you want fast search plus the ability to retrieve part of the data in sorted order, a red-black tree can be a good fit; if you just want simple key-value lookups, a hash table will do.
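
In Java, for instance, both already exist in the standard library; the sketch below (with made-up sample keys) uses TreeMap, which is a red-black tree, for a range query, and HashMap for plain lookups:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// TreeMap is a red-black tree, so it gives sorted iteration and range
// queries; HashMap gives plain O(1) key-value lookups.
public class AccessPatterns {
    public static void main(String[] args) {
        TreeMap<String, Integer> sorted = new TreeMap<>();
        sorted.put("apple", 1);
        sorted.put("banana", 2);
        sorted.put("cherry", 3);

        // Range query: everything from "b" (inclusive) to "d" (exclusive).
        System.out.println(sorted.subMap("b", "d")); // {banana=2, cherry=3}

        Map<String, Integer> plain = new HashMap<>(sorted);
        System.out.println(plain.get("banana"));     // 2
    }
}
```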

But implementing these data structures yourself can be tricky, and many people have already solved this problem several times over.

I strongly advise using solutions like Redis or other in-memory databases. Simply put: DRY, don't repeat yourself.