I know this question has been asked many times on Stack Overflow. I am posting it to find out which option is the best choice for my design. I have the following schema for my job details:
_unique_key varchar(256) NULL
_job_handle varchar(256) NULL
_data varchar(1024) NULL
_user_id int(11) NULL
_server_ip varchar(39) NULL
_app_version varchar(256) NULL
_state int(11) NULL
_is_set_stopped bool
Operations we perform on this table:
- For each job we run one update and 10 select queries on this table, so we need high read and write throughput.
- Many applications manipulate this table, filtering on:
  - _unique_key
  - _state
  - _is_set_stopped
  - _user_id
- The _data field size varies from 5 KB to 1 MB depending on the type of application and user.
- Applications can update individual attributes selectively.
Solutions we considered:
MySQL InnoDB
I think MySQL will not scale well enough given the high read and write requirements.
MySQL In Memory Table
The problems with this solution are:
- It doesn't support dynamic field sizes. MEMORY tables use a fixed-length row-storage format. Variable-length types such as VARCHAR are stored using a fixed length. Source: http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine.html
- SELECT ... FOR UPDATE will lock the entire table. I don't know whether that will be a problem.
Redis
Redis looks like a good choice, but I think my table is not a good fit for a key-value cache server.
- It supports only a very limited set of data types. I can store only strings in a list, so I would need to store the fields as JSON or in some other format.
- If clients want to update a particular attribute, they need to download the full value, parse the object, and push it back to the server. Maybe I am wrong; is there a way to do that?
- Filtering based on value will not be possible. Maybe I am wrong; is there a way to do that?
MySQL InnoDB on TMPFS file system
This looks promising, but I don't know whether it will scale as well as Redis or a MySQL in-memory table.
In this question, you are confusing raw performance (i.e. efficiency) with scalability. They are different concepts.
Between the InnoDB and memory engines, InnoDB is likely to be the more scalable. InnoDB supports multi-version concurrency control and has plenty of optimizations to deal with contention, so it will handle concurrent accesses much better than the memory engine, even if it may be slower in some I/O-bound situations.
Redis is a single-threaded server. All the operations are serialized. It has zero scalability. It does not mean it is inefficient. On the contrary, it will likely support more connections than MySQL (due to its epoll-based event loop), and more traffic (due to its very efficient lock-free implementation and in-memory data structures).
To answer your question, I would give MySQL with InnoDB a try. If it is properly configured (no synchronous commit, enough cache buffer, etc.), it can sustain a good throughput. And instead of running it on top of tmpfs, I would consider SSD hardware.
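As a rough illustration of those two settings, a my.cnf fragment could look like the following. The buffer pool size is an assumed placeholder, not a recommendation; it should be sized to the working set and the RAM actually available.

```ini
[mysqld]
# Assumed value for illustration only: size this to your data and memory.
innodb_buffer_pool_size = 4G

# 2 = write the redo log at each commit but flush it to disk about once
# per second, instead of flushing synchronously on every commit (1).
innodb_flush_log_at_trx_commit = 2
```

The trade-off is durability: with this setting, roughly the last second of committed transactions can be lost on an operating system crash or power failure.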
Now, if you prefer to use Redis (which is not a relational store, by the way), you can certainly do it. There is no need to systematically serialize/deserialize your data. And filtering is indeed possible, provided you can anticipate all access paths and find a suitable data structure.
For instance:
For each job insertion, you need to pipeline the following commands:
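One possible shape for such a pipeline, as a minimal sketch: the Python below only builds the list of Redis commands and does not talk to a server. The key layout (a `job:<key>` hash plus `state:<n>`, `user:<id>` and `stopped:<0|1>` index sets) and the helper name `insertion_commands` are assumptions made for illustration.

```python
# Fields stored in the per-job hash (taken from the schema in the question).
FIELDS = ("job_handle", "data", "user_id", "server_ip",
          "app_version", "state", "is_set_stopped")


def insertion_commands(job):
    """Return the Redis commands to pipeline when inserting one job.

    Key names are hypothetical: one hash per job, one set per filterable value.
    """
    key = job["unique_key"]
    # Store the boolean as 0/1 so it can double as a set name suffix.
    values = dict(job, is_set_stopped=int(bool(job["is_set_stopped"])))

    hash_args = []
    for name in FIELDS:
        hash_args += [name, str(values[name])]

    return [
        ("HSET", f"job:{key}", *hash_args),            # all fields in one hash
        ("SADD", f"state:{values['state']}", key),     # index by state
        ("SADD", f"user:{values['user_id']}", key),    # index by user
        ("SADD", f"stopped:{values['is_set_stopped']}", key),  # index by flag
    ]
```

Sending the four commands in one pipeline keeps the insertion to a single network round trip.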
You can easily update any field individually provided you maintain the corresponding sets.
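To make the bookkeeping concrete, here is a hedged sketch of the commands for two kinds of update. It assumes a hypothetical layout in which each job is a `job:<key>` hash and each state value has a `state:<n>` set of job keys; the function names are illustrative only.

```python
def state_update_commands(key, old_state, new_state):
    """Change an indexed field: update the hash and move the key between sets.

    MULTI/EXEC makes the three writes apply atomically.
    """
    return [
        ("MULTI",),
        ("HSET", f"job:{key}", "state", str(new_state)),
        ("SREM", f"state:{old_state}", key),   # drop from the old index set
        ("SADD", f"state:{new_state}", key),   # add to the new index set
        ("EXEC",),
    ]


def data_update_commands(key, new_data):
    """Change a non-indexed field: a single HSET on the job hash is enough."""
    return [("HSET", f"job:{key}", "data", new_data)]
```

Only fields that are used for filtering need the extra set maintenance; the large data field can be rewritten on its own without touching the other attributes.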
You can perform filtering queries by intersecting the sets. For instance:
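A sketch of how such a query could be assembled, assuming hypothetical index sets named `state:<n>`, `user:<id>` and `stopped:<0|1>` that each hold job keys. The helper `filter_command` is illustrative and only builds the command.

```python
def filter_command(state=None, user_id=None, stopped=None):
    """Build a SINTER command over the index sets matching the given filters."""
    set_names = []
    if state is not None:
        set_names.append(f"state:{state}")
    if user_id is not None:
        set_names.append(f"user:{user_id}")
    if stopped is not None:
        set_names.append(f"stopped:{int(bool(stopped))}")
    if not set_names:
        raise ValueError("at least one filter is required")
    # SINTER returns the members present in every listed set,
    # i.e. the keys of the jobs that satisfy all the filters.
    return ("SINTER", *set_names)


# Jobs of user 42 that are in state 1 and not stopped:
command = filter_command(state=1, user_id=42, stopped=False)
```

The result is a list of job keys; the fields of each job can then be fetched from its hash, for example with HGETALL or HMGET.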
With Redis, the bottleneck will likely be the network, especially if the data field is large. I hope you will have more jobs of 5 KB than jobs of 1 MB. For instance, 1000 writes/s of 1 MB objects represents 8 Gbit/s, probably more than what your network can sustain. This is true for both Redis and MySQL.
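The arithmetic behind that figure can be checked with a few lines of Python. This sketch counts payload only, uses decimal units (1 MB = 1,000,000 bytes), and ignores protocol overhead; the function name is just for illustration.

```python
def required_gbit_per_s(writes_per_s, object_bytes):
    """Payload bandwidth in Gbit/s for a given write rate and object size."""
    return writes_per_s * object_bytes * 8 / 1e9


large = required_gbit_per_s(1000, 1_000_000)  # 1000 writes/s of 1 MB objects
small = required_gbit_per_s(1000, 5_000)      # 1000 writes/s of 5 KB objects
print(f"1 MB objects: {large} Gbit/s, 5 KB objects: {small} Gbit/s")
```

At the same write rate, 5 KB objects need about 0.04 Gbit/s, which is why the size distribution of the data field matters so much.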