I'm looking for a way to provide blob storage for an application I'm building.
What I need is the following:
- Access is done using simple keys (like primary keys; I don't need a hierarchy);
- Blob sizes will range from 1 KiB to 1 GiB. Both extremes must be fast and well supported (so systems that work on large blocks, as I believe Hadoop does, are out);
- Streaming access to blobs, i.e. the ability to read random parts of a blob (for example via HTTP range requests; see the sketch after this list);
- Access over REST;
- No eventual consistency.
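
To make the streaming requirement concrete, here is a minimal sketch of a ranged read over REST. The endpoint URL and key are hypothetical placeholders; the only assumption is that the server honors the standard HTTP `Range` header:

```python
# Hypothetical example: random access to a blob over REST via an HTTP
# Range request. BLOB_URL is a placeholder endpoint; the server must
# honor the Range header and answer with 206 Partial Content.
import requests

BLOB_URL = "https://blobstore.internal/blobs/{key}"  # hypothetical endpoint

def read_range(key: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset` from the blob `key`."""
    headers = {"Range": f"bytes={offset}-{offset + length - 1}"}
    resp = requests.get(BLOB_URL.format(key=key), headers=headers, timeout=30)
    resp.raise_for_status()  # expect 206 Partial Content
    return resp.content

# e.g. read 4 KiB starting 1 MiB into the blob
chunk = read_range("my-blob-key", 1 << 20, 4096)
```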
My infrastructure requirements are as follows:
- Horizontally scalable, though sharding is OK (so the system need not support horizontal scaling natively);
- High availability (so replication and automatic failover);
- I can't use Azure or Google blob storage; this is a private cloud application.
I'm prepared to implement such a system myself, but I'd prefer an out-of-the-box system that implements this, or at least parts of it.
I have looked at Hadoop, for example, but it has eventual consistency, so it is out. There seem to be a number of Linux DFS implementations, but these all work via mounting, and I just need REST access. The range of blob sizes also seems to make things difficult.
What system could I use for this?
It's a pretty old post, but I'm looking for pretty much the same thing. I've found the stack of GridFS plus an nginx-based HTTP access module.
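
To illustrate the GridFS half, here is a minimal sketch using pymongo; the connection string, database name, and key are placeholder assumptions. GridFS splits each blob into chunks, so seeking to a random offset only touches the chunks it needs, and an nginx module can then expose the same files over HTTP:

```python
# Minimal GridFS sketch with pymongo. Assumptions: a reachable MongoDB;
# the connection string, database name, and blob key are placeholders.
import gridfs
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
fs = gridfs.GridFS(client["blobstore"])            # hypothetical database name

# Store a blob under a simple key (no hierarchy), as the question requires.
fs.put(b"\x00" * (1 << 20), _id="my-blob-key")

# Streaming / random access: seek to an offset and read a slice.
blob = fs.get("my-blob-key")
blob.seek(512 * 1024)   # jump 512 KiB into the blob
part = blob.read(4096)  # read 4 KiB from that offset
```

MongoDB replica sets would also cover the replication and automatic-failover requirement, and sharding covers horizontal scaling, which seems to fit the constraints above.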