I need to store large amount of small data objects (millions of rows per month). Once they're saved they wont change. I need to :
- store them securely
- use them to analysis (mostly time-oriented)
- retrieve some raw data occasionally
- It would be nice if it could be used with JasperReports or BIRT
My first shot was Infobright Community - just a column-oriented, read-only storing mechanism for MySQL
On the other hand, people says that NoSQL approach could be better. Hadoop+Hive looks promissing, but the documentation looks poor and the version number is less than 1.0 .
I heard about Hypertable, Pentaho, MongoDB ....
Do you have any recommendations ?
(Yes, I found some topics here, but it was year or two ago)
Edit: Other solutions : MonetDB, InfiniDB, LucidDB - what do you think?
Am having the same problem here and made researches; two types of storages for BI :
The answer depends on what you really need :
http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/
Document oriented DBs are not suited for BI, they are more useful for CRM/CMS issues where you need frequent access to a particular row
As for the exact choice inside a category, I'm still undecided. Cassandra in distributed, and Monet or InfiniDB for CODB, are leaders. Monet is reported to have problem loading very big tables because it runs indexes in memory.