I have a use case where I need to analyze real-time data using Apache Spark, but I am still unsure which data store to choose for my application. The analysis mostly involves aggregation, KPI-based identity analysis, and machine learning to predict trends. Cassandra has good support, and large tech companies already use it in production. After some research, however, I found that Druid is faster than Cassandra and well suited to OLAP queries, but its results are inconsistent for queries such as Count Distinct.
Any help with this would be appreciated. Thanks.
Since your use case is analyzing real-time data, I would suggest Druid rather than Apache Cassandra. Because Apache Cassandra uses asynchronous masterless replication, a real-time analysis could miss recently updated data. Druid, on the other hand, is designed for real-time analysis.
Druid details: http://druid.io/druid.html
Apache Cassandra details: https://en.wikipedia.org/wiki/Apache_Cassandra
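To make the point about missed updates a bit more concrete, here is a rough sketch of the usual replica-overlap rule for a masterless store like Cassandra: with replication factor N, a write acknowledged by W replicas and a read that consults R replicas, the read is only guaranteed to reach a replica holding the latest write when R + W > N. The function name and parameters are made up for illustration and are not part of any Cassandra API, and the sketch ignores node failures and repair mechanisms.

```python
def read_sees_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """Hypothetical helper: True when every read set must overlap
    every write set, i.e. R + W > N."""
    return read_replicas + write_replicas > replication_factor


# Replication factor 3, one replica acknowledges each write and each read:
# the read may land on a replica that has not received the update yet.
print(read_sees_latest_write(3, 1, 1))

# Replication factor 3, a majority (2) for both writes and reads:
# at least one replica in the read set has the latest write.
print(read_sees_latest_write(3, 2, 2))
```

So the risk of stale reads depends on the consistency levels you choose. Stricter levels reduce it, at the cost of higher read and write latency.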