I would like to use Cassandra for my website analytics which is geared especially towards customers segmentation. What this analytics will do is gather Page View data for every customers/visitor which includes:
userName, country, city, gender, timeStamp, source, campaign, pageUrl, timeOnPage
This data will have to be sliced by all the dimensions for easy customer segmentation. For example:
SELECT * Users WHERE Country = USA AND pageUrl = "http://mystore.com/bestsettingproduct" AND timeStamp < DateTime.Now AND timeStamp > DateTime.Now - Days.30
OR
SELECT * Users WHERE Campaign = "Last Email Campaign" AND AND timeStamp < DateTime.Now AND timeStamp > DateTime.Now - Days.30
As I understand in Cassandra you can query only by key. Given these sort of dynamic queries where one or more dimension can be included in the where clause, what would be a good data model?
I am thinking of having several tables with the following keys:
table 1 (userName, country, city, gender, timeStampWeek, timeStamp source, campaign, pageUrl, timeOnPage,
primary key((timeStampWeek, country), city, gender, timeStamp source, campaign, pageUrl, timeOnPage));
table 2 (userName, country, city, gender, timeStampWeek, timeStamp source, campaign, pageUrl, timeOnPage,
primary key((timeStampWeek, city), country, gender, timeStamp, source, campaign, pageUrl, timeOnPage));
table 3 (userName, country, city, gender, timeStampWeek, timeStamp, source, campaign, pageUrl, timeOnPage,
primary key((timeStampWeek, campaign), country, city, gender, timeStamp, source, pageUrl, timeOnPage));
And so on for all combinations of the dimensions? But this seems wild? Could there be a smarter way to model around these queries?