I'm not an expert on Cassandra, but from what I know (correct me if I'm wrong), creating a secondary index on every field in a data model is an anti-pattern.
I'm using Elassandra and my data model looks like this:
- A users object that represents a user, with: userID, name, phone, e-mail, and all kinds of other information about the user (say these users are selling things)
- A sales object that represents a sale made by a user, with: saleID, userID, product name, price, etc. (there can be many more fields)
Given that I want to run complex searches on users (search by phone, search by e-mail, etc.), but only on name, e-mail, and phone, is it a good idea to split this data model into the following three tables:
- "User core" table with only userID, name, phone and e-mail (fields for search) [Table fully indexed and mapped in Elasticsearch]
- "User info" table with userID + the other infos [Table not indexed or mapped in Elasticsearch]
- "Sales" table with userID, saleID, product name, price, etc. [Table not indexed or mapped in Elasticsearch]
I see at least one advantage: any indexing (or reindexing when data changes) and its associated cost happens only when the "User core" table changes, which should not be too frequent. Also, if I need any of the other data (user info or sales), I can just make two queries: one against "User core" to get the userID, and one against the other table (with that userID) to get the rest.
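For illustration, here is roughly what I have in mind in CQL (table and column names are my own, and the types are assumptions):

```sql
-- "User core": only the searchable fields; this is the one table
-- that would be indexed and mapped in Elasticsearch
CREATE TABLE user_core (
    user_id uuid PRIMARY KEY,
    name    text,
    phone   text,
    email   text
);

-- "User info": everything else, keyed by userID, not indexed
CREATE TABLE user_info (
    user_id     uuid PRIMARY KEY,
    address     text,
    signup_date timestamp
    -- ...any other user infos
);

-- "Sales": partitioned by user and clustered by saleID, so all
-- sales for a user can be read from a single partition
CREATE TABLE sales (
    user_id      uuid,
    sale_id      timeuuid,
    product_name text,
    price        decimal,
    PRIMARY KEY ((user_id), sale_id)
);

-- The two-step read: search user_core (through Elasticsearch) to
-- get the userID, then fetch the rest directly by partition key:
SELECT * FROM sales WHERE user_id = ?;
```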
But I'm not sure this is a good pattern, or maybe I should not worry about secondary indexing and simply index the other tables as well?
To summarize: what are the key reasons to choose a secondary index like Elasticsearch in Elassandra versus denormalizing tables and using partition and clustering keys?
Please feel free to ask if you need more details about my use case.
You should not normalise tables when you're using Cassandra. The most important aspect of data modelling for Cassandra is to design one table for each application query. Put another way, you should always denormalise your tables.
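To make "one table for each application query" concrete, here is a sketch (table and column names are my own) of how the same user data gets duplicated into one table per access pattern, instead of relying on secondary indexes:

```sql
-- Query: fetch a user by e-mail
CREATE TABLE users_by_email (
    email   text PRIMARY KEY,
    user_id uuid,
    name    text,
    phone   text
);

-- Query: fetch a user by phone number
CREATE TABLE users_by_phone (
    phone   text PRIMARY KEY,
    user_id uuid,
    name    text,
    email   text
);

-- The application writes to every copy on each update, e.g. in a
-- logged batch so the denormalised tables stay consistent:
BEGIN BATCH
    INSERT INTO users_by_email (email, user_id, name, phone) VALUES (?, ?, ?, ?);
    INSERT INTO users_by_phone (phone, user_id, name, email) VALUES (?, ?, ?, ?);
APPLY BATCH;
```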
After you've modelled a table for each query, index with Elassandra the table that contains the most columns you need to query on.
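As an example, Elassandra can map an existing table into an Elasticsearch index using its mapping "discover" option; a rough sketch (the keyspace name `my_ks` and the column regex are assumptions — check the Elassandra docs for your version's exact syntax):

```
PUT /user_core
{
  "settings": { "keyspace": "my_ks" },
  "mappings": {
    "user_core": {
      "discover": "^(name|phone|email)$"
    }
  }
}
```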
It's important to note that Elassandra is not a magic bullet. In many cases you do not need to index the tables at all if you have modelled them correctly against your application queries.
The use case for Elassandra is to take advantage of features such as free-form text search, faceting, boosting, and so on, but it will not be as performant as a native table: index lookups simply require more "steps" than a straightforward single-partition Cassandra read. Of course, YMMV depending on your use case and access patterns. Cheers!