Cassandra sort using updatable query

158 views Asked by At

I have a data set with attributes content and year. I want to put them in to CF 'words' with attributes ('content','year','frequency'). The CF should support following operations.

  • Frequency attribute of a column can be updated (i.e. - : can run query like "UPDATE words SET frequency = 2 WHERE content='abc' AND year=1990;), where clause should contain content and year
  • Should support select query like "Select content from words where year = 2010 ORDER BY frequency DESC LIMIT 10;" (where clause only has year) where results can be ordered using frequency

Is this kind of requirement can be fulfilled using Cassandra? What is the CF structure and indexing I need to use here? What queries should I use to create CF and in indexing?

2

There are 2 answers

0
catpaws On

To use ORDER BY, the frequency has to be the second column in a compound PRIMARY KEY (http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html?scroll=reference_ds_d35_v2q_xj__using-compound-primary-keys-and-sorting-results). Using the frequency as the key prohibits updates to the value of the key: "Specify the row to update in the WHERE clause by including all columns composing the partition key. The IN relation is supported only for the last column of the partition key. The UPDATE SET operation is not valid on a primary key field." (http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/update_r.html)

create table words (
content text,
year int,
frequency int,
PRIMARY KEY (year, frequency));

insert into words (content, year, frequency) VALUES ('lorem ipsum dolor sit amet', 2014, 10 );
insert into words (content, year, frequency) VALUES ('Sed ut perspiciatis unde', 2010, 3 );
insert into words (content, year, frequency) VALUES ('Excepteur sint occaecat', 2010, 4 );
select content, frequency from words where year = 2010 ORDER BY frequency desc limit 2;

 content                  | frequency
--------------------------+-----------
  Excepteur sint occaecat |         4
 Sed ut perspiciatis unde |         3

(2 rows)
0
Chamila Wijayarathna On

I used following table structure as the solution.

create table words (
  year int,
  frequency int,
  content text,
  primary key (year, frequency, content) );

UPDATE statement won't work since we can't change the value of a clustering column. But we can use DELETE and INSERT instead. Since we don't know what the old frequency ahead of time (to do the delete), we'll need to keep another table mapping content,year -> frequency.