Querying CompositeType columns in Cassandra using Hector

Here's a sample of the scenario I'm facing. Say I have this column family:

    create column family CompositeTypeCF 
    with comparator = 'CompositeType(IntegerType,UTF8Type)' 
    and key_validation_class = 'UTF8Type' 
    and default_validation_class = 'UTF8Type'

Here's some sample Java code, using Hector, that shows how I insert data into this column family:

    Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "192.168.1.6:9160");
    Keyspace keyspaceOperator = HFactory.createKeyspace("CompositeTesting", cluster);

    // Build the composite column name (1, "test1").
    Composite colKey1 = new Composite();
    colKey1.addComponent(1, IntegerSerializer.get());
    colKey1.addComponent("test1", StringSerializer.get());

    // Insert one column under row key "rowkey1".
    Mutator<String> mutator = HFactory.createMutator(keyspaceOperator, StringSerializer.get());
    mutator.addInsertion("rowkey1", "CompositeTypeCF",
        HFactory.createColumn(colKey1, "Some Data", new CompositeSerializer(), StringSerializer.get()));
    mutator.execute();

This works, and if I go to the cassandra-cli and do a list I get this:

    $ list CompositeTypeCF;

    Using default limit of 100
    -------------------
    RowKey: rowkey1
    => (column=1:test1, value=Some Data, timestamp=1326916937547000)

My question now is this: How do I go about querying this data in Hector? Basically I would need to query it in a few ways:

  1. Give me the whole row where Row Key = "rowkey1"
  2. Give me the column data where the first part of the column name = some integer value
  3. Give me all the columns where the first part of the column name is within a certain range

libjack On BEST ANSWER

Good starting point tutorial here.

But, after finally having the need to use a composite component and attempting to write queries against the data, I figured out a few things that I wanted to share.

When searching Composite columns, the results will be a contiguous block of columns.

So, assuming a composite of 3 Strings, and my columns look like:

A:A:A
A:B:B
A:B:C
A:C:B
B:A:A
B:B:A
B:B:B
C:A:B

For a search from A:A:A to B:B:B, the results will be:

A:A:A
A:B:B
A:B:C
A:C:B
B:A:A
B:B:A
B:B:B

Notice the "C" components? There are no "C" components in the start/end terms, so what gives? These are simply all the columns that sort between A:A:A and B:B:B. The Composite search terms do not behave like nested loops over each component (which is what I originally thought). Because the columns are stored in sorted order, the start and end terms mark the two ends of a contiguous block of columns.
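To see the contiguous-block behaviour without a cluster, here is a minimal plain-Java sketch (no Hector or Cassandra involved). It keeps the same eight column names in a sorted set and takes an inclusive range between two names. The class name `CompositeSliceDemo`, the `slice` helper and the string-splitting comparator are made up for illustration; the comparator only approximates how a composite of three string components would order the names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class CompositeSliceDemo {
    // Hypothetical stand-in for the composite comparator: compare component by
    // component, left to right; on a tie, the shorter name sorts first.
    static final Comparator<String> BY_COMPONENT = (x, y) -> {
        String[] a = x.split(":");
        String[] b = y.split(":");
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int c = a[i].compareTo(b[i]);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    };

    // The eight column names from the example, kept in sorted order.
    static NavigableSet<String> sampleColumns() {
        NavigableSet<String> cols = new TreeSet<>(BY_COMPONENT);
        cols.addAll(Arrays.asList(
            "A:A:A", "A:B:B", "A:B:C", "A:C:B", "B:A:A", "B:B:A", "B:B:B", "C:A:B"));
        return cols;
    }

    // Every column between start and end (both inclusive), as one contiguous block.
    static List<String> slice(NavigableSet<String> columns, String start, String end) {
        return new ArrayList<>(columns.subSet(start, true, end, true));
    }

    public static void main(String[] args) {
        // prints [A:A:A, A:B:B, A:B:C, A:C:B, B:A:A, B:B:A, B:B:B]
        System.out.println(slice(sampleColumns(), "A:A:A", "B:B:B"));
    }
}
```

A:B:C and A:C:B are in the result even though neither bound contains a "C", because they sort between the two bounds. C:A:B is left out only because it sorts after B:B:B.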

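When building the Composite search entries, you must specify the ComponentEquality for each component.

In the end Composite, only the last component should be GREATER_THAN_EQUAL; every other component, including all of those in the start Composite, should be EQUAL. For the example above: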

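    // Serializers for the row key and value (se) and the composite column name (ce).
    // keyspace and myKey are assumed to be defined elsewhere.
    StringSerializer se = StringSerializer.get();
    CompositeSerializer ce = new CompositeSerializer();

    // Start of the slice: every component is EQUAL.
    Composite start = new Composite();
    start.addComponent(0, "A", Composite.ComponentEquality.EQUAL);
    start.addComponent(1, "A", Composite.ComponentEquality.EQUAL);
    start.addComponent(2, "A", Composite.ComponentEquality.EQUAL);

    // End of the slice: only the last component is GREATER_THAN_EQUAL.
    Composite end = new Composite();
    end.addComponent(0, "B", Composite.ComponentEquality.EQUAL);
    end.addComponent(1, "B", Composite.ComponentEquality.EQUAL);
    end.addComponent(2, "B", Composite.ComponentEquality.GREATER_THAN_EQUAL);

    SliceQuery<String, Composite, String> sliceQuery = HFactory.createSliceQuery(keyspace, se, ce, se);
    sliceQuery.setColumnFamily("CF").setKey(myKey);
    ColumnSliceIterator<String, Composite, String> csIterator =
        new ColumnSliceIterator<String, Composite, String>(sliceQuery, start, end, false);

    while (csIterator.hasNext()) {
        HColumn<Composite, String> column = csIterator.next();
        // ... use column.getName() and column.getValue() ...
    }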
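The same idea covers the "first part of the column name equals some value" case from the question: use a start and end that hold only the first component, and put GREATER_THAN_EQUAL on the end. The sketch below is a simplified plain-Java model of why that marker matters. It is not Hector or Cassandra code. The `Name` class, the `compare` and `slice` helpers, and the -1/0/1 marker values are assumptions chosen to mimic how an end-of-component marker shifts a bound relative to the stored columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CompositeBoundDemo {
    // Assumed numeric stand-ins for the three ComponentEquality values.
    static final int LESS_THAN_EQUAL = -1, EQUAL = 0, GREATER_THAN_EQUAL = 1;

    // A column name or slice bound: component values plus a marker on the last one.
    static final class Name {
        final String[] parts;
        final int marker;
        Name(int marker, String... parts) { this.marker = marker; this.parts = parts; }
        // Only the last component carries the marker; earlier ones count as EQUAL.
        int markerAt(int i) { return i == parts.length - 1 ? marker : EQUAL; }
    }

    // Compare values component by component, then markers; a shorter name sorts first.
    static int compare(Name a, Name b) {
        int shared = Math.min(a.parts.length, b.parts.length);
        for (int i = 0; i < shared; i++) {
            int c = a.parts[i].compareTo(b.parts[i]);
            if (c != 0) return c;
            int m = Integer.compare(a.markerAt(i), b.markerAt(i));
            if (m != 0) return m;
        }
        return Integer.compare(a.parts.length, b.parts.length);
    }

    // The eight stored columns from the example; stored names always use EQUAL.
    static List<Name> sampleColumns() {
        List<Name> cols = new ArrayList<>();
        for (String s : Arrays.asList(
                "A:A:A", "A:B:B", "A:B:C", "A:C:B", "B:A:A", "B:B:A", "B:B:B", "C:A:B")) {
            cols.add(new Name(EQUAL, s.split(":")));
        }
        return cols;
    }

    // Keep every column that sorts between start and end, inclusive.
    static List<String> slice(List<Name> columns, Name start, Name end) {
        List<String> out = new ArrayList<>();
        for (Name col : columns) {
            if (compare(start, col) <= 0 && compare(col, end) <= 0) {
                out.add(String.join(":", col.parts));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Name> cols = sampleColumns();
        // prints [A:A:A, A:B:B, A:B:C, A:C:B]
        System.out.println(slice(cols, new Name(EQUAL, "A"), new Name(GREATER_THAN_EQUAL, "A")));
        // prints []
        System.out.println(slice(cols, new Name(EQUAL, "A"), new Name(EQUAL, "A")));
    }
}
```

In this model, an end bound of just "A" marked EQUAL sorts before every A:x:y column, so the slice comes back empty. Marking it GREATER_THAN_EQUAL moves the bound past all columns that share the "A" prefix. A range over the first component works the same way, with a different value in the end bound.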