Riak: search by key prefix

922 views Asked by At

I'm a newcomer to Riak and I've been reading this chapter from riak's docs. It goes to show that by adding structure information to buckets and keys one can overcome some of the limitations of key/value operations.

Though the article states an example on how such key would be structured:

sensor data keys could be prefaced by sensor_ or temp_sensor1_ followed by a timestamp (e.g. sensor1_2013-11-05T08:15:30-05:00)

no method is mentioned on how to query the data by key prefix (e.g sensor1_). Looking around stackoverflow I found this question. In it MapReduce and key filtering are mentioned as a possible solution. But the documentation on key filters states that they are a soon-to-be deprecated feature. I also checked out Riak search as a possible way but wasn't able to find a way to query data by key prefix.

My question is: What is the best way to search data by key prefix? I would greatly appreciate an example.

1

There are 1 answers

4
Craig On BEST ANSWER

The best way to search for a key prefix is to not do it if you don't need to, i.e. design around that search pattern if you can. The primary way to do that is to use deterministic keys that your application can easily compute. That said, if you cannot avoid building your application to require searching on key prefixes there are couple of things you can do (all of which have their drawbacks).

  1. Key Filters - http://docs.basho.com/riak/latest/dev/references/keyfilters/ - as you noted already these are marked as deprecated and not recommended at this point.
  2. MapReduce - http://docs.basho.com/riak/latest/dev/advanced/mapreduce/ - a good option if you can query in batches but not really suited for real time querying. You could cache the query results if precomputing the queries is helpful.
  3. Riak Search 2.0 (Solr) - http://docs.basho.com/riak/latest/dev/using/search/ - this is probably the easiest method to implement from an application perspective and allows to query your keys using a query along the lines of: 'curl "$RIAK_HOST/search/sensor?wt=json&q=_yz_rk:sensor1_*"'. Using search does come with a performance hit over straight key based queries but you can cache queries.
  4. Data Modeling - querying by key directly is always going to provide the best performance as mentioned above. One option is to to take advantage of Riak's Data Types (CRDTs) and create a bucket that uses sets. You could create a set for each sensor that contained a list of keys associated with that sensor in the first bucket. Then you can iterate over the keys in the set and do a multi-get to return all of associated records.

Hope this gives you some ideas.