Solr - Using facets to sum documents based on variable precision geohashes

Is there a way to get facet counts based on a substring of a facet field, akin to an EdgeNGram?


I'm using Solr to store geohash strings at high precision, and I want to count the number of documents at a chosen, coarser geohash precision. Facets are used to count the documents in a specific geohash 'cell'.

At the moment, the only way I can see to do this is using tiers of geohashes.

For example, the current facet result set (from the indexed data):

<lst name="facet_counts">
 <lst name="facet_fields">
  <int name="svztdm7w">11</int>
  <int name="sv87rzt8">3</int>
  <int name="sv83t6bf">2</int>
  <int name="syqxp43m">4</int>
  <int name="syr9f0v2">4</int>
  <int name="syp8p8hb">3</int>
  <int name="tuuttmtt">3</int>
  <int name="twj1ynm3">3</int>
  <int name="w30n6u71">3</int>
 </lst>
</lst>

What I want at precision 1:

<int name="s">27</int>
<int name="t">6</int>
<int name="w">3</int>

What I want at precision 2:

<int name="sv">16</int>
<int name="sy">11</int>
<int name="tu">3</int>
<int name="tw">3</int>
<int name="w3">3</int>

Cheers.

Answer by David Smiley (best answer):

I've done a lot of work with geohashes in Solr; my latest work is LSP: http://code.google.com/p/lucene-spatial-playground/ which has various indexing strategies, including geohashes. If you search for my name and geohash, you'll find various material.

It sounds like what you are after is essentially a geohash-based heatmap. That is something on my TODO list for LSP, but in the meantime you can get it with a little manipulation of how you index the geohashes. After edge n-gramming your geohash, prefix each resulting term with a leading number that is the length of that term. For example, instead of just "16", index "216" (with the data in the question, "sv" would become "2sv"). Use hexadecimal notation for the length so you can get 16 values in one character, instead of decimal's 10. When faceting, use facet.prefix=2 to return only the two-character cells; change the prefix to select a different precision.
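
To make the length-prefix idea concrete, here is a minimal sketch in plain Python. It does not call Solr at all; it only simulates what the index would contain and what a facet.prefix filter would select. The function names (length_prefixed_ngrams, facet_counts) and the SAMPLE_COUNTS dictionary are hypothetical, made up for illustration, and the counts are copied from the facet output in the question.

```python
from collections import Counter

# Document counts per full-precision geohash, copied from the question's facet output.
SAMPLE_COUNTS = {
    "svztdm7w": 11, "sv87rzt8": 3, "sv83t6bf": 2,
    "syqxp43m": 4, "syr9f0v2": 4, "syp8p8hb": 3,
    "tuuttmtt": 3, "twj1ynm3": 3, "w30n6u71": 3,
}


def length_prefixed_ngrams(geohash, max_len=15):
    """Return the edge n-grams of a geohash, each prefixed with its length as one hex digit.

    max_len is capped at 15 so the length always fits in a single hex character.
    """
    upper = min(len(geohash), max_len)
    return [format(n, "x") + geohash[:n] for n in range(1, upper + 1)]


def facet_counts(doc_counts, precision):
    """Simulate faceting on the prefixed terms with facet.prefix=<hex length>.

    Only terms whose first character equals the hex-encoded precision are counted.
    The leading length character is stripped from the keys of the result.
    """
    prefix = format(precision, "x")
    counts = Counter()
    for geohash, n_docs in doc_counts.items():
        for term in length_prefixed_ngrams(geohash):
            if term.startswith(prefix):
                counts[term[1:]] += n_docs
    return dict(counts)
```

For example, length_prefixed_ngrams("sv87") gives ["1s", "2sv", "3sv8", "4sv87"], and facet_counts(SAMPLE_COUNTS, 2) gives sv=16, sy=11, tu=3, tw=3, w3=3, which matches the precision 2 result asked for in the question. In a real setup the facet values returned by Solr would still carry the leading length character, so the client would need to strip it as the sketch does.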

Good luck and keep in touch.