HBase vs Hypertable vs Lucene


I am using a search system built on Lucene. By default it is not distributed, so I am thinking of moving to something like HBase or Hadoop.

Do solutions like HBase or Hypertable have built-in search capabilities, or will I need to implement Lucene on top of them?



David

Lucene is very different from BigTable clones like HBase or Hypertable. If you are simply looking for a distributed Lucene, look at projects such as Elasticsearch or Katta.

Solr/Lucene can also operate over a cluster, but the partitioning is not automatic: you have to create shards and replicas manually to match the distribution of the data you are searching. If your underlying data is stored in something like HBase, this is much easier to set up, modify, and update.
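To make "create shards manually" concrete, here is a toy sketch of hash-based sharding with a scatter-gather query. It is not Solr's API: `Shard`, `ShardedIndex`, and the CRC32 routing rule are hypothetical, in-memory stand-ins, and replicas are left out entirely.

```python
import zlib
from collections import defaultdict


class Shard:
    """Toy in-memory inverted index standing in for one Solr/Lucene shard."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        # .get() avoids creating empty entries for unknown terms
        return set(self.postings.get(term.lower(), set()))


class ShardedIndex:
    """Routes each document to one shard and fans queries out to all shards."""

    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def shard_for(self, doc_id):
        # crc32 is stable across runs, unlike Python's salted hash() for strings
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def add(self, doc_id, text):
        self.shards[self.shard_for(doc_id)].add(doc_id, text)

    def search(self, term):
        # Scatter: query every shard. Gather: merge the partial results.
        hits = set()
        for shard in self.shards:
            hits |= shard.search(term)
        return sorted(hits)


index = ShardedIndex(num_shards=3)
index.add("doc1", "HBase serves rows")
index.add("doc2", "Lucene indexes text")
index.add("doc3", "Lucene and HBase together")
print(index.search("lucene"))  # ['doc2', 'doc3']
```

Changing `num_shards` later reroutes most documents, which is one reason manual sharding is awkward to modify once data is in place.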

Fundamentally, HBase and Lucene solve different problems. Lucene is an index that lets keyword and other types of searches return quickly. HBase is a data repository that can serve individual rows in real time; however, HBase does not have an online query capability. For best results, you have to combine them. One example in this area is Lily (http://outerthought.org/site/products/lily.html).
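A minimal sketch of that division of labour, assuming the search index stores only row keys and the row store holds the actual data. `RowStore` and `KeywordIndex` are hypothetical in-memory stand-ins for an HBase table and a Lucene index, not their real APIs, and this is not how Lily itself is implemented.

```python
from collections import defaultdict


class RowStore:
    """Stand-in for an HBase table: fetches whole rows by key, no keyword search."""

    def __init__(self):
        self.rows = {}  # row key -> {column: value}

    def put(self, row_key, columns):
        self.rows.setdefault(row_key, {}).update(columns)

    def get(self, row_key):
        return self.rows.get(row_key)


class KeywordIndex:
    """Stand-in for a Lucene index: maps terms to the row keys containing them."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of row keys

    def add(self, row_key, text):
        for term in text.lower().split():
            self.postings[term].add(row_key)

    def search(self, *terms):
        # AND query: row keys that contain every term
        if not terms:
            return []
        matches = [self.postings.get(t.lower(), set()) for t in terms]
        return sorted(set.intersection(*matches))


def put_and_index(store, index, row_key, columns, text_column):
    # Write the row, then index its text column so the two stay in step.
    # (This sketch never removes stale postings when a row is overwritten.)
    store.put(row_key, columns)
    index.add(row_key, columns[text_column])


def search_rows(store, index, *terms):
    # The index answers "which rows match?", the store returns their contents
    return [store.get(key) for key in index.search(*terms)]


store, index = RowStore(), KeywordIndex()
put_and_index(store, index, "row1", {"title": "HBase serves rows", "owner": "a"}, "title")
put_and_index(store, index, "row2", {"title": "Lucene serves searches", "owner": "b"}, "title")
print(search_rows(store, index, "serves", "lucene"))
# [{'title': 'Lucene serves searches', 'owner': 'b'}]
```

Keeping the index in sync with row updates and deletes is the hard part in practice; that is the piece a combined system has to solve.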

ech

Lucene provides two main features: structured search and full-text search. HBase provides neither. Structured search can be built on HBase relatively easily (I think that is what Lily does), but rebuilding full-text search would be more difficult. To scale your Lucene setup, you can still try to partition your index by an attribute that splits your data into separate areas (you won't be able to search across areas). Then you can have one cluster per area.
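Here is a rough sketch of partitioning by attribute, assuming each document carries a partitioning field (the hypothetical `region`) and a `text` field. Each area gets its own independent index, standing in for a separate Lucene cluster, and a query must name its area.

```python
from collections import defaultdict


class AreaPartitionedIndex:
    """One independent inverted index per value of a partitioning attribute.

    Each area could be served by its own Lucene cluster. Queries must name an
    area, so there is deliberately no cross-area search.
    """

    def __init__(self, partition_attr):
        self.partition_attr = partition_attr
        self.areas = defaultdict(lambda: defaultdict(set))  # area -> term -> doc ids

    def add(self, doc_id, doc):
        area = doc[self.partition_attr]  # decides which cluster owns the document
        for term in doc["text"].lower().split():
            self.areas[area][term].add(doc_id)

    def search(self, area, term):
        # Only the named area's index is consulted; other areas are never touched
        if area not in self.areas:
            return []
        return sorted(self.areas[area].get(term.lower(), set()))


idx = AreaPartitionedIndex("region")
idx.add("d1", {"region": "eu", "text": "Hadoop cluster"})
idx.add("d2", {"region": "us", "text": "Hadoop job"})
print(idx.search("eu", "hadoop"))  # ['d1'] even though d2 also mentions Hadoop
```

This works well only if almost every query naturally targets a single area; otherwise you are back to fanning out across clusters.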

Joshua Martell

You may also want to look at Lucandra, Lucene with a Cassandra backend:

https://github.com/tjake/Lucandra

Vaibhav

Another technology to look at is Katta or Distributed Lucene, which can operate over HDFS.