Fastest full text search today?


Spoiler:
This is just another Lucene vs. Sphinx vs. whatever question.
I saw that all the other threads were almost two years old, so I decided to ask again.

Here are the requirements:

Data size: 10 GB max.
Rows: approaching billions
Indexing should be fast
Searching should take under 0 ms [OK, joke... laugh... but keep this as low as possible]

Given what is available today, which engine should I use, and how do I go about it?

Edit: I did some timing on Lucene, and indexing 1.8 GB of data took 5 minutes.
Searching is pretty fast unless I run a wildcard query such as a*. The a* query takes 400 ~ 500 ms.
My biggest worry is indexing, which is taking a loooonnnnggg time and a lot of resources!!

3

There are 3 answers

4
Richard H On BEST ANSWER

I have no experience with anything other than Lucene - it's pretty much the default indexing solution, so I don't think you can go too wrong with it.

10 GB is not a lot of data. You'll be able to re-index it pretty rapidly - or keep it on SSDs for extra speed. And of course you can keep your whole index in RAM (which Lucene supports) for super-fast lookups.
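
To make the in-RAM idea a bit more concrete, here is a toy inverted index in Python. It is only a sketch of the general technique, not Lucene's actual data structures or API; TinyIndex and its method names are made up for illustration. It also hints at why the a* query from the question costs more than a single-term search: an exact term is one dictionary lookup, while a prefix has to walk every term that starts with it and merge all of their document lists.

```python
from bisect import bisect_left
from collections import defaultdict


class TinyIndex:
    """Toy in-memory inverted index (hypothetical, not Lucene's implementation)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.sorted_terms = []            # sorted term dictionary, built by commit()

    def add(self, doc_id, text):
        # Naive tokenizer: lowercase and split on whitespace.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def commit(self):
        # Sort the terms once so prefix queries can binary-search into them.
        self.sorted_terms = sorted(self.postings)

    def search_term(self, term):
        # Exact term: a single hash lookup.
        return set(self.postings.get(term.lower(), ()))

    def search_prefix(self, prefix):
        # Prefix query: find the first candidate term, then scan forward,
        # merging the postings of every term that matches the prefix.
        prefix = prefix.lower()
        result = set()
        start = bisect_left(self.sorted_terms, prefix)
        for i in range(start, len(self.sorted_terms)):
            term = self.sorted_terms[i]
            if not term.startswith(prefix):
                break
            result |= self.postings[term]
        return result


index = TinyIndex()
index.add(1, "apache lucene search")
index.add(2, "sphinx search")
index.add(3, "another index")
index.commit()

exact_hits = index.search_term("search")
prefix_hits = index.search_prefix("a")
```

With a short prefix such as "a" over a large vocabulary, the loop in search_prefix touches a large share of the term dictionary, so the work grows with the number of matching terms rather than staying constant.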

0
Narayan On

"My biggest worry is indexing, which is taking loooonnnnggg time, and lot of resources!!"

Take a look at LuSql. We used it once; FWIW, indexing 100 GB of data from MySQL on a decent machine took a little more than an hour, writing the index to the filesystem (NTFS).

Now if you add an SSD or some other ultra-fast disk technology, you can bring that down considerably.
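
For a rough sense of scale, the two timings in this thread can be turned into throughput numbers. This is back-of-the-envelope arithmetic only: it assumes "a little more than an hour" means exactly 60 minutes, treats 1 GB as 1024 MB, and assumes indexing time grows linearly with data size. The data sets and machines differ, so the comparison is indicative at best. The function names are made up for this sketch.

```python
def throughput_mb_per_s(size_gb, minutes):
    """Average indexing throughput in MB/s (1 GB taken as 1024 MB)."""
    return size_gb * 1024 / (minutes * 60)


def minutes_to_index(size_gb, mb_per_s):
    """Estimated minutes to index size_gb at a constant rate (linear scaling assumed)."""
    return size_gb * 1024 / mb_per_s / 60


# Timing from the question: 1.8 GB indexed in 5 minutes.
lucene_rate = throughput_mb_per_s(1.8, 5)

# Timing from this answer: 100 GB in roughly an hour (assumed to be 60 minutes).
lusql_rate = throughput_mb_per_s(100, 60)

# Projected time for the full 10 GB at each rate.
full_at_lucene_rate = minutes_to_index(10, lucene_rate)
full_at_lusql_rate = minutes_to_index(10, lusql_rate)

print(f"question's run: {lucene_rate:.1f} MB/s, 10 GB in ~{full_at_lucene_rate:.0f} min")
print(f"LuSql run:      {lusql_rate:.1f} MB/s, 10 GB in ~{full_at_lusql_rate:.0f} min")
```

Under those assumptions the question's run works out to about 6 MB/s and the LuSql run to about 28 MB/s, which suggests there may be room to tune the original setup.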

4
Shashikant Kore On

Please check the Lucene wiki for tips on improving Lucene indexing speed. The page is quite succinct. In general, Lucene is quite fast (it is used for real-time search). The tips will be handy for figuring out whether you are missing something "obvious."