spoiler :
This is just another Lucene vs. Sphinx vs. whatever question.
I saw that all the other threads were almost two years old, so I decided to start a new one.
Here is the requirement :
data size : max 10 GB.
rows : close to a billion
indexing should be fast
searching should be under 0 ms [ OK, that's a joke... laugh... but keep this as low as possible ]
Given today's options, which one should I pick, and how should I go about it?
edit :
I did some timing with Lucene: indexing 1.8 GB of data took 5 minutes.
Searching is pretty fast, unless I run a wildcard query like a*, which takes 400 ~ 500 ms.
My biggest worry is indexing, which takes a long time and a lot of resources!
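For context, the indexing side in Lucene boils down to a loop like the one below. This is only a minimal sketch against the Lucene 5+ API, not my exact code; the index path, field name, and 256 MB RAM buffer are placeholder assumptions. Raising the RAM buffer via setRAMBufferSizeMB is usually the first knob to turn for indexing throughput, since it reduces how often in-memory segments get flushed to disk:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import java.nio.file.Paths;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        // Placeholder index location; any fast local disk (or SSD) will do.
        Directory dir = FSDirectory.open(Paths.get("/tmp/index"));

        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        // A larger RAM buffer means fewer flushes during bulk indexing (value is an assumption).
        cfg.setRAMBufferSizeMB(256);
        cfg.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

        try (IndexWriter writer = new IndexWriter(dir, cfg)) {
            // Stand-in loop; a real run would stream rows from the data source.
            for (int i = 0; i < 1_000_000; i++) {
                Document doc = new Document();
                doc.add(new TextField("body", "row " + i, Field.Store.NO));
                writer.addDocument(doc);
            }
            writer.commit();
        }
    }
}
```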
I have no experience with anything other than Lucene - it's pretty much the default indexing solution, so I don't think you can go too wrong with it.
10 GB is not a lot of data. You'll be able to re-index it pretty rapidly, or keep it on SSDs for extra speed. And of course you can keep your whole index in RAM (which Lucene supports) for super-fast lookups; a sketch of that is below.
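As a rough illustration of the in-RAM option, here is a minimal sketch that copies an existing on-disk index into the heap and runs a prefix query like the a* case above. It targets the Lucene 5-7 API; the index path and field name are assumptions, and note that RAMDirectory was deprecated in later releases in favour of memory-mapped directories:

```java
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.RAMDirectory;

import java.nio.file.Paths;

public class RamSearch {
    public static void main(String[] args) throws Exception {
        // Copy the on-disk index into heap memory so lookups never touch disk.
        RAMDirectory ramDir = new RAMDirectory(
                FSDirectory.open(Paths.get("/tmp/index")), IOContext.READONCE);

        try (DirectoryReader reader = DirectoryReader.open(ramDir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Prefix query corresponding to the "a*" case from the question.
            TopDocs hits = searcher.search(new PrefixQuery(new Term("body", "a")), 10);
            System.out.println("hits: " + hits.totalHits);
        }
    }
}
```

This only helps if the whole index actually fits in the JVM heap; for 10 GB that's usually fine on a decent box, otherwise a memory-mapped directory on an SSD gets you most of the way there.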