I am trying to add about 21,000 entities already in the database into an nhibernate-search Lucene index. When done, the indexes are around 12 megabytes. I think the time can vary quite a bit, but it's always very slow. In my last run (running with the debugger), it took over 12 minutes to index the data.
    private void IndexProducts(ISessionFactory sessionFactory)
    {
        using (var hibernateSession = sessionFactory.GetCurrentSession())
        using (var luceneSession = Search.CreateFullTextSession(hibernateSession))
        {
            var tx = luceneSession.BeginTransaction();
            foreach (var prod in hibernateSession.Query<Product>())
            {
                luceneSession.Index(prod);
                hibernateSession.Evict(prod);
            }
            hibernateSession.Clear();
            tx.Commit();
        }
    }
The vast majority of the time is spent in tx.Commit(). From what I've read about Hibernate Search, this is to be expected. I've come across quite a few techniques that are supposed to help, such as MassIndexer, flushToIndexes, batch modes, etc., but as far as I can tell these are Java-only options.
The session clear and evict are just desperate moves by me - I haven't seen them make a difference one way or another.
Has anyone had success quickly indexing a large amount of existing data?
I've been able to speed up indexing considerably by using a combination of batching and transactions.
My initial code took ~30 minutes to index ~20,000 entities. Using the code below I've got it down to ~4 minutes.
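A minimal sketch of the batching-plus-transactions idea, reusing only the calls that already appear in the question (Search.CreateFullTextSession, Index, BeginTransaction, Commit, Clear) plus LINQ paging. The batch size of 500, the ProductIndexer class name, and the Id property used for ordering are assumptions made for illustration; treat this as one possible shape of the approach, not necessarily the exact code behind the timings above.

```csharp
using System.Linq;
using NHibernate;
using NHibernate.Linq;
using NHibernate.Search;

public static class ProductIndexer
{
    // Hypothetical batch size; tune it against your own data and memory limits.
    private const int BatchSize = 500;

    public static void IndexProducts(ISessionFactory sessionFactory)
    {
        using (var hibernateSession = sessionFactory.GetCurrentSession())
        using (var luceneSession = Search.CreateFullTextSession(hibernateSession))
        {
            // Product is the mapped entity from the question.
            var total = hibernateSession.Query<Product>().Count();

            for (var offset = 0; offset < total; offset += BatchSize)
            {
                // One short transaction per batch, so each Commit() only has
                // to write a few hundred documents to the index.
                using (var tx = luceneSession.BeginTransaction())
                {
                    var batch = hibernateSession.Query<Product>()
                        .OrderBy(p => p.Id) // assumes Product exposes an Id; stable order is needed for paging
                        .Skip(offset)
                        .Take(BatchSize)
                        .ToList();

                    foreach (var product in batch)
                    {
                        luceneSession.Index(product);
                    }

                    tx.Commit();
                }

                // Drop the indexed entities from the first-level cache
                // so the session does not grow with every batch.
                hibernateSession.Clear();
            }
        }
    }
}
```

The point of the design is that no single commit has to push all ~20,000 entities into the index at once, and the session never holds more than one batch of entities. Whether this helps, and by how much, depends on the batch size and on how expensive each entity is to load, so it is worth measuring a few values.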