How often should I re-warm my Lucene index?

Question

Asked by user1555451 on 26 January 2014 at 11:39

I was wondering if anyone else has had the same Lucene (not Solr) situation?

When I open a Lucene index I warm it with a typical query and then keep the searcher cached for a period of time so that many queries can use it. I then re-open it and repeat. Because I am running Lucene 3.6 on Linux, as I understand it most of my open index data resides in the filesystem cache rather than the JVM heap. What I find is that the response time for queries increases over time, unless I keep re-warming the searcher by re-running my typical query. Has anyone else had this issue? If so, is re-warming the only way to keep the query responsive? How often works best?

Some background

- The machine is always very busy doing other, non-Lucene file processing, which makes me suspect that the filesystem cache pages holding the index are being replaced over time.
- My indexer does not run in the same JVM as my query server, so NRT (near-real-time) search etc. isn't relevant.

Thanks!

Chris

There are 2 answers

**mindas** · Answer 1 · 2014-01-27T09:46:41+00:00

Which Directory implementation are you using?

You can try playing with swappiness as explained at http://wiki.apache.org/lucene-java/ImproveSearchingSpeed.

Another option would be using mlockall as explained in http://jprante.github.io/applications/2012/07/26/Mmap-with-Lucene.html.
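Before tuning either setting, it may help to confirm that page-cache eviction is really the cause. The following is a minimal diagnostic sketch, not part of Lucene: it reads every file in an index directory sequentially and reports how long that took. The class name `IndexFileToucher` and the command-line argument are made up for illustration. If a run after the machine has been busy for a while is much slower than a second run immediately afterwards, the index pages had probably been evicted in between.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not a Lucene class): sequentially reads index files
// so the time taken hints at whether they were still in the OS page cache.
final class IndexFileToucher {

    /** Reads every regular file directly inside indexDir; returns total bytes read. */
    static long readAll(Path indexDir) throws IOException {
        byte[] buffer = new byte[1 << 16];
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                try (InputStream in = Files.newInputStream(file)) {
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        total += n;
                    }
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: IndexFileToucher <index-directory>");
            return;
        }
        Path dir = Paths.get(args[0]); // assumed: path to the index directory
        long start = System.nanoTime();
        long bytes = readAll(dir);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("read " + bytes + " bytes in " + millis + " ms");
    }
}
```

Reading the files this way also tends to pull them back into the page cache, but only if the index fits in the memory the OS is willing to spend on caching, so treat it as a measurement aid rather than a fix.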

**Salah** · Answer 2 · 2014-01-26T18:13:10+00:00

I don't think this issue is related to Lucene itself; I think it is an OS issue. As you know, Lucene uses the Java I/O libraries, which in turn use the OS's native I/O methods.

What I think happens is that each time you warm your searcher with a query, the OS caches the files that the query reads. If you re-warm the searcher with the same query, it returns quickly, but if you warm it with a different query, the OS has to cache files again because different files are involved. That is real overhead on your OS resources.

But I really wonder why you want to keep your reader around for a period of time. If the search queries come from users, the chance of the same query being repeated is very low, and creating a new IndexSearcher object is not that costly.

So my suggestion is to create an IndexSearcher for each query and release its resources once the query is done, if your business case can work with that.
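If a searcher per query does not fit your case (opening the underlying index reader is usually the expensive part), the cache-and-re-warm cycle from the question could be sketched roughly like this. It is a generic illustration that uses only the JDK: `RewarmingHolder`, `Opener` and `Warmer` are made-up names, `T` stands in for whatever searcher type you cache, and the periods are placeholders, since how often to re-warm has to be measured on the machine in question.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder (not a Lucene class): caches one warmed resource,
// re-warms it periodically and swaps in a freshly opened one less often.
final class RewarmingHolder<T> implements AutoCloseable {

    interface Opener<T> { T open() throws Exception; }          // e.g. open a searcher
    interface Warmer<T> { void warm(T resource) throws Exception; } // e.g. run a typical query
    private interface Task { void run() throws Exception; }

    private final Opener<T> opener;
    private final Warmer<T> warmer;
    private final AtomicReference<T> current = new AtomicReference<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    RewarmingHolder(Opener<T> opener, Warmer<T> warmer) throws Exception {
        this.opener = opener;
        this.warmer = warmer;
        reopen(); // publish a warmed resource before any caller asks for one
    }

    /** Opens a fresh resource, warms it, and only then makes it visible. */
    void reopen() throws Exception {
        T fresh = opener.open();
        warmer.warm(fresh);
        current.set(fresh);
        // Assumption: closing the previous resource is left out here; with a real
        // searcher it must happen only after in-flight queries have finished.
    }

    /** Re-runs the warm-up against the resource that is currently published. */
    void rewarm() throws Exception {
        warmer.warm(current.get());
    }

    T get() {
        return current.get();
    }

    /** Starts background re-warming and re-opening at the given periods. */
    void start(long rewarmPeriod, long reopenPeriod, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(
                () -> runQuietly(this::rewarm), rewarmPeriod, rewarmPeriod, unit);
        scheduler.scheduleWithFixedDelay(
                () -> runQuietly(this::reopen), reopenPeriod, reopenPeriod, unit);
    }

    // An exception escaping a scheduled task cancels its future runs, so swallow it.
    private static void runQuietly(Task task) {
        try {
            task.run();
        } catch (Exception e) {
            System.err.println("background task failed: " + e);
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Query threads would call `get()` for each search, while `start(...)` keeps the warm-up query running in the background. Lucene also ships a `SearcherManager` class that handles reference-counted re-opening; whether it suits a setup with the indexer in a separate JVM is worth checking before rolling your own.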
