I am trying to get all documents from a Lucene index that have not been deleted.
I heard that when I delete something from a Lucene index, Lucene does not remove it from the index files immediately.
So I want to retrieve only the documents in the index that are not marked as deleted.
Lucene provides a bitset of all non-deleted documents, called `liveDocs`. You can get it by iterating over all `LeafReader`s (or using the `SlowCompositeReaderWrapper`) and calling the `getLiveDocs()` method, or by using the `MultiFields` class.
Once you have this bitset, you can iterate from `0` (inclusive) to `IndexReader#maxDoc` (exclusive) and consult the bitset to know whether a docid represents a deleted document or a live one. You can access all stored fields of a deleted document just as you would for a live one.
However, once a segment gets merged, its deleted documents are permanently deleted and thus removed from the index.
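The loop described above can be sketched roughly as follows. To keep the example runnable without Lucene on the classpath, it uses a hypothetical stand-in `Bits` interface that mirrors the two methods of Lucene's `org.apache.lucene.util.Bits`, plus a made-up `fromBitSet` helper; neither is part of Lucene. With a real index you would pass the result of `LeafReader#getLiveDocs()` and `maxDoc()` instead. Note that `getLiveDocs()` returns `null` when a segment has no deletions, which the sketch treats as "every document is live".

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class LiveDocsSketch {

    /** Hypothetical stand-in for org.apache.lucene.util.Bits (same two methods). */
    interface Bits {
        boolean get(int index);
        int length();
    }

    /** Illustration-only helper: wraps a java.util.BitSet as a Bits instance. */
    static Bits fromBitSet(BitSet bits, int length) {
        return new Bits() {
            @Override public boolean get(int index) { return bits.get(index); }
            @Override public int length() { return length; }
        };
    }

    /**
     * Returns the doc ids in [0, maxDoc) that are still live.
     * A null liveDocs means there are no deletions, so every doc id is live.
     */
    static List<Integer> liveDocIds(Bits liveDocs, int maxDoc) {
        List<Integer> live = new ArrayList<>();
        for (int docId = 0; docId < maxDoc; docId++) {
            if (liveDocs == null || liveDocs.get(docId)) {
                live.add(docId);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(5);
        bits.set(0, 5);   // docs 0..4 start out live
        bits.clear(1);    // doc 1 is marked deleted
        bits.clear(3);    // doc 3 is marked deleted
        System.out.println(liveDocIds(fromBitSet(bits, 5), 5)); // prints [0, 2, 4]
        System.out.println(liveDocIds(null, 3));                // prints [0, 1, 2]
    }
}
```

In real code you would typically run this per segment, and add each leaf's doc base to the local doc id if you need index-wide ids.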