Search works as expected until I update a document in the index. After that, the updated document no longer shows up in search results; instead, a completely unrelated document with docId=0 is returned.
This is how I set it up:
var luceneVersion = LuceneVersion.LUCENE_48;
var analyzersPerField = new Dictionary<string, Analyzer>
{
    ["name"] = new KeywordAnalyzer()
};
var __analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer(luceneVersion), analyzersPerField);
var __luceneDirectory = FSDirectory.Open(_searchDirectory);
var indexConfig = new IndexWriterConfig(luceneVersion, __analyzer)
{
    OpenMode = OpenMode.CREATE_OR_APPEND,
};
var _writer = new IndexWriter(__luceneDirectory, indexConfig);
var __directoryReader = DirectoryReader.Open(__luceneDirectory);
var _searcher = new IndexSearcher(__directoryReader);
var _queryParser = new StandardQueryParser(__analyzer);
Here is how documents are defined:
private static Document CreateDocument(FileResult fileResult)
{
    var document = new Document
    {
        new StringField("id", fileResult.Id, Field.Store.YES)
    };
    document.Add(new TextField("baseKeywords", fileResult.AlternativeName, Field.Store.NO));
    document.Add(new StringField("name", fileResult.AlternativeName, Field.Store.YES));
    return document;
}
Here is how documents are updated:
public void UpdateDocumentName(FileResult fileResult, string newName)
{
    fileResult.AlternativeName = newName;
    var document = CreateDocument(fileResult);
    _writer.UpdateDocument(new Term("id", fileResult.Id), document);
}
After an update is done, I do a commit and create the new reader:
_writer.Commit();
__directoryReader = DirectoryReader.OpenIfChanged(__directoryReader) ?? __directoryReader;
_searcher = new IndexSearcher(__directoryReader);
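A minimal sketch for inspecting the refreshed reader, assuming Lucene.NET 4.8. The SegmentDiagnostics class and DescribeSegments method are hypothetical names, not part of the original source. It lists each segment (leaf) of the reader together with its DocBase, the offset that turns a segment-local doc ID into an index-wide one.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Index;

// Hypothetical diagnostic helper (not part of the original source).
public static class SegmentDiagnostics
{
    // Returns (ord, docBase, maxDoc) for every segment of the reader and prints it.
    // Index-wide doc ID = docBase + segment-local doc ID.
    public static List<(int ord, int docBase, int maxDoc)> DescribeSegments(DirectoryReader reader)
    {
        var segments = new List<(int ord, int docBase, int maxDoc)>();
        foreach (AtomicReaderContext leaf in reader.Leaves)
        {
            // MaxDoc counts deleted docs too, e.g. the old version of an updated document.
            int maxDoc = leaf.Reader.MaxDoc;
            segments.Add((leaf.Ord, leaf.DocBase, maxDoc));
            Console.WriteLine($"segment {leaf.Ord}: docBase={leaf.DocBase}, maxDoc={maxDoc}");
        }
        return segments;
    }
}
```

Called right after the OpenIfChanged step, it would typically show an extra segment once an update has been committed (unless a merge has combined them). A document in that newer segment can have local ID 0 even though its DocBase is greater than 0.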
Here is how documents are searched:
_queryString = QueryParserUtil.Escape(searchParameters.QueryString);
// Parse the escaped string, not the raw input
var query = _queryParser.Parse(_queryString, "baseKeywords");
_searcher.Search(query, resultsCollector.GetCollector());
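For comparison, here is a sketch of the same search using the built-in IndexSearcher.Search(Query, int) overload, which collects the top hits itself. The SearchHelpers class and SearchTop method are hypothetical names. ScoreDoc.Doc in the returned TopDocs is already an index-wide doc ID, so it can be passed straight to IndexSearcher.Doc.

```csharp
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Search;

// Hypothetical helper (not part of the original source).
public static class SearchHelpers
{
    // Lets Lucene's built-in top-N collection gather the hits.
    // ScoreDoc.Doc is index-wide, so no DocBase arithmetic is needed here.
    public static List<(Document document, float score)> SearchTop(IndexSearcher searcher, Query query, int maxHits)
    {
        TopDocs topDocs = searcher.Search(query, maxHits);
        var hits = new List<(Document document, float score)>();
        foreach (ScoreDoc scoreDoc in topDocs.ScoreDocs)
        {
            // Load the stored fields of each hit by its index-wide ID.
            hits.Add((searcher.Doc(scoreDoc.Doc), scoreDoc.Score));
        }
        return hits;
    }
}
```

A custom collector is still useful when all matches are needed rather than the top N. In that case the collector has to do the DocBase bookkeeping itself.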
The custom collector is defined as:
private List<(int doc, float score)> _results = new List<(int doc, float score)>();
private Scorer _scorer;

public ICollector GetCollector()
{
    return Collector.NewAnonymous(setScorer: (scorer) =>
    {
        _scorer = scorer;
    }, collect: (doc) =>
    {
        _results.Add((doc, _scorer.GetScore()));
    }, setNextReader: (context) =>
    {
        //
    }, acceptsDocsOutOfOrder: () =>
    {
        return true;
    });
}
These are excerpts from the full source, which is available here
The behavior is consistent: even a query on a different field that should match only the updated document returns docId=0. If I open the index in Luke, the problem does not occur and I can find the updated document.
The document I'm updating has the highest docId, so could this be an off-by-one problem?
I tried updating a different document; now whenever I query for either one, I get docId=0.
If I query for the document that actually has docId=0, I get that document, not the updated one.
I was originally using the reader from _writer.GetReader(applyAllDeletes: true), but switching to DirectoryReader.Open did not change anything.
Is this some default return value I'm not aware of? I'm not sure why it always returns docId=0.
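The custom collector was set up wrong! The doc value passed to collect is relative to the segment currently being searched, not to the whole index. UpdateDocument deletes the old version and adds the new one, and the new version ends up in a new segment when the change is committed. Because it is the first document in that segment, its segment-relative ID is 0, and looking up 0 on the searcher loads the unrelated first document of the index. The index-wide ID is the DocBase of the AtomicReaderContext passed to setNextReader plus the segment-relative doc.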
From the Lucene.NET documentation:
Here is the fixed collector:
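Here is a minimal sketch of such a collector, assuming Lucene.NET 4.8 and the Collector.NewAnonymous factory used in the question. ResultsCollector and its Results property are hypothetical names. The essential change is that setNextReader records context.DocBase and collect adds it to the segment-relative doc ID.

```csharp
using System.Collections.Generic;
using Lucene.Net.Search;

// Hypothetical holder class; the original class name is not shown in the question.
public class ResultsCollector
{
    private readonly List<(int doc, float score)> _results = new List<(int doc, float score)>();
    private Scorer _scorer;
    private int _docBase; // index-wide ID of the first doc in the segment being searched

    // Collected hits, with index-wide doc IDs.
    public IReadOnlyList<(int doc, float score)> Results => _results;

    public ICollector GetCollector()
    {
        return Collector.NewAnonymous(
            setScorer: scorer => _scorer = scorer,
            collect: doc =>
            {
                // 'doc' is segment-relative; add the segment's DocBase so the ID
                // can be passed to IndexSearcher.Doc(int).
                _results.Add((_docBase + doc, _scorer.GetScore()));
            },
            // Called once per segment before its docs are collected.
            setNextReader: context => _docBase = context.DocBase,
            acceptsDocsOutOfOrder: () => true);
    }
}
```

With this change, searcher.Doc(result.doc) resolves to the updated document, wherever in the index it now lives.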