Single word hits in Lucene not found

Question

Asked by Geir K.H. on 06 December 2013 at 11:34 · 247 views

I'm building a system that looks through articles on various subjects and picks out a short description of each one, much like an encyclopaedia. At first I ran into a problem: when I searched for "cat", I got a lot of hits on articles like "CAT5", "CAT6", ".cat" and so on. The number one hit was still "Cat", however. I was using StandardAnalyzer for this. I received a tip to use WhitespaceAnalyzer instead, which solved the original problem and made Lucene drop the hits on articles like CAT6, but now the article "Cat" is no longer in my list of hits at all. Why is this? Any suggestions, for example a different analyzer?

EDIT: The code for the search itself:

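import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.NIOFSDirectory;
import org.apache.lucene.util.Version;

public static String searchAbstracts(String input, int hitsPerPage) throws ParseException, IOException {
    String query = input;
    // The query text is analyzed with StandardAnalyzer before it is matched against the "article" field.
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_41);
    Query q = new QueryParser(Version.LUCENE_41, "article", analyzer).parse(query);
    Directory index = new NIOFSDirectory(new File(INDEX_PATH));
    IndexReader reader = IndexReader.open(index);
    String resultSet = "";

    IndexSearcher searcher = new IndexSearcher(reader);
    // Collect at most hitsPerPage top-scoring documents.
    TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
    searcher.search(q, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;

    System.out.println("Found " + hits.length + " articles.");

    // Concatenate the "desc" field of every hit and print a summary line per hit.
    for (int i = 0; i < hits.length; ++i) {
        int docId = hits[i].doc;
        Document d = searcher.doc(docId);
        resultSet += d.get("desc") + " ";
        System.out.println((i + 1) + ". " + d.get("article") + " :: Words from abstract: " + d.get("desc"));
    }
    reader.close(); // release the index files once all hits have been read
    return resultSet;
}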

Original Q&A

There is 1 answer

**Arun** · Accepted Answer · 2013-12-06T13:14:03+00:00

When you run a sentence such as "The quick Cat jumped over the lazy CAT6" through WhitespaceAnalyzer, it produces these tokens:
[The] [quick] [Cat] [jumped] [over] [the] [lazy] [CAT6]

As you can see, "Cat" is in the token list with its original case, so you should be able to find it. How are you querying it? Which analyzer are you using at query time?
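
One possible explanation, sketched below, is a case mismatch between index time and query time. The sketch assumes the index was built with WhitespaceAnalyzer while the query is still passed through StandardAnalyzer, as in the search code in the question. It is plain Java and does not call Lucene: the class `AnalyzerCaseSketch` and its methods are hypothetical stand-ins that imitate only whitespace splitting and lowercasing (the real StandardAnalyzer also splits on punctuation and removes stop words).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class AnalyzerCaseSketch {

    // Stand-in for WhitespaceAnalyzer: split on whitespace only, keep case unchanged.
    public static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Stand-in for the lowercasing step of StandardAnalyzer (assumption: nothing else is modelled).
    public static List<String> lowercasedTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String token : whitespaceTokens(text)) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // A query term hits only if the exact same term was indexed (case-sensitive comparison).
    public static boolean matches(List<String> indexedTerms, List<String> queryTerms) {
        for (String term : queryTerms) {
            if (indexedTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = whitespaceTokens("The quick Cat jumped over the lazy CAT6");
        System.out.println(indexed);

        // Query lowercased at search time: "Cat" becomes "cat", which was never indexed.
        System.out.println(matches(indexed, lowercasedTokens("Cat")));  // false

        // Query tokenized the same way as the index: the exact term "Cat" is found.
        System.out.println(matches(indexed, whitespaceTokens("Cat")));  // true

        // Even with matching analyzers, a lowercase query does not match the capitalized term.
        System.out.println(matches(indexed, whitespaceTokens("cat")));  // false
    }
}
```

If this is what is happening, using the same analyzer for indexing and for the QueryParser, and deciding deliberately whether terms should be lowercased on both sides, would be the first thing to check.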
