I have defined my Analyzer as below

@AnalyzerDefs({
@AnalyzerDef(name = "ngram",
            tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
            filters = {
                    //@TokenFilterDef(factory = StandardFilterFactory.class),
                    @TokenFilterDef(factory = LowerCaseFilterFactory.class),
                    @TokenFilterDef(factory = NGramFilterFactory.class, params = {
                            @Parameter(name = "minGramSize", value = "3"),
                            @Parameter(name = "maxGramSize", value = "255") }) }),
//-----------------------------------------------------------------------
    @AnalyzerDef(name = "ngram_query",
            tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
            filters = {
                    //@TokenFilterDef(factory = StandardFilterFactory.class),
                    @TokenFilterDef(factory = LowerCaseFilterFactory.class)
                    }) 
})

@Analyzer(definition = "ngram")
public class EPCAsset extends Asset {
    @Field
    private String obturatorMaterial;

}

It perfectly makes n-grams term vectors during index time. But it also makes n-gram of search query during search time.

What i want is a way by which search query uses n-gram index to search without breaking the search term into grams.

Note: I have to use n-gram here because the requirement is to search anywhere in the text. either start or in middle. so edge-n-gram is not an option for me.

Example: Input Data to be index ICQ 234

Then during index time its term vectors are

    "234"
    " 23"
    " 234"
    "cq "
    "cq 2"
    "cq 23"
    "cq 234"
    "icq"
    "icq "
    "icq 2"
    "icq 23"
    "icq 234"
    "q 2"
    "q 23"
    "q 234"

Now when I search icq it works perfectly. But it also works for icqabc As during search time it makes n-grams of search query. So is there a way that during search time it do not break the search term but use n-gram index for searching.

Here is my search query building

FullTextEntityManager fullTextEntityManager = Search
            .getFullTextEntityManager(entityManager);

QueryBuilder qb = fullTextEntityManager.getSearchFactory().buildQueryBuilder()
            .forEntity(entityClass).get();
Query query = qb.phrase().onField("obturatorMaterial").sentence("icqabc").createQuery();
FullTextQuery fullTextQuery = fullTextEntityManager.createFullTextQuery(query,
            entityClass);
fullTextQuery.getResultList()

I am using elastic search as backend for Hibernate search.

EDIT: I also has applied query time analyzer as per @yrodiere's answer but it give me error.

QueryBuilder qb = fullTextEntityManager.getSearchFactory().buildQueryBuilder()
            .forEntity(entityClass).overridesForField("obturatorMaterial","ngram_query").get();

org.hibernate.search.exception.SearchException: HSEARCH000353: Unknown analyzer: 'ngram_query'. Make sure you defined this analyzer.

EDIT

As per this link overriderForField when using elasticsearch backed hibernate search

I am now able to define a query time 2nd analyzer and it solved the problem.

2 Answers

1
yrodiere On Best Solutions

First, you should double check that an ngram filter really is what you want. I'm mentioning this because the ngram analyzer is generally used both at indexing and querying, so that it provides fuzzy matches. It's kind of the whole point of this analyzer.

Do you really need matches when the user types cq 2? Does it make sense? When implementing autocomplete, people generally prefer to only match documents containing words that start with the user input, so i would match, ic and icq would too, but not cq 2. If this seems to be what you want, you should have a look at the "edge_ngram" filter. It tends to improve the relevance of matches and also doesn't require as much disk space.

Now, even with the "edge_ngram" filter you will need to disable ngrams at query time. In Hibernate Search, this is done by "overriding" the analyzer.

  1. First, define a second analyzer, identical to the one you use during indexing, but without the "ngram" or "edge_ngram" filter. Name it "ngram_query".
  2. Then, use this to create your query builder:

    QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity(EPCAsset.class)
        .overridesForField( "obturatorMaterial", "ngram_query" )
        .get();
    
  3. Use the query builder to create your query as usual.

Note that, if you rely on Hibernate Search to push the index schema and analyzers to Elasticsearch, you will have to use a hack in order for the query-only analyzer to be pushed: by default only the analyzers that are actually used during indexing are pushed. See https://discourse.hibernate.org/t/cannot-find-the-overridden-analyzer-when-using-overridesforfield/1043/4

1
Amit Khandelwal On

Either you need to use search time analyzer and very likely it would be the keyword analyzer during search time. Or need to use term query instead of match query, which is analyzed means it uses the same analyzer used index time.

Read more about term query and match query for more information.

Edit :- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html clearly talked about the use of search_analyzer, in case of edgeNGram tokenizer and autocomplete search which is exactly your use case.