ElasticSearch term suggest on analyzed field returns no suggestions

I'd like to use ElasticSearch's term suggest feature for spelling corrections ("Did you mean ...?"). Here's the official documentation:

Here's my index definition (shortened to the basics):

{
    "settings": {
        "analysis": {
            "filter": {
                "en_stop_filter": { "type": "stop", "stopwords": ["_english_"] },
                "en_stem_filter": { "type": "stemmer", "name": "minimal_english" },
                "de_stop_filter": { "type": "stop", "stopwords": ["_german_"] },
                "de_stem_filter": { "type": "stemmer", "name": "minimal_german" }
            },
            "analyzer": {
                "en_analyzer": { "type": "custom", "tokenizer": "icu_tokenizer", "filter": ["icu_folding", "icu_normalizer", "en_stop_filter", "en_stem_filter"] },
                "de_analyzer": { "type": "custom", "tokenizer": "icu_tokenizer", "filter": ["icu_folding", "icu_normalizer", "de_stop_filter", "de_stem_filter"] }
            }
        }
    },
    "mappings": {
        "blog": {
            "_analyzer": { "path": "my_analyzer", "index": "no" },
            "properties": {
                "title": { "type": "string" },
                "my_analyzer": { "type": "string", "index": "no" }
            }
        },
        "photo": {
            "properties": {
                "tags_en": { "type": "string", "analyzer": "en_analyzer", "index_name": "tag_en" },
                "tags_de": { "type": "string", "analyzer": "de_analyzer", "index_name": "tag_de" }
            }
        }
    }
}

And this is how data is indexed via Python/Django for our blog:

import json
import requests

data = ''
for p in BlogPost.objects.all():
    data += '{"index": {"_id": "%s"}}\n' % p.pk
    # "my_analyzer" names the analyzer for this document (see "_analyzer" in the mapping).
    data += json.dumps({ "my_analyzer": p.language+"_analyzer", "title": p.title })+'\n'
resp = requests.put(ELASTICSEARCH_URL+'blog/_bulk', data=data)

I'm setting the analyzer according to the language of each blog post (p.language = 'de' or 'en'), either German or English.

I'm able to search this index (via Python) and I do get spelling suggestions returned with these params:

{
  "query": {
    "query_string": {
      "query": q,
      "analyzer": "en_analyzer"
    }
  },
  "suggest": {
    "my_suggestion": {
      "text": q,
      "term": {
        "size": 1,
        "field": "title"
      }
    }
  }
}
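
For reference, the suggestions come back under the `suggest` key of the search response, with one entry per token of the analyzed text and a list of `options` for each. Here is a minimal sketch of turning that into a "Did you mean" string, assuming `response` is the parsed JSON of the search above. The helper name `did_you_mean` and the sample response are made up for illustration:

```python
def did_you_mean(response, suggestion_name="my_suggestion"):
    """Build a corrected query string from a term-suggest response.

    Returns None when no token received a suggestion.
    """
    entries = response.get("suggest", {}).get(suggestion_name, [])
    words = []
    changed = False
    for entry in entries:
        options = entry.get("options", [])
        if options:
            # With "size": 1 there is at most one option per token.
            words.append(options[0]["text"])
            changed = True
        else:
            # Keep the token as it is when there is nothing to suggest.
            words.append(entry["text"])
    return " ".join(words) if changed else None


# Hypothetical response, shaped like the term suggester's output.
sample = {
    "suggest": {
        "my_suggestion": [
            {"text": "elastc", "offset": 0, "length": 6,
             "options": [{"text": "elastic", "score": 0.83, "freq": 3}]},
            {"text": "search", "offset": 7, "length": 6, "options": []},
        ]
    }
}
print(did_you_mean(sample))
```

An empty `options` list for every token is what "no suggestions" looks like in the response, which is the case the helper maps to `None`.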

However, what I really need are spelling suggestions for searches on our photo type, which is indexed like this (Python/Django):

data = ''
for p in Photo.objects.all():
    data += '{"index": {"_id": "%s"}}\n' % p.pk
    data += json.dumps({
        "tags_en": p.tags_en,
        "tags_de": p.tags_de
    })+'\n'
resp = requests.put(ELASTICSEARCH_URL+'photo/_bulk', data=data)

p.tags_en and p.tags_de may be indexed as comma-separated tag strings, or as actual lists of strings. Both work for ElasticSearch and it doesn't seem to make a difference for this problem.

Searching photos works, both in English and German, but no spelling suggestions ever get returned:

{
  "query": {
    "query_string": {
      "query": q,
      "fields": [
        "tags_en"
      ],
      "analyzer": "en_analyzer"
    }
  },
  "suggest": {
    "my_suggestion": {
      "text": q,
      "term": {
        "size": 1,
        "field": "tags_en"
      }
    }
  }
}

It doesn't make a difference if I define an analyzer for the suggestion term, like this:

{
  "query": {
    "query_string": {
      "query": q,
      "fields": [
        "tags_en"
      ],
      "analyzer": "en_analyzer"
    }
  },
  "suggest": {
    "my_suggestion": {
      "text": q,
      "term": {
        "size": 1,
        "field": "tags_en",
        "analyzer": "en_analyzer"
      }
    }
  }
}

Note the difference in analysis between blog posts and photos: our blog posts are analyzed in one language per post, via the my_analyzer field in the mapping. Our photos, however, are analyzed on a per-field basis. We have 20 languages (only two are shown here to keep the code as small as possible) and each tag field is analyzed accordingly. If I remove the per-field analyzers for photos, I get suggestions there too, but we really do need the field-based analyzers.
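
To make the per-field setup concrete: with 20 languages, the photo mapping is essentially one `tags_<lang>` field per language, each pointing at its own `<lang>_analyzer`. Here is a sketch of generating that part of the mapping, assuming the naming pattern of the two fields shown above holds for the other languages. The helper name and the language list are illustrative, and `index_name` is left out for brevity:

```python
import json


def photo_tag_properties(languages):
    """Return a "properties" block with one analyzed tag field per language."""
    return {
        "tags_%s" % lang: {"type": "string", "analyzer": "%s_analyzer" % lang}
        for lang in languages
    }


mapping = {"photo": {"properties": photo_tag_properties(["en", "de"])}}
print(json.dumps(mapping, indent=2, sort_keys=True))
```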

So the issue must have something to do with the analyzers, but I'm totally stuck. Any ideas?

There is 1 answer

Simon Steinberger

A working workaround is to include a non-analyzed field in the mapping and run the term suggestions against that field only. It works for us, although it should be possible without this extra data.
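
A rough sketch of what that could look like for one language, assuming the extra field is called `tags_<lang>_raw` (a made-up name) and is mapped with `"index": "not_analyzed"`. A non-analyzed field indexes each value as a single term, so the sketch also assumes tags are sent as lists rather than comma-separated strings. The search still runs against the analyzed field; only the suggester points at the raw one:

```python
def raw_tag_mapping(lang):
    """Mapping for an analyzed tag field plus a non-analyzed copy of it."""
    return {
        "tags_%s" % lang: {"type": "string", "analyzer": "%s_analyzer" % lang},
        # Hypothetical extra field: same tags, indexed verbatim.
        "tags_%s_raw" % lang: {"type": "string", "index": "not_analyzed"},
    }


def photo_document(tags, lang):
    """Document body that sends the same tag list to both fields."""
    return {"tags_%s" % lang: tags, "tags_%s_raw" % lang: tags}


def search_body(q, lang):
    """Query the analyzed field, but take term suggestions from the raw one."""
    return {
        "query": {
            "query_string": {
                "query": q,
                "fields": ["tags_%s" % lang],
                "analyzer": "%s_analyzer" % lang,
            }
        },
        "suggest": {
            "my_suggestion": {
                "text": q,
                "term": {"size": 1, "field": "tags_%s_raw" % lang},
            }
        },
    }
```

Every tag is stored twice per language this way, which is the extra data mentioned above. Since the raw field is matched verbatim, differences in case and multi-word queries may need separate handling.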