Elasticsearch - How to specify the same analyzer for search and index


I'm working on a Spanish search engine. I don't speak Spanish, but based on my research the goals are roughly:

  1. Filter stopwords such as "dos", "de", "la".
  2. Stem words at both search time and index time. For example, a search for "primera" should also return "primero" and "primer".

My attempt:

es_analyzer = {
    "settings": {
        "analysis": {
            "filter": {
                "spanish_stop": {
                    "type": "stop",
                    "stopwords": "_spanish_"
                },
                "spanish_stemmer": {
                    "type": "stemmer",
                    "language": "spanish"
                }
            },
            "analyzer": {
                "default_search": {
                    "type": "spanish"
                },
                "rebuilt_spanish": {
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "spanish_stop",
                        "spanish_stemmer"
                    ]
                }
            }
        }
    }
}

The problem: With "type": "spanish" in "default_search", my query "primera" is stemmed to "primer", which is correct. But although I listed "spanish_stemmer" in the filter chain of "rebuilt_spanish", the documents in the index aren't stemmed. As a result, a search for "primera" only returns documents that contain exactly "primer". Any suggestions on fixing this?

Potential fixes, though I haven't figured out the syntax for either:

  1. Use the built-in "spanish" analyzer in the filter. What's the syntax?
  2. Add the Spanish stemmer and stopword filters to "default_search". But I don't know how to combine several settings there.

There is 1 answer

ESCoder (best answer)

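Here is a working example with the index mapping, index data, search query, and search result. The analyzer is set on the "title" field, and a field's "analyzer" is used at both index time and search time unless a separate "search_analyzer" is given.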

Index Mapping:

{
  "settings": {
    "analysis": {
      "filter": {
        "spanish_stop": {
          "type": "stop",
          "stopwords": "_spanish_"
        },
        "spanish_stemmer": {
          "type": "stemmer",
          "language": "spanish"
        }
      },
      "analyzer": {
        "default_search": {
          "type":"spanish",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "spanish_stop",
            "spanish_stemmer"
          ]
        }
      }
    }
  },
  "mappings":{
    "properties":{
      "title":{
        "type":"text",
        "analyzer":"default_search"
      }
    }
  }
}
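
As a hedged aside, the same mapping can be written as an explicit custom analyzer and built from Python, the language the question uses. This is a sketch, not the tested configuration above. The helper name build_spanish_index_body is hypothetical, and the analyzer name rebuilt_spanish is borrowed from the question. It assumes that with "type": "spanish" the built-in analyzer is what actually runs, so the "tokenizer" and "filter" keys next to it go unused. With "type": "custom" the listed filters are the ones applied, so the stemmed tokens may differ slightly from the built-in analyzer's.

```python
import json


def build_spanish_index_body(field="title", analyzer_name="rebuilt_spanish"):
    """Return an index-creation body with one custom Spanish analyzer.

    The helper and its parameter names are illustrative only.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Drop common Spanish stopwords.
                    "spanish_stop": {"type": "stop", "stopwords": "_spanish_"},
                    # Stem tokens, as configured in the question.
                    "spanish_stemmer": {"type": "stemmer", "language": "spanish"},
                },
                "analyzer": {
                    analyzer_name: {
                        # "custom" makes the tokenizer and filter list take effect.
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "spanish_stop", "spanish_stemmer"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                # No "search_analyzer" here, so this one analyzer serves
                # both indexing and searching on the field.
                field: {"type": "text", "analyzer": analyzer_name}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_spanish_index_body(), indent=2))
```

The printed JSON can be used as the body of the index-creation request in place of the mapping above.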

Index Data:

{
  "title": "primer"
}
{
  "title": "primera"
}
{
  "title": "primero"
}
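
To load these three documents in one request, a bulk payload can be assembled as below. This is only a sketch: the helper name build_bulk_payload is hypothetical, the index name and document IDs are copied from the search result shown further down, and the payload is meant for the _bulk endpoint with a Content-Type of application/x-ndjson.

```python
import json


def build_bulk_payload(index_name, docs):
    """Build a newline-delimited _bulk body from a {doc_id: source} mapping."""
    lines = []
    for doc_id, source in docs.items():
        # Action line first, then the document source on its own line.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# IDs follow the search result in this answer: 1 = primera, 2 = primero, 3 = primer.
docs = {
    "1": {"title": "primera"},
    "2": {"title": "primero"},
    "3": {"title": "primer"},
}
payload = build_bulk_payload("stof_64420517", docs)
```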

Search Query:

{
  "query":{
    "match":{
      "title":"primer"
    }
  }
}

Search Result:

"hits": [
      {
        "_index": "stof_64420517",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.13353139,
        "_source": {
          "title": "primer"
        }
      },
      {
        "_index": "stof_64420517",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.13353139,
        "_source": {
          "title": "primera"
        }
      },
      {
        "_index": "stof_64420517",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.13353139,
        "_source": {
          "title": "primero"
        }
      }
    ]
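
All three documents match because the field's analyzer reduces each title and the query to the same token. One way to check this is the _analyze API, which returns the tokens an analyzer produces for a given text. The sketch below only builds the requests. The helper name build_analyze_request is hypothetical, and sending the request is left to whatever HTTP client you use.

```python
def build_analyze_request(index_name, analyzer, text):
    """Return (method, path, body) for an _analyze call against one index."""
    path = f"/{index_name}/_analyze"
    body = {"analyzer": analyzer, "text": text}
    return "GET", path, body


# One request per title, all using the analyzer from the mapping above.
requests_to_send = [
    build_analyze_request("stof_64420517", "default_search", word)
    for word in ("primer", "primera", "primero")
]
```

If the tokens returned for the three words are identical, the index-time and search-time analysis agree, which is what the question asks for.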